bin+lib utf8-locale

Detect a UTF-8-capable locale for running child processes in

5 releases (3 stable)

1.0.3 Feb 27, 2024
1.0.1 Jun 29, 2023
1.0.0 Oct 22, 2022
0.3.0 Feb 23, 2022
0.2.0 Feb 1, 2022

#548 in Command line utilities

BSD-2-Clause

Overview

Sometimes it is useful for a program to be able to run a child process and more or less depend on its output being valid UTF-8. This can usually be accomplished by setting one or more environment variables, but that raises the question of what to set them to: which UTF-8-capable locale is present on this particular system? This is where the utf8_locale module comes in.

Examples

For the Rust implementation:

use std::process;

use utf8_locale;

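// detect() returns a Result, so this must run inside a function
// that can propagate the error with `?`.
let utf8_env = utf8_locale::Utf8Detect::new().detect()?;
// Bind the command first so that the builder calls do not borrow a temporary;
// then start from an empty environment and apply the detected variables.
let mut cmd = process::Command::new(...);
cmd.env_clear().envs(utf8_env.env);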

For the Python implementation:

import subprocess

import utf8_locale

utf8env = utf8_locale.Utf8Detect().detect()
subprocess.check_output([...], encoding="UTF-8", env=utf8env.env)

Classes (Python and Rust)

LanguagesDetect

The detect() method of this class examines either the provided environment variables or the current process's environment and returns a list of language codes in order of preference that may then be used for determining which UTF-8-capable locale to use.

Utf8Detect

The detect() method of this class runs the external locale command to obtain a list of the supported locale names, and then picks a suitable one to use so that programs are more likely to output valid UTF-8 characters and language-neutral messages. It prefers the C base locale, but if neither C.UTF-8 nor C.utf8 is available, it will fall back to a list of other locale names that are likely to be present on the system. The list of preferred language codes is configurable.
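
The selection step described above can be illustrated with a minimal standard-library sketch. It is not the library's own code: the helper names list_locales() and pick_utf8_locale(), the fallback language list, and the final fallback to the plain C locale are all assumptions made for this example.

```python
import subprocess

# Codeset spellings treated as UTF-8 in this sketch.
UTF8_ENCODINGS = ("UTF-8", "utf8")

# Hypothetical preference order; the library's own list may differ.
FALLBACK_LANGUAGES = ("C", "en", "de", "es", "it")


def list_locales():
    """Return the names reported by `locale -a`, or an empty list on failure."""
    try:
        result = subprocess.run(
            ["locale", "-a"],
            capture_output=True,
            check=True,
            # Locale names may contain non-ASCII bytes; decode them leniently.
            encoding="ISO-8859-1",
        )
    except (OSError, subprocess.CalledProcessError):
        return []
    return result.stdout.splitlines()


def pick_utf8_locale(available, languages=FALLBACK_LANGUAGES):
    """Pick the UTF-8 locale whose language comes first in `languages`."""
    best = None
    for name in available:
        base, _, codeset = name.partition(".")
        # Names with a modifier such as "@euro" are skipped for simplicity.
        if codeset not in UTF8_ENCODINGS:
            continue
        lang = base.split("_")[0]
        if lang not in languages:
            continue
        rank = languages.index(lang)
        if best is None or rank < best[0]:
            best = (rank, name)
    # Assumption: fall back to the plain "C" locale when nothing matches.
    return best[1] if best is not None else "C"
```

Calling pick_utf8_locale(list_locales()) then yields a single locale name; passing a different languages tuple changes the order of preference.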

Functions

Note that for the Python and Rust implementations it is recommended to use the Utf8Detect and, if needed, the LanguagesDetect builder classes to perform the detection.

detect_utf8_locale()

The detect_utf8_locale() function runs the external locale command to obtain a list of the supported locale names, and then picks a suitable one to use so that programs are more likely to output valid UTF-8 characters and language-neutral messages. It prefers the C base locale, but if neither C.UTF-8 nor C.utf8 is available, it will fall back to a list of other locale names that are likely to be present on the system.

get_utf8_vars()

The get_utf8_vars() function invokes detect_utf8_locale() and then returns a dictionary/hashmap containing two entries: LC_ALL set to the obtained locale name and LANGUAGE set to an empty string so that recent versions of the gettext library do not choose a different language to output messages in.

get_utf8_env()

The get_utf8_env() function invokes detect_utf8_locale() and then returns a dictionary/hashmap containing the current environment variables, LC_ALL set to the obtained locale name, and LANGUAGE set to an empty string so that recent versions of the gettext library do not choose a different language to output messages in.
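
The difference between the two-variable dictionary and the full environment can be shown with a short standard-library sketch. The helper names utf8_vars() and utf8_env() are hypothetical, and the locale name is passed in by the caller rather than detected, so this only illustrates the shape of the result, not the library's implementation.

```python
import os


def utf8_vars(locale_name):
    """Return only the two variables that need to be overridden."""
    # An empty LANGUAGE keeps gettext from picking another message language.
    return {"LC_ALL": locale_name, "LANGUAGE": ""}


def utf8_env(locale_name, base_env=None):
    """Return a copy of the base environment with the UTF-8 variables applied."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(utf8_vars(locale_name))
    return env
```

The smaller dictionary suits callers that build the child environment themselves, while the merged copy can be handed directly to something like the env argument of subprocess.check_output().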

get_preferred_languages()

The get_preferred_languages() function examines either the current process environment or the provided dictionary and returns a list of the languages specified in the locale variables (LC_ALL, LANG, LC_MESSAGES, etc). It may be used by programs to add the user's currently preferred locale to their own settings.
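
As a rough illustration of this kind of lookup, the following sketch extracts language codes from a few locale variables. The function name preferred_languages(), the set of variables examined, and the order in which they are checked are assumptions for this example; the library may consult more variables and apply additional rules.

```python
import os

# Assumed lookup order for this sketch only.
LOCALE_VARIABLES = ("LC_ALL", "LANG", "LC_MESSAGES", "LC_CTYPE")


def preferred_languages(env=None):
    """Return the distinct language codes found in the locale variables."""
    env = os.environ if env is None else env
    languages = []
    for name in LOCALE_VARIABLES:
        value = env.get(name, "")
        # Drop the codeset and the modifier: "bg_BG.UTF-8@euro" -> "bg_BG".
        base = value.partition(".")[0].partition("@")[0]
        # Drop the territory: "bg_BG" -> "bg".
        lang = base.partition("_")[0]
        if lang and lang not in languages:
            languages.append(lang)
    return languages
```

A program could place the returned codes ahead of its own defaults so that the user's current language is tried first.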

Contact

The utf8-locale library was written by Peter Pentchev. It is developed in a GitLab repository. This documentation is hosted at Ringlet with a copy at ReadTheDocs.