10 releases (4 stable)
1.1.0 | Jul 26, 2023 |
---|---|
1.0.2 | Apr 22, 2023 |
1.0.1 | Nov 2, 2022 |
0.3.3 | Jan 18, 2022 |
0.1.0 | Jul 2, 2021 |
#434 in Encoding
2,932 downloads per month
Used in 5 crates
1MB
32K
SLoC
Yore
A Rust library for decoding and encoding character sets based on OEM code pages.
Features
- Fast performance *
- Minimal memory usage with
Cow
andshrink_to_fit
- Easy-to-use API
- Broad range of supported code pages
- Handles code pages with redefined ASCII characters (<0x80), such as '٪' in CP864
Usage
Add yore
to your Cargo.toml
file.
[dependencies]
yore = "1.1.0"
Examples
Using a specific code page
use yore::code_pages::{CP857, CP850};
use yore::{DecodeError, EncodeError};
// Vec contains ASCII "text"
let bytes = vec![116, 101, 120, 116];
// Vec contains ASCII "text " and codepoint 231
let bytes_undefined = vec![116, 101, 120, 116, 32, 231];
// Notice that decoding CP850 can't fail because it is completely defined
assert_eq!(CP850.decode(&bytes), "text");
// However, CP857 can fail
assert_eq!(CP857.decode(&bytes).unwrap(), "text");
// "text " + codepoint 231
assert!(matches!(CP857.decode(&bytes_undefined), DecodeError));
// Lossy decoding won't fail due to fallback
assert_eq!(CP857.decode_lossy(&bytes_undefined), "text �");
// Encoding
assert_eq!(CP850.encode("text").unwrap(), bytes);
assert!(matches!(CP850.encode("text 🦀"), EncodeError));
assert_eq!(CP850.encode_lossy("text 🦀", 231), bytes_undefined);
Using a trait object
use yore::CodePage;
fn do_something(code_page: &dyn CodePage, bytes: &[u8]) {
println!("{}", code_page.decode(bytes).unwrap());
}
Supported code pages
Identifier | Name | Description |
---|---|---|
437 | ibm437 | OEM United States |
737 | ibm737 | OEM Greek (formerly 437G); Greek (DOS) |
775 | ibm775 | OEM Baltic; Baltic (DOS) |
850 | ibm850 | OEM Multilingual Latin 1; Western European (DOS) |
852 | ibm852 | OEM Latin 2; Central European (DOS) |
855 | ibm855 | OEM Cyrillic (primarily Russian) |
857 | ibm857 | OEM Turkish; Turkish (DOS) |
860 | ibm860 | OEM Portuguese; Portuguese (DOS) |
861 | ibm861 | OEM Icelandic; Icelandic (DOS) |
862 | dos-862 | OEM Hebrew; Hebrew (DOS) |
863 | ibm863 | OEM French Canadian; French Canadian (DOS) |
864 | ibm864 | OEM Arabic; Arabic (864) |
865 | ibm865 | OEM Nordic; Nordic (DOS) |
866 | cp866 | OEM Russian; Cyrillic (DOS) |
869 | ibm869 | OEM Modern Greek; Greek, Modern (DOS) |
874 | windows-874 | Thai (Windows) |
910 | ibm910 | IBM-PC APL2 |
1250 | windows-1250 | ANSI Central European; Central European (Windows) |
1251 | windows-1251 | ANSI Cyrillic; Cyrillic (Windows) |
1252 | windows-1252 | ANSI Latin 1; Western European (Windows) |
1253 | windows-1253 | ANSI Greek; Greek (Windows) |
1254 | windows-1254 | ANSI Turkish; Turkish (Windows) |
1255 | windows-1255 | ANSI Hebrew; Hebrew (Windows) |
1256 | windows-1256 | ANSI Arabic; Arabic (Windows) |
1257 | windows-1257 | ANSI Baltic; Baltic (Windows) |
1258 | windows-1258 | ANSI/OEM Vietnamese; Vietnamese (Windows) |
* Benchmarks
encoding_rs
supports only a few of the encodings that oem_cp
and yore
support. Additionally, encoding_rs
focuses on streaming use cases.
Refer to the bench crate for more details.
Dependencies
~250–700KB
~16K SLoC