6 releases

Version | Released |
---|---|
0.4.2 | Dec 3, 2023 |
0.4.1 | Dec 2, 2023 |
0.3.2 | Nov 28, 2023 |
0.2.1 | |
0.1.2 | |
encoding_rs_rw
Space-efficient std::io::{Read, Write} wrappers for encoding_rs
This crate provides std::io::Read and std::io::Write implementations for
encoding_rs::Decoder and encoding_rs::Encoder, respectively, to support
Rust's standard streaming API.
use std::{fs, io, io::prelude::*};
use encoding_rs::{EUC_JP, SHIFT_JIS};
use encoding_rs_rw::{DecodingReader, EncodingWriter};

// Decode an EUC-JP file into a UTF-8 string.
let file_r = io::BufReader::new(fs::File::open("foo.txt")?);
let mut reader = DecodingReader::new(file_r, EUC_JP.new_decoder());
let mut utf8 = String::new();
reader.read_to_string(&mut utf8)?;

// Re-encode the text as Shift_JIS and write it to another file.
let file_w = fs::File::create("bar.txt")?;
let mut writer = EncodingWriter::new(file_w, SHIFT_JIS.new_encoder());
write!(writer, "{}", utf8)?;
writer.flush()?;
This crate is an alternative to encoding_rs_io but provides a simpler API and
more flexible error semantics.
This crate also provides a lossy variant of the decoding reader that replaces
malformed byte sequences with replacement characters (U+FFFD), and a
with_unmappable_handler variant of the encoding writer that passes unmappable
characters to the specified handler.
use std::{fs, io, io::prelude::*};
use encoding_rs::{EUC_KR, ISO_8859_7};
use encoding_rs_rw::{DecodingReader, EncodingWriter};

// Decode an EUC-KR file, replacing malformed sequences with U+FFFD.
let file_r = io::BufReader::new(fs::File::open("baz.txt")?);
let mut reader = DecodingReader::new(file_r, EUC_KR.new_decoder());
let mut utf8 = String::new();
reader.lossy().read_to_string(&mut utf8)?;

// Encode as ISO-8859-7, writing each unmappable character as a
// numeric character reference such as "&#12354;".
let file_w = fs::File::create("qux.txt")?;
let mut writer = EncodingWriter::new(file_w, ISO_8859_7.new_encoder());
{
    let mut writer =
        writer.with_unmappable_handler(|e, w| write!(w, "&#{};", u32::from(e.value())));
    write!(writer, "{}", utf8)?;
    writer.flush()?;
}
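To make the handler idea concrete without depending on this crate, here is a minimal std-only sketch. The function write_ascii_with_handler is hypothetical and is not part of encoding_rs_rw. It treats plain ASCII as the target encoding and calls a caller-supplied closure for every character that ASCII cannot represent, in the same spirit as the numeric-character-reference handler above.

```rust
use std::io::{self, Write};

/// Hypothetical toy encoder (not this crate's API): writes `text` to `out`
/// as ASCII and calls `handler` for each character that is not ASCII.
fn write_ascii_with_handler<W, F>(text: &str, out: &mut W, mut handler: F) -> io::Result<()>
where
    W: Write,
    F: FnMut(char, &mut W) -> io::Result<()>,
{
    for c in text.chars() {
        if c.is_ascii() {
            // Mappable: emit the single byte directly.
            out.write_all(&[c as u8])?;
        } else {
            // Unmappable: let the caller decide what to write instead.
            handler(c, out)?;
        }
    }
    Ok(())
}
```

Passing a closure such as |c, w| write!(w, "&#{};", u32::from(c)) turns each unmappable character into a decimal numeric character reference.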
Design
Implementing Rust's Read and Write traits for a character-encoding converter
essentially requires byte buffers before and after the converter: read and
write must support byte-by-byte operation, whereas character encoders and
decoders consume and produce multiple bytes at a time to handle multi-byte
characters. The types in this crate employ small internal buffers so that they
can operate byte-by-byte, but they bypass those buffers and use the supplied
buffers as much as possible to minimize double-buffering and memory
consumption.
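The trade-off can be sketched with a toy reader built only on the standard library. This is an illustration under stated assumptions, not the crate's actual implementation: the type Latin1ToUtf8 and the helper expand are hypothetical, and ISO-8859-1 to UTF-8 is chosen only because each input byte expands to at most two output bytes. When the caller passes a one-byte buffer, the second byte of a two-byte character is parked in a tiny internal slot. When the caller's buffer is larger, the input is read into its tail and expanded in place, so no separate buffer is involved.

```rust
use std::io::{self, Read};

/// Hypothetical toy reader: decodes ISO-8859-1 input into UTF-8 output.
struct Latin1ToUtf8<R> {
    inner: R,
    /// Second byte of a 2-byte UTF-8 sequence that did not fit last time.
    pending: Option<u8>,
}

impl<R: Read> Latin1ToUtf8<R> {
    fn new(inner: R) -> Self {
        Self { inner, pending: None }
    }
}

/// Encodes one Latin-1 byte as UTF-8; returns the bytes and how many are used.
fn expand(b: u8) -> ([u8; 2], usize) {
    if b < 0x80 {
        ([b, 0], 1)
    } else {
        ([0xC0 | (b >> 6), 0x80 | (b & 0x3F)], 2)
    }
}

impl<R: Read> Read for Latin1ToUtf8<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        // Hand out a parked byte before touching the source again.
        if let Some(b) = self.pending.take() {
            buf[0] = b;
            return Ok(1);
        }
        if buf.len() == 1 {
            // Slow path: a 2-byte character cannot fit, so park its tail.
            let mut one = [0u8; 1];
            if self.inner.read(&mut one)? == 0 {
                return Ok(0);
            }
            let (bytes, len) = expand(one[0]);
            buf[0] = bytes[0];
            if len == 2 {
                self.pending = Some(bytes[1]);
            }
            return Ok(1);
        }
        // Fast path: read input into the tail of the caller's buffer and
        // expand it in place from the front. Reading at most half the
        // buffer guarantees the output never overwrites unread input.
        let max_in = buf.len() / 2;
        let start = buf.len() - max_in;
        let n = self.inner.read(&mut buf[start..])?;
        let mut out = 0;
        for i in 0..n {
            let (bytes, len) = expand(buf[start + i]);
            buf[out..out + len].copy_from_slice(&bytes[..len]);
            out += len;
        }
        Ok(out)
    }
}
```

A real multi-byte decoder needs somewhat more state than a single parked byte, but the shape is the same: the internal buffer exists only for the cases the supplied buffer cannot cover.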
License
Licensed under the Apache License, Version 2.0.