#iterator #nlp #no-std #text

no-std charmap

A library for one-to-(none/one/many) character mapping

5 releases

0.2.2 Apr 9, 2023
0.2.1 Apr 9, 2023
0.2.0 Apr 9, 2023
0.1.1 Apr 8, 2023
0.1.0 Apr 8, 2023

#804 in Text processing

MIT/Apache

17KB
202 lines

charmap


A Rust library for one-to-(none/one/many) character mapping. Its main use case is preprocessing, transliterating, and cleaning natural language text.

Usage

To use charmap with libstd's mapping types (HashMap and BTreeMap), add the following to your Cargo.toml:

[dependencies]
charmap = "0.2"

This should also allow you to use rustc-hash's FxHashMap, since it is a type alias for libstd's HashMap with a different hasher.
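
As a rough illustration of why a hasher-swapped alias still counts as libstd's HashMap, here is a std-only sketch. `AltHashMap` and `substitute` are hypothetical names made up for this example and are not part of charmap or rustc-hash; the sketch only assumes that the consuming code is generic over the hasher parameter.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasher, BuildHasherDefault};

// Stand-in for an alias such as FxHashMap: still std's HashMap, only the
// hasher parameter differs. DefaultHasher is used here to stay std-only.
type AltHashMap<K, V> = HashMap<K, V, BuildHasherDefault<DefaultHasher>>;

// Hypothetical helper, generic over the hasher `S`, so it accepts both the
// default HashMap and any alias that plugs in another hasher.
fn substitute<S: BuildHasher>(input: &str, table: &HashMap<char, &str, S>) -> String {
    let mut out = String::new();
    for c in input.chars() {
        match table.get(&c) {
            Some(s) => out.push_str(s), // replace mapped characters
            None => out.push(c),        // keep everything else as it is
        }
    }
    out
}

fn main() {
    let mut alt: AltHashMap<char, &str> = AltHashMap::default();
    alt.insert('l', "LLL");
    let std_map: HashMap<char, &str> = HashMap::from([('l', "LLL")]);

    // Both calls go through the same function with different hashers.
    println!("{}", substitute("hello", &alt));
    println!("{}", substitute("hello", &std_map));
}
```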

charmap also supports hashbrown's HashMap and phf's Map and OrderedMap types. You can enable these by setting the "hashbrown" and "phf" features respectively. For example, to use charmap with phf, add the following to your Cargo.toml:

[dependencies]
charmap = {version = "0.2", features = ["phf"]}

You can also disable libstd support for no_std builds by setting default-features = false. For example:

[dependencies]
charmap = {version = "0.2", default-features = false, features = ["phf"]}

Example

Below is an example of how to use charmap with libstd's HashMap:

use std::collections::HashMap;
use charmap::*;

// We first define our action map that will tell our CharMapper what to do
// when it sees a particular character.
let actions = HashMap::from([
    ('!', CharMapAction::Delete),  // Delete instances of '!'
    ('l', CharMapAction::SubStr("LLL")),  // Substitute instances of 'l' with 'LLL'
]);

// This is the string we want to charmap.
let start_str = "Hello, world!";

// Create a character mapper using the previously defined actions while
// allowing all other characters to be output as they are.
let mapper = CharMapper::new(&actions, CharMapAction::Pass);

// Use mapper to charmap start_str
let mapped_str: String = start_str.map_chars(&mapper).collect();

// Output should be: HeLLLLLLo, worLLLd
println!("{}", mapped_str);
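
To make the one-to-(none/one/many) idea concrete without relying on charmap itself, the following is a minimal std-only sketch of the same transformation. The `Action` enum and `apply_actions` function are hypothetical names invented for this illustration; they are not charmap's types and say nothing about how the crate is implemented.

```rust
use std::collections::HashMap;

// Hypothetical action type for this sketch; not charmap's own enum.
#[derive(Clone, Copy)]
enum Action {
    Pass,              // one-to-one: emit the character unchanged
    Delete,            // one-to-none: emit nothing
    Sub(&'static str), // one-to-many: emit a replacement string
}

// Apply `actions` to every character of `input`, falling back to `default`
// for characters that have no entry in the map.
fn apply_actions(input: &str, actions: &HashMap<char, Action>, default: Action) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match actions.get(&c).copied().unwrap_or(default) {
            Action::Pass => out.push(c),
            Action::Delete => {}
            Action::Sub(s) => out.push_str(s),
        }
    }
    out
}

fn main() {
    let actions = HashMap::from([
        ('!', Action::Delete),
        ('l', Action::Sub("LLL")),
    ]);
    let mapped = apply_actions("Hello, world!", &actions, Action::Pass);
    println!("{}", mapped); // prints: HeLLLLLLo, worLLLd
}
```

Unlike this eager sketch, which builds the whole String at once, the charmap example above collects its result from an iterator.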

License

CharMap is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-MIT and LICENSE-APACHE for details.

Dependencies

~0–440KB