-
bstr
A string type that is not required to be valid UTF-8
-
pulldown-cmark
A pull parser for CommonMark
-
regex
An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.
-
globset
Cross platform single glob and glob set matching. Glob set matching is the process of matching one or more glob patterns against a single candidate path simultaneously, and returning all of the globs that matched.
-
const_format
Compile-time string formatting
-
encoding_rs
A Gecko-oriented implementation of the Encoding Standard
-
fancy-regex
An implementation of regexes, supporting a relatively rich set of features, including backreferences and look-around
-
mdbook
Creates a book from markdown files
-
tabled
An easy to use library for pretty-printing tables of Rust structs and enums
-
lazy-regex
lazy static regular expressions checked at compile time
-
unicode-xid
Determine whether characters have the XID_Start or XID_Continue properties according to Unicode Standard Annex #31
-
textwrap
Powerful library for word wrapping, indenting, and dedenting strings
-
heck
heck is a case conversion library
-
unicode-segmentation
This crate provides Grapheme Cluster, Word and Sentence boundaries according to Unicode Standard Annex #29 rules
-
similar
A diff library for Rust
-
comrak
A 100% CommonMark-compatible GitHub Flavored Markdown parser and formatter
-
pretty
Wadler-style pretty-printing combinators in Rust
-
aho-corasick
Fast multiple substring searching
-
csml_engine
The CSML Engine is a conversational engine designed to make it extremely easy to create rich and powerful chatbots
-
ropey
A fast and robust text rope for Rust
-
linkify
Finds URLs and email addresses in plain text. Takes care to get the boundaries right with surrounding punctuation like parentheses.
-
comfy-table
An easy to use library for building beautiful tables with automatic content wrapping
-
rectangle-pack
A general purpose, deterministic bin packer designed to conform to any two or three dimensional use case
-
widestring
A wide string Rust library for converting to and from wide strings, such as those often used in Windows API or other FFI libraries. Both u16 and u32 string types are provided, including support for UTF-16 and UTF-32…
-
ab_glyph_rasterizer
Coverage rasterization for lines, quadratic & cubic beziers
-
regex-automata
Automata construction and matching using regular expressions
-
hyphenation
Knuth-Liang hyphenation for a variety of languages
-
deunicode
Convert Unicode strings to pure ASCII by intelligently transliterating them. Supports Emoji and Chinese.
-
lindera-cli
A morphological analysis tool
-
os_display
Display strings in a safe platform-appropriate way
-
rustybuzz
A complete harfbuzz shaping algorithm port to Rust
-
Inflector
Adds String based inflections for Rust. Snake, kebab, camel, sentence, class, title and table cases as well as ordinalize, deordinalize, demodulize, foreign key, and pluralize/singularize…
-
prettydiff
Side-by-side diff for two files
-
pulldown-cmark-to-cmark
Convert pulldown-cmark Events back to the string they were parsed from
-
printpdf
Rust library for writing PDF files
-
difference
A Rust text diffing and assertion library
-
termimad
Markdown Renderer for the Terminal
-
harfbuzz_rs
A high-level interface to HarfBuzz, exposing its most important functionality in a safe manner using Rust
-
unicode-normalization
This crate provides functions for normalization of Unicode strings, including Canonical and Compatible Decomposition and Recomposition, as described in Unicode Standard Annex #15
-
uwc
Counts things in unicode text files
-
unicode-id
Determine whether characters have the ID_Start or ID_Continue properties according to Unicode Standard Annex #31
-
lindera
A morphological analysis library
-
convert_case
Convert strings into any case
-
lingua
An accurate natural language detection library, suitable for long and short text alike
-
encoding
Character encoding support for Rust
-
substring
A substring method for string types
-
tokenizers
Provides an implementation of today’s most used tokenizers, with a focus on performances and versatility
-
diff
An LCS based slice and string diffing implementation
-
roff
ROFF (man page format) generation library
-
sscanf
A sscanf (inverse of format!()) Macro based on Regex
-
array_tool
Helper methods for processing collections
-
rant
The Rant procedural templating language
-
unicode_names2
Map characters to and from their name given in the Unicode standard. This goes to great lengths to be as efficient as possible in both time and space, with the full bidirectional tables weighing barely 500 KB…
-
indenter
A formatter wrapper that indents the text, designed for error display impls
-
hck
A sharp cut(1) clone
-
unindent
Remove a column of leading whitespace from a string
-
somedoc
A very simple document model and markup generator
-
unicode-general-category
Fast lookup of the Unicode General Category property for char
-
bytelines
Read input lines as byte slices for high efficiency
-
bcdown
Bilibili manga downloader written in Rust; supports EPUB, PDF, and ZIP formats
-
fuzzy-matcher
Fuzzy Matching Library
-
svgbob_cli
Transform your ascii diagrams into happy little SVG
-
unicode-script
This crate exposes the Unicode Script and Script_Extension properties from UAX #24
-
aki-resort
sort lines of text. You can use regex to specify the KEY.
-
mdbook-admonish
A preprocessor for mdbook to add Material Design admonishments
-
synoptic
A simple, low-level, syntax highlighting library with unicode support
-
titlecase
A tool and library that capitalizes text according to a style defined by John Gruber for post titles on his website Daring Fireball
-
ucd-generate
A program for generating packed representations of the Unicode character database that can be efficiently searched
-
shell2batch
Converts simple basic shell scripts to windows batch scripts
-
text_io
really simple to use panicking input functions
-
instant-segment
Fast English word segmentation
-
uncased
Case-preserving, ASCII case-insensitive, no_std string types
-
grok
A rust implementation of the popular java & ruby grok library which allows easy text and log file processing with composable patterns
-
typos-dict
Source Code Spelling Correction
-
wana_kana
Utility library for checking and converting between Japanese characters - Kanji, Hiragana, Katakana - and Romaji
-
hgrep
hgrep is a grep tool with human-friendly search output. This is similar to -C option of grep command, but its output is enhanced with syntax highlighting focusing on human readable outputs.
-
easy_reader
A Rust library for easily navigating forward, backward or randomly through the lines of huge files
-
ab_glyph
API for loading, scaling, positioning and rasterizing OpenType font glyphs
-
onig_sys
The onig_sys crate contains raw rust bindings to the oniguruma library. This crate exposes a set of unsafe functions which can then be used by other crates to create safe wrappers around Oniguruma…
-
mdbook-plantuml
A preprocessor for mdbook which will convert plantuml code blocks into inline SVG diagrams
-
fontconfig
Safe, higher-level wrapper around the Fontconfig library
-
emojis
✨ Lookup and iterate over emoji names, shortcodes, and groups
-
lindera-ipadic-builder
A Japanese morphological dictionary builder for IPADIC
-
fontdb
A simple, in-memory font database with CSS-like queries
-
runiq
An efficient way to filter duplicate lines from input, à la uniq
-
stop-words
Common stop words in many languages
-
vaporetto
Vaporetto: a pointwise prediction based tokenizer
-
ascii-hangman
customizable Hangman game with ASCII-art rewarding for children (desktop version)
-
stfu8
Sorta Text Format in UTF-8
-
ncount
A word count tool intended to derive useful stats from markdown
-
unidecode
Provides pure ASCII transliterations of Unicode strings
-
prefix
A customizable pretty printer for FIX messages
-
jieba-rs
The Jieba Chinese Word Segmentation Implemented in Rust
-
nlpo3
Thai natural language processing library, with Python and Node bindings
-
any_ascii
Unicode to ASCII transliteration
-
epub-builder
A Rust library for generating EPUB files
-
unic-ucd-ident
UNIC — Unicode Character Database — Identifier Properties
-
lopdf
A Rust library for PDF document manipulation
-
ngrammatic
Character-oriented ngram generator and fuzzy matching library
-
rs3a
Lib for reading and writing 3a format
-
sd
An intuitive find & replace CLI
-
mdbook-cat-prep
a preprocessor for mdbook which provides teacher, subject, material and tag functionality
-
analiticcl
Analiticcl is an approximate string matching or fuzzy-matching system that can be used to find variants for spelling correction or text normalisation
-
getzola/zola
A fast static site generator with everything built-in
-
diffy
Tools for finding and manipulating differences between files
-
character_converter
Turn Traditional Chinese script to Simplified Chinese script and vice-versa and tokenize
-
r4d
Text oriented macro processor
-
unicase
A case-insensitive wrapper around strings
-
sanitizer
A collection of methods and macros to sanitize struct fields
-
gh-emoji
Convert :emoji: to Unicode using GitHub’s emoji names
-
levenshtein_automata
Creates Levenshtein Automata in an efficient manner
-
human_regex
A regex library for humans
-
jetscii
A tiny library to efficiently search strings and byte slices for sets of ASCII characters or bytes
-
skeletal_animation
Skeletal character animation library, using gfx-rs
-
mecab
Safe Rust wrapper for mecab, a Japanese language part-of-speech and morphological analyzer library
-
mdbook-pdf
A backend for mdBook written in Rust for generating PDF based on headless chrome and Chrome DevTools Protocol
-
chardetng
A character encoding detector for legacy Web content
-
noto-sans-mono-bitmap
Contains the “Noto Sans Mono” font as pre-rasterized bitmap font in different sizes and font weights. This crate is no_std and needs no allocations or floating point operations. Useful…
-
unicode_categories
Query Unicode category membership for chars
-
stringmetrics
Rust library for approximate string matching and spellchecking
-
line-span
Find line ranges and jump between next and previous lines
-
html2text
Render HTML as plain text
-
chars_data
Build-dependency for chars, the unicode character information CLI
-
nlprule
A fast, low-resource Natural Language Processing and Error Correction library
-
enum-ts
TypeScript Enum pattern matcher codegen
-
text_analysis
Analyze text stored as *.txt in provided file or directory. Doesn’t read files in subdirectories. Counting all words and then searching for every unique word in the vicinity (+-5 words)…
-
pad
Library for padding strings at runtime
-
hyperscan
Hyperscan bindings for Rust with Multiple Pattern and Streaming Scan
-
chamkho
Khmer, Lao, Myanmar, and Thai word segmentation/breaking library and command line
-
lowcharts
Tool to draw low-resolution graphs in terminal
-
clarifai_grpc
The official Clarifai gRPC Rust client
-
kathoey
Rust library for text feminization using open corpus linguistics data
-
cglue-bindgen
cleanup cbindgen headers for CGlue
-
swash
Font introspection, complex text shaping and glyph rendering
-
mdbook-fs-summary
Summary generator for mdbook
-
safe-regex-compiler
Regex compiler for the safe-regex crate
-
varcon-core
Varcon-relevant data structures
-
scanlex
a simple lexical scanner for parsing text into tokens
-
block-list
A minimalist hosts-based tool for managing block lists and ad-blocking
-
str_indices
Count and convert between indexing schemes on string slices
-
in_definite
Get the indefinite article (‘a’ or ‘an’) to match the given word. For example: an umbrella, a user.
-
stringmatch
Allow the use of regular expressions or strings wherever you need string comparison
-
proc-macro-regex
A proc macro regex library
-
unicode-width
Determine displayed width of char and str types according to Unicode Standard Annex #11 rules
-
basic-text
Basic Text strings and I/O streams
-
allsorts
Font parser, shaping engine, and subsetter for OpenType, WOFF, and WOFF2
-
secular
No Diacr!
-
svgbobdoc
Renders ASCII diagrams in doc comments as SVG images
-
kv-log-macro
Log macro for log’s kv-unstable backend
-
probly-search
A lightweight full-text search engine with a fully customizable scoring function
-
utf-8
Incremental, zero-copy UTF-8 decoding with error handling
-
const_format_proc_macros
Implementation detail of the const_format crate
-
mdbook-dtmo
Creates a book from markdown files with added plugins
-
libretranslate
Use The Libretranslate Open Source Machine Translation
-
ufofmt
A fast, flexible UFO source file formatter based on the Norad library
-
textcode
Text encoding/decoding library. Supports: UTF-8, ISO6937, ISO8859, GB2312
-
const-str
compile-time string operations
-
rcut-lib
rcut is a Rust replacement for GNU cut that supports UTF-8
-
wikidot-normalize
Simple library to provide Wikidot-compatible string normalization
-
tracery
Text-expansion library
-
egg-mode-text
Text parsing for Twitter: character counting, hashtag/mention extraction
-
tengwar
Transliterate latin text into J.R.R. Tolkien’s Tengwar.
-
unicode-security
Detect possible security problems with Unicode usage according to Unicode Technical Standard #39 rules
-
awabi
A morphological analyzer using mecab dictionary
-
molybdenum
Recursive search and replace CLI application
-
mlc
The markup link checker (mlc) checks for broken links in markup files
-
aki-mcolor
mark up text with color
-
galm
GalM is pattern matching library
-
str-utils
This crate provides some traits to extend types which implement AsRef<[u8]> or AsRef<str>
-
svgbob
Transform your ascii diagrams into happy little SVG
-
mdbook-template
A mdbook preprocessor that allows the re-usability of template files with dynamic arguments
-
chord3
Create pdf songbooks from chopro source
-
easy-regex
Make long regular expressions like pseudocodes
-
adbook
Creates a book from AsciiDoc files
-
encoding-next
Character encoding support for Rust
-
natural
Pure rust library for natural language processing
-
pdf-canvas
Generate PDF files in pure Rust. Currently, simple vector graphics and text set in the 14 built-in fonts are supported
-
unicode-blocks
This crate contains a list of all unicode blocks and provides some functions to search across them
-
pandoc_types
Rust port of pandoc-types
-
lipsum
Lipsum is a lorem ipsum text generation library. It generates pseudo-random Latin text. Use this if you need filler or dummy text for your application. The text is generated using a simple Markov chain…
-
cloc
Count, or compute differences of, lines of source code and comments
-
fasttext
fastText Rust binding
-
mdbook-epub
An EPUB renderer for mdbook
-
ascii_converter
A library for converting between different ascii representations
-
gspell
Rust bindings for gspell
-
file-expert
Expert system for recognizing source code files, similar to GitHub/linguist
-
symbolic_expressions
A symbolic-expression parser/writer
-
guarding
Guarding is a guardian for code, architecture, and layering. It provides an architecture-guard DSL based on ArchUnit.
-
tantivy-analysis-contrib
A set of analysis components for Tantivy
-
lindera-tantivy
Lindera Tokenizer for Tantivy
-
voikko-rs
Rust bindings for the Voikko library
-
etch
Not just a text formatter, don’t mark it down, etch it
-
loc
Count lines of code (cloc) fast
-
oxford_join
Join string slices with Oxford Commas!
-
crowbook-text-processing
Provides some utility functions for escaping text (HTML/LaTeX) and formatting it according to typographic rules (smart quotes, ellipsis, French typographic rules)
-
chanoma
chanoma is a character normalization library for string normalization processing.
-
yozuk
Chatbot for Programmers
-
html-auto-p
This library provides a function like wpautop in Wordpress. It uses a group of regex replaces used to identify text formatted with newlines and replace double line-breaks with HTML paragraph tags.
-
csml_interpreter
The CSML Interpreter is the official interpreter for the CSML programming language, a DSL designed to make it extremely easy to create rich and powerful chatbots
-
csvsc
Build processing chains for CSV files
-
vndb_tags_get
A tool to convert VNDB tags list from JSON into markdown. The list can be downloaded from https://dl.vndb.org/dump/vndb-tags-latest.json.gz in gzip. This tool read from stdin to make it simple…
-
yeslogic-ucd-generate
A program for generating packed representations of the Unicode character database that can be efficiently searched with support for additional tables
-
rust-tfidf
Library to calculate TF-IDF (Term Frequency - Inverse Document Frequency) for generic documents
-
res-regex
A js-regex validator
-
emojic
Emoji constants
-
lyra2
Pure rust library in Lyra2, Lyra2RE, Lyra2REv2, Lyra2REv3
-
unicode_reader
Adaptors which wrap byte-oriented readers and yield the UTF-8 data as Unicode code points or grapheme clusters
-
unicode-normalization-alignments
This crate provides functions for normalization of Unicode strings, including Canonical and Compatible Decomposition and Recomposition, as described in Unicode Standard Annex #15
-
mdbook-checklist
An mdBook preprocessor for generating checklists and indexes
-
precis-profiles
Implementation of the PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings Representing Usernames and Passwords as defined in rfc8265; and Nicknames as defined in rfc8266
-
modeling
Modeling is a tool for analyzing different languages using Ctags
-
codepage-strings
encode / decode strings for Windows code pages
-
unic-ucd-age
UNIC — Unicode Character Database — Age
-
kas-text
Text layout and font management
-
bard
Creates PDF and HTML songbooks out of easy-to-write Markdown sources
-
ftrace
ftrace - trace files and paths
-
pdf-extract
A library to extract content from pdfs
-
textwrap-macros
Simple procedural macros to use textwrap utilities at compile time
-
kl-hyphenate
Knuth-Liang hyphenation for a variety of languages
-
termdiff
Write a diff with color codes to a string
-
float-pretty-print
Format f64 for showing to user, not for serialisation
-
pithy
Ultra-fast, spookily accurate text summarizer that works on any language
-
wchar
Procedural macros for compile time UTF-16 and UTF-32 wide strings
-
unicode-truncate
Unicode-aware algorithm to pad or truncate str in terms of displayed width
-
fm
Non-backtracking fuzzy text matcher
-
csv-groupby
execute a sql-like group-by on arbitrary text or csv files
-
case
A set of letter case string helpers
-
mdxbook
Fork of mdBook, with more customizations and flexibility for programmers
-
cindex
CSV indexing library
-
udp-logger-rs
Log macro for log’s kv-unstable backend and a UDP socket logger
-
poriborton
Interconversion between Unicode and various Bengali ANSI encodings
-
str_inflector
Adds String based inflections for Rust. Snake, kebab, camel, sentence, class, title and table cases as well as ordinalize, deordinalize, demodulize, foreign key, and pluralize/singularize…
-
textspan
Text span utility
-
rsnltk
Rust-based Natural Language Toolkit
-
markov_strings
A simplistic Markov chain text generator
-
rosie-sys
A crate to build or link to librosie to access the Rosie Pattern Language
-
slug
Convert a unicode string to a slug
-
allwords
Generate all the words over a given alphabet
-
nakadi-types
A connector for the Nakadi Event Broker
-
lemmeknow
Identify any mysterious text or analyze strings from a file
-
cutters
Rule based sentence segmentation library
-
untanglr
Probabilistically split concatenated words using NLP based on English Wikipedia unigram frequencies
-
suffix
Suffix arrays
-
glyph_brush_layout
Text layout for ab_glyph
-
utf8-bufread
Provides alternatives to BufRead’s read_line & lines that stop not on newlines
-
gistit
Quick and easy code snippet sharing
-
arbitrator
Format text based on a set of rules and regexes
-
texting
string helpers
-
wcounter
Give the word and count the appearance
-
branchy
Provides tools for generating strings and sequences using context-free grammars
-
colonnade
format tabular data for display
-
ascii-hangman-backend
customizable Hangman game with ASCII-art rewarding for children (backend)
-
entities
Provides the raw data needed to convert to and from HTML entities
-
mdbook-kroki-preprocessor
render kroki diagrams from files or code blocks in mdbook
-
text2num
Parse and convert numbers written in English, Spanish or French into their digit representation
-
slicestring
slicestring is a crate for slicing Strings
-
ansi-str
A library which provides a set of methods to work with ANSI strings
-
uniwhat
Display the unicode characters text
-
blob-uuid
Converts Uuid to a url friendly 22 character string blob
-
fast2s
A fast Traditional Chinese to Simplified Chinese conversion library. Built with FST, faster than most of other libraries.
-
ansi-to-tui
A library to convert ansi color coded text into tui::text::Text type from tui-rs library
-
unicode-joining-type
Fast lookup of the Unicode Joining Type and Joining Group properties
-
ascii
ASCII-only equivalents to char, str and String
-
lexical-sort
Sort Unicode strings lexically
-
mdbook-theme
A preprocessor and a backend to config theme for mdbook, especially creating a pagetoc on the right and setting full color themes from the official ace editor
-
fancy-regex-fork-pb
A custom fork of the fancy-regex crate. You probably don’t want to use this.
-
mdplayscript
An extension of Markdown for play scripts
-
latex
An ergonomic library for programmatically generating LaTeX documents and reports
-
porter-stemmer
Flexible and unicode friendly, Porter stemmer implementation
-
sixbit
Small packed strings
-
crowbook
Render a Markdown book in HTML, PDF or Epub
-
mktoc
Generate Table of Contents from Markdown files
-
beemovie
Bee Movie crate
-
ferris-says
A Rust flavored replacement for the classic cowsay
-
justify
Justify plaintext while handling Unicode gracefully
-
transition-table
transition table utilities for keyword parser
-
charabia
A simple library to detect the language, tokenize the text and normalize the tokens
-
egui_commonmark
Commonmark viewer for egui
-
linkcheck
A library for extracting and validating links
-
aki-xcat
concatenate files that are plain, gzip, xz and zstd
-
ucd
Extends the char type to provide access to most fields of the UCD, Unicode Character Database, as of version 9.0.0. It aims to be compact, fast, and use minimal dependencies (only rust’s core crate)…
-
wordcut-engine
Word segmentation/breaking library
-
markdown
Native Rust library for parsing Markdown and (outputting HTML)
-
fontconfig-rs
Safe, higher-level wrapper around the fontconfig library
-
onepage
A simple static site generator
-
ucd-util
A small utility library for working with the Unicode character database
-
pinot
Fast, high-fidelity OpenType parser
-
mandown
Markdown to groff (man page) converter
-
rmw-utf8
Short text compression algorithm for UTF-8 (optimized for Chinese, developed in the Rust programming language).
-
thesauromatic
thesauromatic is a command-line thesaurus that returns related words when given a word. The output words are one per line, making it easy to process in shell pipelines.
-
easy_io
Fast and dead-simple IO for competitive programming in Rust
-
character_frequency
Simple library for counting character frequencies in a string concurrently
-
markov
A generic markov chain implementation in Rust
-
markdown-gen
Crate for generating Markdown files
-
pulldown-cmark-fork
A pull parser for CommonMark
-
tagsearch
Filter plaintext files based on @keyword tags
-
cur
The tool that will hunt for your regular expression
-
spandex-hyphenation
Knuth-Liang hyphenation for a variety of languages
-
charfind
CharFind is an application for finding Unicode characters
-
notmecab
Library for tokenizing text with mecab dictionaries. Not a mecab wrapper.
-
chinese_segmenter
Tokenize Chinese sentences using a dictionary-driven largest first matching approach
-
emoji-printer
Replace emoji shortcodes in string with emoji unicode (”🍣” -> 🍣)
-
madato
A library and command line tool for working with tabular data (XLS, ODS, CSV, YAML), and Markdown
-
eliza
A rust implementation of ELIZA - a natural language processing program developed by Joseph Weizenbaum in 1966
-
rulex-macro
Macro for converting rulex expressions to regexes
-
pandoc
a library API that wraps calls to the pandoc 2.x executable
-
notedown_ast
Notedown Abstract Syntax Tree
-
abjad
Calculate the numerical abjad value of Arabic-script text
-
norad
Read and write Unified Font Object files
-
ttaw
talking to a wall, a piecemeal natural language processing library
-
recase
Changes the convention case of input text
-
string-overlap
A helper crate for “layering” ASCII art
-
precis-core
PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols as defined in rfc8264
-
cskk
A kana-kanji conversion library using the SKK (Simple Kana Kanji henkan) method, intended to be used through the C ABI
-
tet_rs
A third-party implementation of Text Entry Throughput (ref. https://doi.org/10.1145/3290605.3300866) for Rust
-
matchers
Regex matching on character and byte streams
-
yozuk-sdk
Types used in the Yozuk ecosystem
-
agram
An offline anagram library
-
ryaspeller
A tool and lib for searching typos in text, files and websites
-
owned_chars
Owned iterators with the same output as Chars and CharIndices
-
genpdf
User-friendly PDF generator written in pure Rust
-
csv2jsonl
Converts CSV to JSON Lines
-
vaporetto_rules
Rule-base filters for Vaporetto
-
geml
A simple Generator-orientated ML parser
-
mdbook-svgbob2
Alternative mdbook preprocessor for svgbob
-
cow-utils
Copy-on-write string utilities for Rust
-
unic-ucd-segment
UNIC — Unicode Character Database — Segmentation Properties
-
unicode-reverse
Unicode-aware in-place string reversal
-
single_source
Generate code files from snippets in md tutorial files
-
mdbook-indexing
mdbook preprocessor for index generation
-
regex_generate
Use regular expressions to generate text
-
sejong
Sejong Buffer is a buffer that can receive ASCII bytes different from keyboard and send out UTF-32 Hangul string. This buffer allows deletion by Jamo.
-
pangu
Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width characters (alphabetical letters, numerical digits and symbols)
-
valid_rust_char
A tiny library to check if a char is valid in a rust file
-
generator-combinator
Composes combinators to generate patterns of increasing complexity
-
fuzzywuzzy
A pure-Rust clone of the incredibly useful fuzzy string matching python package, FuzzyWuzzy
-
indent_tokenizer
Generate tokens based on indentation
-
pandoc-ac
A simple pandoc filter for converting acronym codes to LaTeX
-
libhanzzok
Hanzzok compiler library
-
unescape
Unescapes strings with escape sequences written out as literal characters
-
changecase
A trait and implementation for changing the case of Strings and &str. It currently supports uppercase, lowercase, alternating case, and inverting case. Title case is in the works.
-
scripter
A screenplay compiler
-
doccy
Doccy is a simple brace based markup language
-
inflections
High performance inflection transformation library for changing properties of words like the case
-
verba
A library for working with Latin words
-
lindera-ipadic
A Japanese morphological dictionary for IPADIC
-
hypher
hypher separates words into syllables
-
bos_books_codes
A library that handles 3-character Bible Books Codes
-
ironstorm_lookup
Lightning fast lookup table for auto completion, type ahead, suggestion engines
-
epub
Library to support the reading of epub files
-
notegraf
Core library for building a graph-oriented notebook
-
case_insensitive_hashmap
A HashMap that uses case-insensitive strings as keys
-
readability
Port of arc90’s readability project to rust
-
stylish-stringlike
API for string-like objects that have styles applied
-
pragmatic-segmenter
Rust port of pySBD v3.1.0
-
nfa_regex
Simple NFA regex engine for text processing
-
csmlinterpreter
The CSML (Conversational Standard Meta Language) is a Domain-Specific Language developed for creating conversational experiences easily
-
nlprule-build
Build tools for a fast, low-resource Natural Language Processing and Error Correction library
-
igo-rs
Pure Rust port of Igo, a POS (Part-Of-Speech) tagger for Japanese (Japanese morphological analysis)
-
words-count
Count the words and characters, with or without whitespaces
-
character-stream
Helper data structures for reading UTF-8 characters from a stream
-
unidok
A powerful, readable, easy-to-learn markup language
-
naming_lib
Library for identifying and converting identifiers naming format (case | notation)
-
byte_string
Wrapper types for outputting byte strings (b”Hello”) using the Debug ({:?}) format
-
kana-converter
A simple converter for half-width/full-width Japanese language characters (katakana, hiragana, and ASCII)
-
goya-ipadic
IPA dictionary for Goya
-
esperanto-text
Convert Esperanto text between UTF-8, x-system and h-system transliterations
-
swot
Swot is a community-driven or crowdsourced library for verifying that domain names and email addresses are tied to a legitimate university or college
-
whisperer
Encodes text as short Chinese characters to evade content censorship
-
indentation
Indentation Formatter
-
quoted-string-parser
Quoted string parser for grammar defined in RFC3261
-
portmanteau
A library to create portmanteaux
-
word_filter
A Word Filter for filtering text
-
ansi-to-tui-forked
A library to convert ansi color coded text into tui::text::Text type from tui-rs library
-
runiq-lib
An efficient way to filter duplicate lines from input, à la uniq
-
cabocha
Safe Rust wrapper for cabocha, a Japanese language dependency structure analyzer library
-
fax
Decoder and Encoder for CCITT Group 3 and 4 bi-level image encodings used by fax machines, TIFF and PDF
-
resast
Rusty-ECMAScript Abstract Syntax Tree
-
human_language_toolkit_chatbot
NLTK like chatbot made with pure rust
-
math-text-transform
Transform greek letters, latin letters, or decimal digits into certain variants from the mathematical alphanumeric symbols Unicode block (U+1D400–U+1D7FF). For example to bold, italic, script or double-struck.
-
detchar
Command line tool for detecting file encodings
-
mdbook-pagetoc
A mdbook plugin that provides a table of contents for each page
-
rexpaint
This crate provides functionality for reading and writing .xp files of the Grid Sage Games REXPaint ASCII art editor
-
hyperscan-sys
Hyperscan bindings for Rust with Multiple Pattern and Streaming Scan
-
whitespace_text_steganography
A steganography strategy that uses whitespace to hide text in other text
-
nws-product-list
NWS Product List
-
unicode_names
Map characters to and from their name given in the Unicode standard. This goes to great lengths to be as efficient as possible in both time and space, with the full bidirectional tables weighing barely 500 KB…
-
rupantor
A Bengali Phonetic Parser which is very flexible and supports Avro Phonetic
-
mathematica-notebook-filter
mathematica-notebook-filter parses Mathematica notebook files and strips them of superfluous information so that they can be committed into version control systems more easily
-
slicedisplay
Simplistic Display implementation for Vecs and slices
-
blockcounter
Counts the blocks in a stream
-
layered-nlp
Highly-flexible data-oriented NLP framework
-
wtf8-rs
Implementation of the WTF-8 encoding
-
ascii-canvas
simple canvas for drawing lines and styled text and emitting to the terminal
-
wkhtmltopdf
High-level bindings to wkhtmltopdf
-
snakecase
Snakecase is a general purpose snakecase implementation supporting both ascii and unicode
-
fifthtry-mdbook
fork of mdbook, only for ft-cli
-
nix-base32
Provides a nix (as in NixOS) compatible base32 encoding
-
stringutils
A collection of various and (hopefully) useful String utility functions
-
quick-doc-viewer
A quick documentation viewer for developers to preview documentations
-
ansi-cut
A library for cutting a string while preserving colors
-
fingers
a finger client library
-
shoebill
A Wadler/Leijen style pretty-printer
-
yagenerator
Application that uses the tinytemplate engine to generate text files. If you have a set of structured data and need to generate a bunch of arbitrary types of files from it, this tool can help you save some time.
-
ogma
Ogma DSL builder
-
yozuk-core-skillset
Set of default Yozuk skills
-
ascii_utils
Utilities to handle ASCII characters
-
cyrconv
A funny faux cyrillic character mapper
-
scie
Scie is a research project on how to build a simple code identification engine for different languages
-
const-str-proc-macro
compile-time string operations
-
lazy-transform-str
Lazy-copying lazy-allocated scanning str transformations. This is good e.g. for (un)escaping text, especially if individual strings are short.
-
tabwriter
Elastic tabstops
-
color-convert
Converts between RGB, RGBA, HEX, HSL, HSLA, HSV, and CMYK; written in Rust
-
leftpad-rs
Rust implementation of the Go Leftpad package
-
naming_clt
Extract and convert the naming format (case|notation) of identifiers from files or stdin. Use this tool to prepare identifier name strings for further operations (matching, replacing…) on related files
-
truncate_string_at_whitespace
Truncate a &str at the closest whitespace to a specified length with unicode safety
-
maybe_utf8
Byte container optionally encoded as UTF-8
-
moscato
Outline scaler for OpenType glyphs
-
parattice
Recursive paraphrase lattice generator
-
japhonex
Japanese phone number checker for Rust
-
trashy-xml
Xml parser that does not stop parsing when encountering errors
-
chinese-ner
A CRF based Chinese Named-entity Recognition Library written in Rust
-
xhtmlchardet
Character set detection for XML and HTML
-
cfasttext-sys
fastText ffi binding
-
august
A crate & program for converting HTML to plain text
-
lingua-latvian-language-model
The Latvian language model for Lingua, an accurate natural language detection library
-
typed-dialogflow
An easy-to-use typed Google Dialogflow client
-
mdzk
Plain text Zettelkasten based on mdBook
-
case-conv
Faster case conversion crate
-
rosie
Interface for the Rosie Pattern Language, for efficient and maintainable text pattern matching and search
-
mediawiki_parser
A strict parser for MediaWiki markdown
-
imperative
Check for imperative mood in text
-
aki-mcycle
mark up text with cycling color
-
str_overlap
Methods for finding the overlap between two string slices
-
moenarchbook
Creates a book from markdown files
-
lindera-dictionary
A Japanese morphological dictionary
-
unicode-canonical-combining-class
Fast lookup of the Canonical Combining Class property
-
josa
Korean language josa selector
-
gistit-daemon
Gistit daemon used for p2p file sharing
-
shingles
Shingles implementation in rust
-
basic-text-internals
Basic Text string literal implementation details
-
text-utils
Text utilities for unescaping and alignment
-
apriori_pattern_miner
Implementation of Apriori Pattern Mining algorithm
-
file-size
a function formatting file sizes in 4 chars
-
pdf_forms
A library for programmatically filling out PDF forms
-
sesdiff
Generates a shortest edit script (Myers’ diff algorithm) to indicate how to get from the strings in column A to the strings in column B. Also provides the edit distance (levenshtein).
-
cang-jie
A Chinese tokenizer for tantivy
-
gen-epub-book
Generate an ePub book from a simple plaintext descriptor
-
tex
The νTeX typesetting engine
-
unicode-case-mapping
Fast lowercase, uppercase, and titlecase mapping for characters
-
precis-tools
Tools and parsers to generate PRECIS tables from the Unicode Character Database (UCD)
-
cw
Count Words, a fast wc clone
-
emojito
Find all the Emoji in a string. Supports composed emoji.
-
nib
Yet another static site generator
-
sublime_fuzzy
Fuzzy matching algorithm based on Sublime Text’s string search
-
vtext
NLP with Rust
-
unicode_clusters
This crate provides variable width unicode characters as single items, allowing for array like indexing etc
-
contractions
Contractions is a rust library to expand contractions in English
-
transliterate1234
UTF-8 to ASCII transliteration
-
ogrep
Tool for searching in indentation-structured texts
-
unicode-jp
A library to convert Japanese Half-width-kana[半角カナ] and Wide-alphanumeric[全角英数] into normal ones
-
unicode-character-database
Unicode character database tables (Unicode Standard Annex #44) generated using ucd-generate
-
strcursor
Provides a string cursor type for seeking through a string whilst respecting grapheme cluster and code point boundaries
-
kaolinite
A crate to assist in the creation of TUI text editors
-
guarding_parser
Guarding is a guardian for code, architecture, and layering. Guarding creates an architecture guard DSL based on ArchUnit.
-
gskkserv
skkserv using Google IME
-
progress
Library for showing text based progress bar and job status
-
names-changer
Convert names in SQL schemas from camel case to snake case
-
pcre2
High level wrapper library for PCRE2
-
poetry-book
Create a poetry book in latex, starting from plain text
-
translation-api-cn
Some useful structs for calling Chinese translation API cloud services. A helper tool for the bilingual cmdline tool.
-
encoding_c
C API for encoding_rs
-
songww-harfbuzz-rs
Rust bindings to the HarfBuzz text shaping engine
-
stopwords
Stopwords from popular text processing frameworks
-
ucd-parse
A library for parsing data files in the Unicode character database
-
mdbook-bib
mdbook plugin that loads and presents a bibliography in BibLaTeX format in your books and lets you cite its references
-
charwise
This lightweight, dependency-free rust library provides a convenient way to read characters from different resources
-
dcsv
Dynamic CSV reader, writer, and editor
-
unic-emoji-char
UNIC — Unicode Emoji — Emoji Character Properties
-
mdbook-translation
A utility to prepare multi-lingual mdBook books
-
pretok
A string pre-tokenizer for C-like syntaxes
-
boxy
Declarative builder for Unicode box-drawing characters
-
html2runes
An HTML to Text converter
-
censor
A simple text profanity filter
-
sana
Create lexers easily
-
harfbuzz
Rust bindings to the HarfBuzz text shaping engine
-
niqqud
A lightweight library for removing hebrew diacritics
-
zhconv-cli
Convert Traditional/Simplified Chinese and regional words of Taiwan/Hong Kong/mainland China/Singapore based on Wikipedia conversion tables 轉換中文簡體、繁體及兩岸、新馬地區詞,基於中文維基轉換…
-
emoj_rs
rust implementation of emoj
-
forgiving-htmlescape
A library for HTML entity encoding and decoding, with support for leaving malformed entities intact
-
ascii-hangman-webapp
customizable Hangman game with ASCII-art rewarding for children (webapp version)
-
charname
Incredibly simple library that just gives you the Unicode name for a character
-
text-diff
A Rust text diffing and assertion library
-
gistit-ipc
Inter process communication for gistit-cli and gistit-daemon
-
yozuk-helper-preprocessor
Preprocessor utilities for Yozuk
-
mdtranslation-cli
Command-line tools for using mdTranslation, which can be used to prepare multi-lingual Markdown documents
-
zhconv
Convert Traditional/Simplified Chinese and regional words of Taiwan/Hong Kong/mainland China/Singapore based on Wikipedia conversion tables 轉換中文簡體、繁體及兩岸、新馬地區詞,基於中文維基之字…
-
bgrep
bgrep is a grep tailored to handle binary patterns and files
-
aki-xtee
copy standard input to each file and standard output
-
pomsky-macro
Macro for converting pomsky expressions to regexes
-
vaporetto_tantivy
Vaporetto Tokenizer for Tantivy
-
yeslogic-fontconfig-sys
Raw bindings to Fontconfig without a vendored C library
-
mdbook-footnote
mdbook preprocessor for footnotes
-
text-tables
A terminal/text table prettifier with no dependencies
-
nlpo3-cli
Command line interface for nlpO3, a Thai natural language processing library
-
bitap
Bitap implementation in rust
-
mdbook-presentation-preprocessor
A preprocessor for utilizing an MDBook as slides for a presentation
-
bitfont
Takes an ASCII string and generates a vector containing a bitmap font, for easy overlay into images
-
lindera-decompress
A morphological analysis library
-
layered-amount
Amount plugin for layered-nlp
-
corpus-preproc
A preprocessor for text and HTML corpora
-
yeslogic-unicode-script
Fast lookup of the Unicode Script property
-
unflow
Unflow is a DSL to convert design to code
-
lindera-core
A morphological analysis library
-
esc
Escape characters in strings
-
basic-text-literals
Basic Text string literal macro for basic-text
-
gimme
Pull useful data out of your clipboard
-
lingua-finnish-language-model
The Finnish language model for Lingua, an accurate natural language detection library
-
mdbook-variables
mdBook preprocessor that resolves variables configured in book.toml
-
chinese_detection
Classify a string as either English, Chinese, or Pinyin
-
sudachiclone
sudachiclone-rs is a Rust version of Sudachi, a Japanese morphological analyzer
-
ccase
Command line interface to convert strings into any case
-
indentation_flattener
From indented input, generate plain output with indentation PUSH and POP codes
-
unic-ucd-case
UNIC — Unicode Character Database — Case Properties
-
yozuk-helper-english
English NLP utilities for Yozuk
-
mdbook-latex
An mdbook backend for generating LaTeX and PDF documents
-
emojicon
Find emoji by using emoticons and GitHub’s and Bengali emoji names
-
deepfrog
A deep learning NLP suite (PoS,lemmatiser,NER) with FoLiA XML support
-
guarding_core
Guarding is a guardian for code, architecture, and layering. Guarding creates an architecture guard DSL based on ArchUnit.
-
gistit-project
Gistit project definitions
-
textwrap-macros-impl
Simple procedural macros to use textwrap utilities at compile time
-
utf8_iter
Iterator by char over potentially-invalid UTF-8 in &[u8]
-
ced
Dead easy csv editor
-
aki-mline
match lines: a regex text filter like the Linux grep command
-
te
A really simple, stripped down & readable regular expression alternative for matching text
-
emote_mapper
Maps emote names to their respective emoji using a csv map
-
svgbob_server
Transform your ascii diagrams into happy little SVG
-
uniaxe
A Rust crate to replace Unicode letters with Ascii equivalents
-
wkhtmltox-sys
FFI bindings to wkhtmltox
-
lindera-unidic
A Japanese morphological dictionary for UniDic
-
khatson
Attacut ported Thai word segmentation/breaking command line
-
mdtranslation
A utility to prepare multi-lingual Markdown documents
-
sana_core
The core of Sana
-
recode_rs
Command-line tool for converting between the character encodings defined in the Encoding Standard
-
yozuk-model
NLP model generator for Yozuk
-
aki-txpr-macro
makes libaki-* easier to use
-
lindera-compress
A morphological analysis library
-
layered-clauses
Clauses plugin for layered-nlp
-
aki-gsub
substitute text command, replace via regex
-
unic-idna-mapping
UNIC — IDNA — IDNA Mapping Table
-
lingua-telugu-language-model
The Telugu language model for Lingua, an accurate natural language detection library
-
unic-segment
UNIC — Unicode Text Segmentation Algorithms
-
lindera-cc-cedict
A Japanese morphological dictionary for CC-CEDICT
-
yozuk-helper-platform
Platform-dependent utilities for Yozuk
-
yeslogic-ucd-parse
A library for parsing data files in the Unicode character database
-
aki-stats
output the statistics of text, like the Linux wc command
-
lindera-ko-dic
A Japanese morphological dictionary for ko-dic
-
cleanse
Small utility to clean up delimited (TSV/CSV) data
-
lingua-macedonian-language-model
The Macedonian language model for Lingua, an accurate natural language detection library
-
yozuk-helper-filetype
Filetype detection for Yozuk
-
suffix_tree
Suffix trees
-
encoding_c_mem
C API for encoding_rs::mem
-
aki-unbody
output the first or last n lines, like the Linux head and tail commands
-
assert-text
the testing macro tools
-
glyph-names
Mapping of characters to glyph names according to the Adobe Glyph List Specification
-
aki-json-pick
The json pick out command
-
unic-ucd-version
UNIC — Unicode Character Database — Version
-
unic-char-property
UNIC — Unicode Character Tools — Character Property taxonomy, contracts and build macros
-
unic-char-range
UNIC — Unicode Character Tools — Character Range and Iteration
-
lingua-turkish-language-model
The Turkish language model for Lingua, an accurate natural language detection library
-
utf16_iter
Iterator by char over potentially-invalid UTF-16 in &[u16]
-
unic-common
UNIC — Common Utilities
-
yozuk-bundle
Prebuilt NLP model for Yozuk
-
xmldecl
Extracts an encoding from an ASCII-based bogo-XML declaration in text/html in a Web-compatible way
-
yeslogic-fontconfig
RENAMED: use the fontconfig crate instead
-
lingua-korean-language-model
The Korean language model for Lingua, an accurate natural language detection library
-
lingua-xhosa-language-model
The Xhosa language model for Lingua, an accurate natural language detection library
-
unic
UNIC: Unicode and Internationalization Crates
-
lingua-english-language-model
The English language model for Lingua, an accurate natural language detection library
-
unic-char
UNIC — Unicode Character Tools
-
lingua-french-language-model
The French language model for Lingua, an accurate natural language detection library
-
unic-cli
UNIC Command-Line Tools
-
lingua-swahili-language-model
The Swahili language model for Lingua, an accurate natural language detection library
-
unic-emoji
UNIC — Unicode Emoji
-
unic-ucd-core
UNIC - Unicode Character Database - Version
-
lingua-kazakh-language-model
The Kazakh language model for Lingua, an accurate natural language detection library
-
lingua-shona-language-model
The Shona language model for Lingua, an accurate natural language detection library
-
lingua-japanese-language-model
The Japanese language model for Lingua, an accurate natural language detection library
-
lingua-welsh-language-model
The Welsh language model for Lingua, an accurate natural language detection library
-
lingua-lithuanian-language-model
The Lithuanian language model for Lingua, an accurate natural language detection library
-
lingua-spanish-language-model
The Spanish language model for Lingua, an accurate natural language detection library
-
lingua-portuguese-language-model
The Portuguese language model for Lingua, an accurate natural language detection library
-
lingua-italian-language-model
The Italian language model for Lingua, an accurate natural language detection library
-
lingua-polish-language-model
The Polish language model for Lingua, an accurate natural language detection library
-
lingua-dutch-language-model
The Dutch language model for Lingua, an accurate natural language detection library
-
lingua-german-language-model
The German language model for Lingua, an accurate natural language detection library
-
lingua-czech-language-model
The Czech language model for Lingua, an accurate natural language detection library
-
lingua-ukrainian-language-model
The Ukrainian language model for Lingua, an accurate natural language detection library
-
lingua-esperanto-language-model
The Esperanto language model for Lingua, an accurate natural language detection library
-
lingua-hungarian-language-model
The Hungarian language model for Lingua, an accurate natural language detection library