11 releases

| Version | Date |
|---|---|
| 0.0.10 | Oct 17, 2024 |
| 0.0.9 | Oct 16, 2024 |
#1005 in Text processing
64 downloads per month
Used in misanthropic
100KB, 2K SLoC
`langsan` is a sanitization library for language models.
Out of a desire to be first to market, many companies, from OpenAI to Anthropic, are releasing language models without proper input or output sanitization. This can lead to a variety of safety and security issues, including but not limited to human-invisible adversarial attacks, data leakage, and the generation of harmful content.
`langsan` provides immutable string wrappers that guarantee their contents fall within restricted Unicode ranges, generally only the ranges officially supported by a particular language model. Almost all Unicode blocks are available as Cargo features (crates.io limits a crate to 300 features, so not every block can be included).
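The wrapper idea can be illustrated with a minimal, self-contained sketch. This is not `langsan`'s actual API: `RestrictedString`, `DisallowedChar`, `new`, `new_lossy` and the `ALLOWED` table are hypothetical names, and the allowlist (printable ASCII plus tab and newline) is an assumption chosen for illustration, whereas `langsan` selects its ranges through Cargo features. The point is that validation happens once at construction and the type exposes no mutable access, so the range invariant holds for the value's whole lifetime.

```rust
use std::fmt;
use std::ops::Deref;

/// Hypothetical allowlist of inclusive (low, high) code point ranges.
/// Assumption for illustration only; the real crate picks ranges via features.
const ALLOWED: &[(char, char)] = &[
    (' ', '~'),   // printable ASCII, U+0020..=U+007E
    ('\t', '\n'), // tab (U+0009) and line feed (U+000A)
];

fn is_allowed(c: char) -> bool {
    ALLOWED.iter().any(|&(lo, hi)| lo <= c && c <= hi)
}

/// Immutable wrapper: it can only be built through the checked constructors
/// below and offers no `&mut` access, so its contents cannot leave the allowlist.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RestrictedString(Box<str>);

/// Error describing the first disallowed character found.
#[derive(Debug, PartialEq, Eq)]
pub struct DisallowedChar {
    pub index: usize, // byte offset into the input
    pub ch: char,
}

impl RestrictedString {
    /// Strict constructor: rejects input containing any disallowed character.
    pub fn new(s: &str) -> Result<Self, DisallowedChar> {
        match s.char_indices().find(|&(_, c)| !is_allowed(c)) {
            Some((index, ch)) => Err(DisallowedChar { index, ch }),
            None => Ok(Self(s.into())),
        }
    }

    /// Lossy constructor: silently drops every disallowed character.
    pub fn new_lossy(s: &str) -> Self {
        Self(s.chars().filter(|&c| is_allowed(c)).collect::<String>().into_boxed_str())
    }
}

/// Read-only access to the sanitized text as `&str`.
impl Deref for RestrictedString {
    type Target = str;
    fn deref(&self) -> &str {
        &self.0
    }
}

impl fmt::Display for RestrictedString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

As a usage example, the Unicode tag characters (U+E0000 to U+E007F) render as nothing in most interfaces, which makes them one vector for the human-invisible attacks mentioned above. Given `"Hello\u{E0041}\u{E0042}"`, `RestrictedString::new` fails at byte offset 5 on U+E0041, while `RestrictedString::new_lossy` returns just `Hello`. Whether to reject or strip is a policy choice; rejecting surfaces the attack, stripping keeps the pipeline running.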
Dependencies: ~0.3–1MB, ~22K SLoC