1 unstable release

0.0.0 Jul 22, 2021

MIT license

2KB

bistring

The bistring library provides non-destructive versions of common string processing operations like normalization, case folding, and find/replace. Each bistring remembers the original string, and how its substrings map to substrings of the modified version.

For example:

>>> from bistring import bistr
>>> s = bistr('𝕿𝖍𝖊 𝖖𝖚𝖎𝖈𝖐, 𝖇𝖗𝖔𝖜𝖓 🦊 𝖏𝖚𝖒𝖕𝖘 𝖔𝖛𝖊𝖗 𝖙𝖍𝖊 𝖑𝖆𝖟𝖞 🐶')
>>> s = s.normalize('NFKD')     # Unicode normalization
>>> s = s.casefold()            # Case-insensitivity
>>> s = s.replace('🦊', 'fox')  # Replace emoji with text
>>> s = s.replace('🐶', 'dog')
>>> s = s.sub(r'[^\w\s]+', '')  # Strip everything but letters and spaces
>>> s = s[:19]                  # Extract a substring
>>> s.modified                  # The modified substring, after changes
'the quick brown fox'
>>> s.original                  # The original substring, before changes
'𝕿𝖍𝖊 𝖖𝖚𝖎𝖈𝖐, 𝖇𝖗𝖔𝖜𝖓 🦊'

Languages

bistring is available in multiple languages, currently Python and JavaScript/TypeScript. Ports to other languages are planned for the near future.

The code is structured similarly in each language to make it easy to share algorithms, tests, and fixes between them. The main differences come from trying to mirror the language's built-in string API. If you want to contribute a bug fix or a new feature, feel free to implement it in any one of the supported languages, and we'll try to port it to the rest of them.

Demo

Click here for a live demo of the bistring library in your browser.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

No runtime deps