1 stable release: 1.0.4 (Jul 31, 2023)
Underthesea Core
Underthesea Core is an extension of the Underthesea natural language processing library. It provides data preprocessing tools and machine learning models for training. The core is written in Rust for performance and ships with Python bindings, so it can be used from existing Python projects.
Usage
CRFFeaturizer
>>> from underthesea_core import CRFFeaturizer
>>> features = ["T[-1]", "T[0]", "T[1]"]
>>> dictionary = set(["sinh viên"])
>>> featurizer = CRFFeaturizer(features, dictionary)
>>> sentences = [[["sinh", "X"], ["viên", "X"], ["đi", "X"], ["học", "X"]]]
>>> featurizer.process(sentences)
[[['T[-1]=BOS', 'T[0]=sinh', 'T[1]=viên'],
['T[-1]=sinh', 'T[0]=viên', 'T[1]=đi'],
['T[-1]=viên', 'T[0]=đi', 'T[1]=học'],
['T[-1]=đi', 'T[0]=học', 'T[1]=EOS']]]
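To make the template notation concrete, here is a minimal pure-Python sketch of how templates such as T[-1], T[0] and T[1] could expand into the rows shown above. It is an illustration only, not the Rust implementation behind CRFFeaturizer: the helper name expand_templates is hypothetical, it handles only single-offset templates of the form T[k], and it ignores both the dictionary argument and the tag column.

```python
import re

# Matches single-offset templates such as "T[-1]" or "T[0]" (assumed form).
TEMPLATE = re.compile(r"T\[(-?\d+)\]")


def expand_templates(sentence, templates):
    """Expand token-offset templates for one sentence.

    `sentence` is a list of [token, tag] pairs; only the token is used.
    Positions before the first token map to "BOS", positions after the
    last token map to "EOS", mirroring the output shown above.
    """
    tokens = [row[0] for row in sentence]
    rows = []
    for i in range(len(tokens)):
        row = []
        for template in templates:
            match = TEMPLATE.fullmatch(template)
            if match is None:
                raise ValueError(f"unsupported template: {template}")
            j = i + int(match.group(1))
            if j < 0:
                value = "BOS"
            elif j >= len(tokens):
                value = "EOS"
            else:
                value = tokens[j]
            row.append(f"{template}={value}")
        rows.append(row)
    return rows


sentence = [["sinh", "X"], ["viên", "X"], ["đi", "X"], ["học", "X"]]
for row in expand_templates(sentence, ["T[-1]", "T[0]", "T[1]"]):
    print(row)
```

Each inner list holds the features for one token, in the same order as the templates passed in.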
Release Workflow
- Change the version in Cargo.toml and pyproject.toml
- Push to the core branch with the commit message "Publish Underthesea Core"
- This will trigger the release-pypi-core action
- Check the latest version on PyPI
Note: Run a self-hosted runner to build for macos-arm