1 unstable release: 0.2.0 (Nov 11, 2023)
Description
A simple parser of English sentences, created for the KMA Rust course. The parser can identify single words, numbers, punctuation symbols, whitespace, sentences, and whole texts.
Usage
make run ARGS="-f test_files/test1.txt"
Output:
["Hello", ",", " ", "world", "!"]
Or to get help information:
make
Technical
The parser uses the peg library. Rules:
word() rule is used to parse words that contain only alphabetical symbols
number() rule is used to parse numbers
end_punctuation() rule is used to parse punctuation marks that can end a sentence: ... | . | ! | ?
other_punctuation() rule is used to parse punctuation marks that can appear inside a sentence: , | ; | : | -
whitespace() rule is used to parse spaces or other indentation symbols like '\t' | '\n' | '\r'
sentence() rule is used to parse a whole sentence. It uses the previous rules to parse the input string. A sentence must end in an end_punctuation mark
text() rule can parse multiple sentences
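To make the rules concrete, here is a minimal hand-written sketch that approximates them using only the standard library. It is not the crate's peg grammar, which this README does not show. The names `Kind`, `run_len`, `next_token`, `sentence` and `text` are hypothetical. The sketch also assumes that `...` is tried before `.`, that a run of whitespace forms one token, and that a number is a run of ASCII digits.

```rust
/// Token kinds mirroring the rules listed above (hypothetical names).
#[derive(Debug, PartialEq)]
enum Kind {
    Word,
    Number,
    EndPunctuation,
    OtherPunctuation,
    Whitespace,
}

fn is_whitespace(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\n' | '\r')
}

/// Byte length of the leading run of characters that satisfy `pred`.
fn run_len(input: &str, pred: impl Fn(char) -> bool) -> usize {
    input.find(|c: char| !pred(c)).unwrap_or(input.len())
}

/// Splits one token off the front of `input`: (kind, token, rest).
/// Returns None on empty input or an unrecognized character.
fn next_token(input: &str) -> Option<(Kind, &str, &str)> {
    let first = input.chars().next()?;
    // Assumption: the three-dot ellipsis is tried before a single '.'.
    if input.starts_with("...") {
        return Some((Kind::EndPunctuation, &input[..3], &input[3..]));
    }
    let (kind, len) = if first.is_alphabetic() {
        (Kind::Word, run_len(input, char::is_alphabetic))
    } else if first.is_ascii_digit() {
        (Kind::Number, run_len(input, |c| c.is_ascii_digit()))
    } else if matches!(first, '.' | '!' | '?') {
        (Kind::EndPunctuation, 1)
    } else if matches!(first, ',' | ';' | ':' | '-') {
        (Kind::OtherPunctuation, 1)
    } else if is_whitespace(first) {
        (Kind::Whitespace, run_len(input, is_whitespace))
    } else {
        return None;
    };
    Some((kind, &input[..len], &input[len..]))
}

/// One sentence: any tokens up to and including an end-punctuation mark.
/// Returns the tokens and the unparsed remainder.
fn sentence(input: &str) -> Option<(Vec<&str>, &str)> {
    let mut tokens = Vec::new();
    let mut rest = input;
    while let Some((kind, token, tail)) = next_token(rest) {
        tokens.push(token);
        rest = tail;
        if kind == Kind::EndPunctuation {
            return Some((tokens, rest));
        }
    }
    None // input ended (or failed to tokenize) before any end punctuation
}

/// Whole text: sentences back to back until the input is exhausted.
fn text(input: &str) -> Option<Vec<Vec<&str>>> {
    let mut sentences = Vec::new();
    let mut rest = input;
    while !rest.is_empty() {
        let (tokens, tail) = sentence(rest)?;
        sentences.push(tokens);
        rest = tail;
    }
    Some(sentences)
}

fn main() {
    let (tokens, rest) = sentence("Hello, world!").expect("not a sentence");
    println!("{:?}", tokens); // ["Hello", ",", " ", "world", "!"]
    assert!(rest.is_empty());
}
```

In this sketch, whitespace between sentences becomes the first token of the following sentence, and `text` rejects trailing whitespace after the final sentence. The real grammar may handle both cases differently.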