bin+lib kma-rustlang-vadym-polishchuk-english-parser

Simple parser of English sentences created for KMA Rust course

1 unstable release

0.2.0 Nov 11, 2023

MIT license

6KB
62 lines

Description

A simple parser of English sentences, created for the KMA Rust course. The parser can identify single words, numbers, punctuation symbols, whitespace, sentences, and whole texts.

Usage

make run ARGS="-f test_files/test1.txt"

Output:

["Hello", ",", " ", "world", "!"]

Or to get help information:

make

Technical

The parser uses the peg library. Rules:

  • word() rule is used to parse words that contain only alphabetical symbols
  • number() rule is used to parse numbers
  • end_punctuation() rule is used to parse punctuation marks that can end a sentence: ... | . | ! | ?
  • other_punctuation() rule is used to parse punctuation marks that can be inside a sentence: , | ; | : | -
  • whitespace() rule is used to parse spaces or other indentation symbols like '\t' | '\n' | '\r'
  • sentence() rule is used to parse the whole sentence. It combines the rules above to parse the input string. A sentence must end with an end_punctuation mark
  • text() rule can parse multiple sentences
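
The rules above can be illustrated with a small hand-written tokenizer. This is only a sketch that mirrors the rule descriptions using the standard library; it does not use peg and is not the crate's actual grammar. The names Kind, next_token, parse_sentence and parse_text are hypothetical, and two behaviours are assumptions: a run of whitespace is grouped into one token, and trailing whitespace at the end of the text is ignored.

```rust
// Hypothetical token kinds, one per lexical rule in the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Word,
    Number,
    EndPunctuation,
    OtherPunctuation,
    Whitespace,
}

// Byte length of the longest prefix whose characters all satisfy `pred`.
fn prefix_len(input: &str, pred: impl Fn(char) -> bool) -> usize {
    input.find(|c: char| !pred(c)).unwrap_or(input.len())
}

// Reads one token from the start of `input`; None if no rule matches.
fn next_token(input: &str) -> Option<(&str, Kind)> {
    let first = input.chars().next()?;
    let (len, kind) = if first.is_alphabetic() {
        (prefix_len(input, char::is_alphabetic), Kind::Word)
    } else if first.is_ascii_digit() {
        (prefix_len(input, |c| c.is_ascii_digit()), Kind::Number)
    } else if input.starts_with("...") {
        // Try "..." before "." so an ellipsis stays a single token.
        (3, Kind::EndPunctuation)
    } else if matches!(first, '.' | '!' | '?') {
        (1, Kind::EndPunctuation)
    } else if matches!(first, ',' | ';' | ':' | '-') {
        (1, Kind::OtherPunctuation)
    } else if matches!(first, ' ' | '\t' | '\n' | '\r') {
        // Assumption: consecutive whitespace characters form one token.
        let len = prefix_len(input, |c| matches!(c, ' ' | '\t' | '\n' | '\r'));
        (len, Kind::Whitespace)
    } else {
        return None;
    };
    Some((&input[..len], kind))
}

// Parses one sentence: tokens up to and including the first end punctuation.
// Returns the tokens and the unparsed remainder, or None if the input
// contains an unsupported character or never reaches end punctuation.
pub fn parse_sentence(input: &str) -> Option<(Vec<&str>, &str)> {
    let mut rest = input;
    let mut tokens = Vec::new();
    while !rest.is_empty() {
        let (token, kind) = next_token(rest)?;
        tokens.push(token);
        rest = &rest[token.len()..];
        if kind == Kind::EndPunctuation {
            return Some((tokens, rest));
        }
    }
    None
}

// Parses a text as a sequence of sentences.
pub fn parse_text(input: &str) -> Option<Vec<Vec<&str>>> {
    // Assumption: trailing whitespace (e.g. a final newline) is ignored.
    let mut rest = input.trim_end();
    let mut sentences = Vec::new();
    while !rest.is_empty() {
        let (tokens, tail) = parse_sentence(rest)?;
        sentences.push(tokens);
        rest = tail;
    }
    Some(sentences)
}
```

In this sketch, parse_sentence("Hello, world!") yields the same token list as the output shown in the Usage section, while an input without a closing end punctuation mark is rejected.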
