6 releases (1 stable)
Version | Release date |
---|---|
1.0.0 | Oct 27, 2024 |
0.1.4 | Oct 22, 2024 |
0.1.2 | May 24, 2022 |
Installation
[dependencies]
parsable = "0.1"
Example
A basic arithmetic interpreter that only handles non-negative integers and ignores operator precedence (operations are applied from left to right).
use parsable::{parsable, Parsable, ParseOptions};
#[parsable]
enum Operator {
Plus = "+",
Minus = "-",
Mult = "*",
Div = "/",
Mod = "%"
}
#[parsable]
struct NumberLiteral {
#[parsable(regex=r"\d+")]
value: String
}
impl NumberLiteral {
fn process(&self) -> i32 {
self.value.parse().unwrap()
}
}
#[parsable]
enum Operand {
Number(NumberLiteral),
Wrapped(WrappedOperation)
}
impl Operand {
fn process(&self) -> i32 {
match self {
Operand::Number(number) => number.process(),
Operand::Wrapped(wrapped) => wrapped.process(),
}
}
}
#[parsable]
struct Operation {
first_operand: Operand,
other_operands: Vec<(Operator, Operand)>
}
impl Operation {
fn process(&self) -> i32 {
let mut result = self.first_operand.process();
for (operator, operand) in &self.other_operands {
let value = operand.process();
result = match operator {
Operator::Plus => result + value,
Operator::Minus => result - value,
Operator::Mult => result * value,
Operator::Div => result / value,
Operator::Mod => result % value,
}
}
result
}
}
#[parsable]
struct WrappedOperation {
#[parsable(brackets="()")]
operation: Box<Operation>
}
impl WrappedOperation {
fn process(&self) -> i32 {
self.operation.process()
}
}
fn main() {
let operation_string = "3 + (4 * 5)".to_string();
let parse_options = ParseOptions::default();
match Operation::parse(operation_string, parse_options) {
Ok(operation) => {
println!("result: {}", operation.process());
},
Err(error) => {
dbg!(error);
}
}
}
The #[parsable] macro
Tagging a struct or enum with the `#[parsable]` macro implements the `Parsable` trait for the item, provided that all of its fields also implement the `Parsable` trait.
It can also be applied to a field to tweak the way that field is parsed.
Struct
- All fields are parsed one after the other. The parsing is only successful if all fields are successfully parsed.
Enum
- The parsing stops on the first variant that is successfully parsed.
- If a variant contains multiple fields, they are parsed successively and must all be successful for the variant to be matched.
- If a variant contains no field, a string must be specified to indicate how to parse it.
#[parsable]
enum MyOperation {
BinaryOperation(NumberLiteral, Operator, NumberLiteral),
Number(NumberLiteral),
Zero = "zero"
}
// If the first two variants are swapped, the parsing will never reach the `BinaryOperation` variant.
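The ordering caveat can be illustrated without the crate. The following is a minimal, std-only sketch of first-match-wins variant selection. The `Expr` type and the `take_number`, `parse_number`, `parse_binary`, `parse_binary_first` and `parse_number_first` helpers are hypothetical names introduced for illustration; they are not part of parsable, and blank skipping is not modelled.

```rust
/// Hypothetical stand-in for the `MyOperation` enum above.
#[derive(Debug, PartialEq)]
enum Expr {
    Binary(u32, char, u32),
    Number(u32),
}

/// Reads a run of ASCII digits from the start of `input`.
/// Returns the parsed value and the remaining input, or `None` if there is no digit.
fn take_number(input: &str) -> Option<(u32, &str)> {
    let len = input.bytes().take_while(|b| b.is_ascii_digit()).count();
    if len == 0 {
        return None;
    }
    let value: u32 = input[..len].parse().ok()?;
    Some((value, &input[len..]))
}

fn parse_number(input: &str) -> Option<(Expr, &str)> {
    let (value, rest) = take_number(input)?;
    Some((Expr::Number(value), rest))
}

fn parse_binary(input: &str) -> Option<(Expr, &str)> {
    let (left, rest) = take_number(input)?;
    let operator = rest.chars().next().filter(|c| "+-*/%".contains(*c))?;
    let (right, rest) = take_number(&rest[operator.len_utf8()..])?;
    Some((Expr::Binary(left, operator, right), rest))
}

/// Tries `Binary` first, then `Number`: the first variant that matches wins.
fn parse_binary_first(input: &str) -> Option<(Expr, &str)> {
    parse_binary(input).or_else(|| parse_number(input))
}

/// Same variants in the opposite order: `Number` matches whenever `Binary`
/// would, so `Binary` is never reached.
fn parse_number_first(input: &str) -> Option<(Expr, &str)> {
    parse_number(input).or_else(|| parse_binary(input))
}
```

With the input `"1+2"`, `parse_binary_first` consumes the whole string, whereas `parse_number_first` stops after `1` and leaves `"+2"` unparsed.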
Builtin types
String
A string field must be tagged with the `#[parsable(regex="<pattern>")]` or `#[parsable(value="<string>")]` macro option to specify how to parse it.
// Matches at least one digit
#[parsable]
struct NumberLiteral {
#[parsable(regex=r"\d+")]
value: String
}
#[parsable]
// Only matches the string "+"
struct PlusSign {
#[parsable(value="+")]
value: String
}
Option<T>
Matches `T`. If it fails, returns `None` but the parsing of the field is still considered successful.
#[parsable]
enum Sign {
Plus = "+",
Minus = "-"
}
// Matches a number with an optional sign.
#[parsable]
struct NumberLiteral {
sign: Option<Sign>,
#[parsable(regex=r"\d+")]
value: String
}
Vec<T>
Matches as many `T` as possible successively. The following options can be specified:
- `min=X`: the parsing is only valid if at least X items are parsed
- `separator=<string>`: after each item, the parser will attempt to consume the separator. The parsing fails if no separator is found.
// Matches a non-empty list of numbers separated by a comma
#[parsable]
struct NumberList {
#[parsable(separator=",", min=1)]
numbers: Vec<NumberLiteral>
}
Other types
- `()`: matches nothing, is always successful.
- `(T, U)`: matches `T`, then `U`.
- `Box<T>`: matches `T`.
Running the parser
The `Parsable` trait provides the `parse()` method that takes two arguments:
- `content: String`: the string to parse
- `options: ParseOptions`: parse options
The `ParseOptions` type has the following fields:
- `comment_start: Option<&'static str>`: when the specified pattern is matched, the rest of the line is ignored. Common instances are `"//"` or `"#"`.
- `file_path: Option<String>`: file path of the string being parsed.
- `package_root_path: Option<String>`: root path of the package or module containing the file being parsed.
The `file_path` and `package_root_path` fields are forwarded to the `FileInfo` struct and are never actually used by the library.
Blank characters (spaces, new lines and tabulations) are always ignored during parsing.
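To make the skipping rule concrete, here is a minimal, std-only sketch of how blanks and line comments could be skipped before each item. The function `skip_blanks_and_comments` is a hypothetical helper written for this illustration, not the crate's implementation; its `comment_start` parameter plays the role of the `ParseOptions` field of the same name, and it assumes `index` is a valid position in `content`.

```rust
/// Returns the first index at or after `index` that is neither a blank
/// character nor part of a line comment.
fn skip_blanks_and_comments(content: &str, mut index: usize, comment_start: Option<&str>) -> usize {
    loop {
        let rest = &content[index..];
        // Skip spaces, tabulations and new lines.
        let trimmed =
            rest.trim_start_matches(|c: char| c == ' ' || c == '\t' || c == '\n' || c == '\r');
        index += rest.len() - trimmed.len();

        match comment_start {
            // A comment starts here: skip to the end of the line, then loop
            // again to skip the blanks that follow it.
            Some(marker) if !marker.is_empty() && trimmed.starts_with(marker) => {
                index += trimmed.find('\n').unwrap_or(trimmed.len());
            }
            // Anything else is real content (or the end of the input).
            _ => return index,
        }
    }
}
```

For example, with `Some("//")` as the comment marker, the input `"// note\n  42"` is skipped up to the `4`, while with `None` the same input is not skipped at all.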
FileInfo
The `FileInfo` structure is used across the library. It has the following fields:
- `content: String`: the string being parsed
- `path: String`: the path of the file being parsed, as specified in `ParseOptions`
- `package_root_path: String`: the path of the package containing the file, as specified in `ParseOptions`
It also provides the following methods:
- `get_line_col(index: usize) -> Option<(usize, usize)>`: returns the line and column numbers (starting at 1) associated with the specified character index. This method assumes one byte per character and therefore does not work properly when the file contains non-ASCII characters.
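The byte-based limitation of `get_line_col` can be pictured with a small std-only sketch. The `line_col` function below is a hypothetical reimplementation written for illustration; in particular, returning `None` for an index past the end of the content is an assumption, since the conditions under which the crate returns `None` are not stated here.

```rust
/// Maps a byte index to 1-based (line, column) numbers.
/// Returns `None` when `index` is past the end of `content` (assumed behaviour).
/// Columns are counted in bytes, so multi-byte characters inflate the column.
fn line_col(content: &str, index: usize) -> Option<(usize, usize)> {
    if index > content.len() {
        return None;
    }
    let mut line = 1;
    let mut col = 1;
    for byte in content.bytes().take(index) {
        if byte == b'\n' {
            // A new line resets the column counter.
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    Some((line, col))
}
```

In `"ab\ncd"`, index 3 maps to line 2, column 1. In `"é+1"`, the `+` sits at byte index 2 and is reported at column 3 even though it is the second character, which is the kind of offset the note above warns about.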
ItemLocation
Tagging a struct with `#[parsable]` adds a `location` field of type `ItemLocation` with the following fields and methods:
- `file: Rc<FileInfo>`: information on the file containing the item
- `start: usize`: starting index of the item in the file
- `end: usize`: ending index of the item in the file
- `get_start_line_col() -> (usize, usize)`: get the line and column numbers (starting at 1) of the location start
The `Parsable` trait also provides a `location()` method:
- on a structure, it returns its `location` field
- on an enum, it returns the result of calling `location()` on the variant that was matched
- calling `location()` on a variant with no field panics
A way to prevent the panic is to wrap enums with unit variants in a structure:
#[parsable]
enum Operator {
Plus = "+",
Minus = "-",
Mult = "*",
Div = "/",
Mod = "%"
}
#[parsable]
struct WrappedOperator {
operator: Operator
}
fn main() {
let string = "+".to_string();
let options = ParseOptions::default();
let result = WrappedOperator::parse(string, options).unwrap();
dbg!(result.location()); // It works!
}
ParseError
On failure, `Parsable::parse()` returns `Err(ParseError)`. This structure has the following fields:
- `file: Rc<FileInfo>`: the file where the error occurred.
- `index: usize`: the index at which the error occurred.
- `expected: Vec<String>`: a list of item names that were expected at this index.
Macro options
Root attributes
- `located=<bool>`: on a structure, indicates whether or not the `location` field should be generated. Default: `true`.
- `cascade=<bool>`: if `true` on a structure, indicates that if an `Option` field is not matched, then the parser should not attempt to match other `Option` fields. It does not invalidate the overall struct parsing. Default: `false`.
- `name=<string>`: indicates the name of the struct or enum, which is used when a parsing error occurs. Default: the name of the struct or enum.
#[parsable(located=false)] // The `location` field will not be added
struct Operation {
first_operand: Operand,
other_operands: Vec<(Operator, Operand)>
}
Field attributes
- `prefix=<string>`: attempt to parse the specified string before parsing the field. If the prefix parsing fails, then the field parsing fails.
- `suffix=<string>`: attempt to parse the specified string after parsing the field. If the suffix parsing fails, then the field parsing fails.
- `brackets=<string>`: shortcut to specify both a prefix and a suffix using the first two characters of the specified string.
- `exclude=<string>`: indicates that the parsing is only valid if the item does not match the specified regex.
- `followed_by=<string>`: indicates that the parsing is only valid if the item is followed by the specified regex.
- `not_followed_by=<string>`: indicates that the parsing is only valid if the item is not followed by the specified regex.
- `value=<string>`: on a `String` field, indicates that the field only matches the specified string.
- `regex=<string>`: on a `String` field, indicates that the field only matches the regex with the specified pattern (using the `regex` crate).
- `separator=<string>`: on a `Vec` field, specifies the separator between items.
- `min=<integer>`: on a `Vec` field, specifies the minimum number of items for the parsing to be valid.
- `cascade=false`: indicates that this field ignores the root `cascade` option.
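The `followed_by` and `not_followed_by` options describe a lookahead: the following text is inspected but not consumed. Below is a minimal, std-only sketch of that idea. It swaps the regex used by the crate for a plain character predicate, and `keyword_not_followed_by` is a hypothetical helper, not a parsable API.

```rust
/// Matches `keyword` at the start of `input`, but only if the character right
/// after it does not satisfy `forbidden`. On success, returns the remaining
/// input; the lookahead character itself is never consumed.
fn keyword_not_followed_by<'a>(
    input: &'a str,
    keyword: &str,
    forbidden: impl Fn(char) -> bool,
) -> Option<&'a str> {
    let rest = input.strip_prefix(keyword)?;
    match rest.chars().next() {
        // The keyword is glued to a forbidden character: reject the match.
        Some(next) if forbidden(next) => None,
        // End of input or an allowed character: accept and leave it in `rest`.
        _ => Some(rest),
    }
}
```

A typical use is keeping a keyword such as `if` from matching the beginning of an identifier such as `iffy`: with `|c| c.is_alphanumeric()` as the predicate, `"if x"` is accepted and `"iffy"` is rejected.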
Manually implementing the Parsable trait
Sometimes `#[parsable]` is not enough and you want to implement your own parsing mechanism. This is done by implementing the `parse_item`, `get_item_name` and `location` methods.
use parsable::{ItemLocation, Parsable, ParseOptions, StringReader};
struct MyInteger {
value: u32,
location: ItemLocation,
}
impl Parsable for MyInteger {
fn parse_item(reader: &mut StringReader) -> Option<Self> {
let start = reader.get_index();
match reader.read_regex(r"\d+") {
Some(string) => Some(MyInteger {
value: string.parse().unwrap(),
location: reader.get_item_location(start),
}),
None => None,
}
}
// Only used in errors
fn get_item_name() -> String {
"integer".to_string()
}
// Not required, but convenient
fn location(&self) -> &ItemLocation {
&self.location
}
}
fn main() {
let number_string = "56";
let number = MyInteger::parse(number_string.to_string(), ParseOptions::default()).unwrap();
println!("{}", number.value);
}
`StringReader` wraps the string being parsed with an index that increases as the parsing goes on. It has the following methods:
- `content() -> &str`: returns the whole string
- `get_index() -> usize`: returns the current index in the string
- `set_index(index: usize) -> usize`: sets the current index in the string
- `as_str() -> &str`: returns the part of the string that has not been parsed yet (same as `&self.content()[self.get_index()..]`)
- `as_char() -> char`: returns the current character (same as `&self.content().as_bytes()[self.get_index()]`)
- `is_finished() -> bool`: indicates whether the end of the string has been reached
- `advance(length: usize) -> Option<&str>`: advances the current index by `length` and returns the corresponding substring. If `length` is `0`, returns `None`
- `eat_spaces()`: advances the current index until a non-blank and non-comment character is reached
- `read_string(string: &str) -> Option<&str>`: if the rest of the string starts with `string`, advances the current index by `string`'s length and returns it, otherwise returns `None`
- `read_regex(pattern: &'static str) -> Option<&str>`: if the rest of the string starts with the specified regex pattern, advances the current index by the matched string's length and returns it, otherwise returns `None`
- `peek_regex(pattern: &'static str) -> bool`: indicates if the rest of the string starts with the specified regex pattern, without advancing the current index
If `parse_item` returns `None`, it must ensure that the index is the same when the function exits as it was when it started.
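The usual way to honour this rule is to save the index on entry and restore it on every failure path. The sketch below shows the pattern on a miniature reader. `MiniReader` and `parse_let_equals` are hypothetical and std-only; with the real `StringReader`, the same idea would presumably rely on `get_index()` and `set_index()`.

```rust
/// Hypothetical miniature reader: a string plus a cursor.
struct MiniReader {
    content: String,
    index: usize,
}

impl MiniReader {
    fn new(content: &str) -> Self {
        Self { content: content.to_string(), index: 0 }
    }

    /// Consumes `expected` if the remaining input starts with it.
    fn read_string(&mut self, expected: &str) -> bool {
        if self.content[self.index..].starts_with(expected) {
            self.index += expected.len();
            true
        } else {
            false
        }
    }
}

/// Parses the two-token sequence "let" "=" (a toy grammar).
/// On failure the cursor is restored, so the caller can try another item
/// from the same position.
fn parse_let_equals(reader: &mut MiniReader) -> Option<()> {
    let start = reader.index;
    if reader.read_string("let") && reader.read_string("=") {
        Some(())
    } else {
        // "let" may already have been consumed: undo the partial match.
        reader.index = start;
        None
    }
}
```

Without the restore step, a failed attempt on `"let+"` would leave the cursor after `let`, and any alternative tried afterwards would start from the wrong position.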
License
MIT