17 releases (5 breaking)
Version | Release date
---|---
0.8.0 | Aug 7, 2024
0.7.0 | Aug 4, 2024
0.6.1 | Jul 29, 2024
0.5.0 | Jul 24, 2024
0.3.1 | Nov 29, 2023
135KB, 3.5K SLoC
Munyo
Munyo is a data language that aims to be the most efficient way to handwrite data. When an error occurs, it also shows a clear error message along with the line number.
Let's see how to write data efficiently in Munyo.
The following data describes team compositions for competitive Pokémon battles (Full Sample):
|| <- This is the syntax for comments.
|| In the competitive Pokémon world, rankings are announced once a month.
>>>Season
2024 6 || The season of June 2024
	>>>Team
	1 || The #1 ranked team
		>>>Pokemon
		Koraidon Fire AssaultVest H204A+196B4C-0D12S92 FlameCharge FlareBlitz DrainPunch Uturn
		FlutterMane Fairy ChoiceSpecs H148A-(0)B100C188D4S+68 MoonBlast ShadowBall DrainingKiss PerishSong | ability Protosynthesis
			|| The following are some variations of the customization of this
			|| Pokémon (not necessary, just for illustration purposes)
			>Item
			BoostEnergy
			FocusSash
			>Terastal
			Normal
			Ground
			Water
		|| A team contains 6 Pokémon...
	2 ||...
|| Players ranked 200 or higher tend to publish their Pokémon compositions in their blogs voluntarily.
2024 5
	1
		||...
In Munyo, every line must have a typename:
typename arg1 arg2...
The line above is a bit inaccurate; to learn the grammar precisely, read lang_spec.txt.
You can omit the typename by setting a default typename:
|| Set the default typename 'Season'
>>>Season
2024 6
|| ↑ This line becomes 'Season 2024 6'
The following is the corresponding Rust data structure to capture the line.
#[derive(serde::Deserialize)]
enum Top {
    Season(usize, usize, Vec<Second>),
}
↑ The basic way to use Munyo is together with 'serde'. To deserialize Munyo with serde, the target data structure must be an enum, and the enum must implement 'serde::Deserialize'. Here it is implemented with the 'derive' attribute.
Season(usize, usize, Vec<Second>)
|| ↑typename ↑2024 ↑6 ↑ the container of the data for child lines
↑ The first argument (2024) is consumed by the first 'usize', and the second (6) by the second. If a line has children, the last item must be a Vec of the enum that captures the child lines.
Because the arguments don't have names, you need to convert them into a proper data structure with named fields.
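To make the positional mapping concrete, here is a minimal std-only sketch that does by hand roughly what serde does for 'Season(usize, usize, ...)': the first token picks the variant, and the remaining tokens are consumed in order, one per tuple field. The function name 'parse_season_line' and its error strings are hypothetical and not part of munyo's API; child lines are ignored here.

```rust
/// Hypothetical hand-written equivalent of deserializing
/// `Season(usize, usize, ...)` from a line such as "Season 2024 6".
pub fn parse_season_line(line: &str) -> Result<(usize, usize), String> {
    let mut tokens = line.split_whitespace();
    // The first token is the typename; it selects the enum variant.
    match tokens.next() {
        Some("Season") => {}
        Some(other) => return Err(format!("unknown variant `{other}`, expected `Season`")),
        None => return Err("empty line".to_string()),
    }
    // The remaining tokens are consumed in order, one per tuple field.
    let mut next_usize = |what: &str| -> Result<usize, String> {
        let tok = tokens.next().ok_or_else(|| format!("missing {what}"))?;
        tok.parse::<usize>().map_err(|e| format!("{what} `{tok}`: {e}"))
    };
    let year = next_usize("year")?;
    let month = next_usize("month")?;
    // A tuple variant has a fixed arity, so leftover arguments are an error.
    if let Some(extra) = tokens.next() {
        return Err(format!("unexpected argument `{extra}`"));
    }
    Ok((year, month))
}
```

The names 'year' and 'month' exist only in this code; in the Munyo text the values are identified purely by position, which is why a conversion step to a named struct follows.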
First, let's parse the source text of Munyo:
let r: Vec<Top> = munyo::from_str(.../* the sample text */)?;
↑ You can use 'munyo::from_str' to deserialize Munyo with serde. Since 'Top' implements 'serde::Deserialize', you can call 'munyo::from_str' with the type annotation 'Vec<Top>'.
let r: Vec<Season> = r.into_iter().map(top_to_season).collect();
// the full-fledged data structure
struct Season {
    year: usize,
    month: usize,
    teams: Vec<Team>,
}
fn top_to_season(top: Top) -> Season {
    match top {
        Top::Season(year, month, vec) => Season {
            year,
            month,
            teams: vec.into_iter().map(second_to_team).collect(),
        },
    }
}
You need a 'match' with a single arm to destructure it.
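A possible variation, sketched with plain std types so it compiles without serde: implementing the standard 'From' trait instead of a free function lets you write '.map(Season::from)' or '.into()'. The 'Second' and 'Team' types here are deliberately cut down (no Pokémon level) and are illustrative only.

```rust
/// Positional shapes as they would come out of deserialization
/// (the `serde::Deserialize` derive is omitted so this compiles with std only).
pub enum Top {
    Season(usize, usize, Vec<Second>),
}
pub enum Second {
    Team(usize), // simplified: the README's `Team` also carries `Vec<Third>`
}

/// Named shapes used by the rest of the program.
#[derive(Debug, PartialEq)]
pub struct Season {
    pub year: usize,
    pub month: usize,
    pub teams: Vec<Team>,
}
#[derive(Debug, PartialEq)]
pub struct Team {
    pub rank: usize,
}

impl From<Second> for Team {
    fn from(second: Second) -> Self {
        match second {
            Second::Team(rank) => Team { rank },
        }
    }
}

impl From<Top> for Season {
    fn from(top: Top) -> Self {
        // Still a single-arm `match`, just packaged as a trait impl.
        match top {
            Top::Season(year, month, children) => Season {
                year,
                month,
                teams: children.into_iter().map(Team::from).collect(),
            },
        }
    }
}
```

Because each enum has exactly one variant, the pattern is irrefutable, so 'let Top::Season(year, month, children) = top;' would also compile in place of the 'match'.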
To write a child item, indent it:
>>>Season
2024 6
	>>>Team
	1 || #1 ranked team
	|| ↑ Indentation means the line is a child of the nearest less-indented line above it.
The indentation character must be TAB (ASCII code 9). You may need to change your text editor's settings.
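To illustrate the rule that an indented line is a child of the nearest less-indented line above it, here is a minimal std-only sketch that builds a tree from TAB-indented lines. The 'Node' type and 'build_tree' function are hypothetical. Rejecting a leading space and rejecting a jump of more than one level are assumptions made for the sketch, not documented Munyo behavior; lang_spec.txt is the authority.

```rust
/// A line and the lines indented one TAB deeper directly beneath it.
#[derive(Debug, PartialEq)]
pub struct Node {
    pub text: String,
    pub children: Vec<Node>,
}

/// Pops the innermost open node and attaches it to its parent (or the roots).
fn close_one(path: &mut Vec<Node>, roots: &mut Vec<Node>) {
    let node = path.pop().expect("caller ensures path is non-empty");
    match path.last_mut() {
        Some(parent) => parent.children.push(node),
        None => roots.push(node),
    }
}

/// Hypothetical sketch: builds a forest from TAB-indented lines.
/// Blank lines and whole-line `||` comments are skipped; errors carry
/// 1-based line numbers.
pub fn build_tree(src: &str) -> Result<Vec<Node>, String> {
    let mut roots = Vec::new();
    // path[k] is the currently open node at indentation level k.
    let mut path: Vec<Node> = Vec::new();
    for (i, raw) in src.lines().enumerate() {
        let line_no = i + 1;
        let level = raw.chars().take_while(|&c| c == '\t').count();
        let body = &raw[level..]; // TABs are one byte each
        if body.trim().is_empty() || body.starts_with("||") {
            continue;
        }
        // Assumption: spaces are not accepted as indentation.
        if body.starts_with(' ') {
            return Err(format!("{line_no}: indentation must use TAB characters"));
        }
        // A line may be at most one level deeper than the open node above it.
        if level > path.len() {
            return Err(format!("{line_no}: indented more than one level"));
        }
        // Close every open node at this level or deeper.
        while path.len() > level {
            close_one(&mut path, &mut roots);
        }
        path.push(Node { text: body.trim_end().to_string(), children: Vec::new() });
    }
    while !path.is_empty() {
        close_one(&mut path, &mut roots);
    }
    Ok(roots)
}
```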
// 'Debug' is not derived: it would require 'Third: Debug', which 'Third' doesn't derive.
#[derive(serde::Deserialize)]
enum Second {
    Team(usize, Vec<Third>),
}
struct Team {
    rank: usize,
    pokemons: Vec<Pokemon>,
}
fn second_to_team(second: Second) -> Team {
    match second {
        Second::Team(rank, vec) => Team {
            rank,
            pokemons: vec.into_iter().map(third_to_pokemon).collect(),
        },
    }
}
This is the third level, which describes each Pokémon:
>>>Pokemon
Koraidon Fire AssaultVest H204A+196B4C-0D12S92 FlameCharge FlareBlitz DrainPunch Uturn
#[derive(serde::Deserialize)]
enum Third {
    Pokemon(
        PokeName,
        PokeType,
        PokeItem,
        PokeValues,
        PokeMove,
        PokeMove,
        PokeMove,
        PokeMove,
        Param,
        Vec<Fourth>,
    ),
}
struct Pokemon {
    name: PokeName,
    poke_type: PokeType,
    item: PokeItem,
    custom: PokeValues,
    moves: Vec<PokeMove>,
    ability: Option<Ability>,
    other_items: Vec<PokeItem>,
    other_terastals: Vec<PokeType>,
}
#[derive(serde::Deserialize)]
enum PokeName {
    Koraidon,
    FlutterMane,
}
#[derive(serde::Deserialize)]
enum PokeType {
    Fire,
    Fairy,
    Normal,
    Ground,
    Water,
}
//...
The 'Pokemon' line consists of PokeName, PokeType, PokeItem, and so on. These items are defined as enums too.
If you write a value that isn't one of the enum's variants, Munyo outputs an error message like this:
9: unknown variant `Koraido`, expected `Koraidon` or `FlutterMane`
Koraido Fire AssaultVest H204A+196B4C-0D12S92 FlameCharge FlareBlitz DrainPunch Uturn
When an error occurs, Munyo always outputs the line number and the line itself. In this case, serde also identified the cause correctly.
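The error above comes from serde's derived enum deserialization. To show what that check amounts to, here is a minimal std-only sketch that implements 'FromStr' for a two-variant 'PokeName' and builds a message in the same shape, plus a hypothetical 'report' helper that prefixes the line number and echoes the line. The wording is modeled on the output shown above; serde's exact formatting may differ in other cases.

```rust
use std::str::FromStr;

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum PokeName {
    Koraidon,
    FlutterMane,
}

const VARIANTS: &[&str] = &["Koraidon", "FlutterMane"];

/// Formats the list of expected names, e.g. "`A` or `B`".
fn expected(names: &[&str]) -> String {
    let quoted: Vec<String> = names.iter().map(|n| format!("`{n}`")).collect();
    match quoted.len() {
        0 => "no variants".to_string(),
        1 => quoted[0].clone(),
        2 => format!("{} or {}", quoted[0], quoted[1]),
        _ => format!("one of {}", quoted.join(", ")),
    }
}

impl FromStr for PokeName {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Koraidon" => Ok(PokeName::Koraidon),
            "FlutterMane" => Ok(PokeName::FlutterMane),
            _ => Err(format!("unknown variant `{s}`, expected {}", expected(VARIANTS))),
        }
    }
}

/// Hypothetical helper: puts the 1-based line number in front of the error
/// and echoes the offending line, mirroring the layout shown above.
pub fn report(line_no: usize, line: &str, err: &str) -> String {
    format!("{line_no}: {err}\n{line}")
}
```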
Pokemon customization has a traditional representation:
H204A+196B4C-0D12S92
To parse this, you need to implement a parser yourself; Munyo can't do it for you. I recommend pest.
My implementation of the parser:
use pest_derive::Parser;

#[derive(Parser)]
#[grammar_inline = r###"
alpha = {
    "H" | "A" | "B" | "C" | "D" | "S"
}
sign = {
    "+" | "-"
}
number_char = _{
    '0'..'9'
}
number = {
    number_char+
}
bracketed_number = {
    "(" ~ number ~ ")"
}
chunk = {
    alpha ~ sign? ~ (number | bracketed_number)
}
poke_custom = {
    SOI ~ chunk+ ~ EOI
}
"###]
// The derive must be attached to a struct; this name is illustrative.
struct PokeCustomParser;
When your parser returns an error message, Munyo outputs it together with the line number and the text of the line:
10: 260 is bigger than 252
FlutterMane Fairy ChoiceSpecs H148A-(0)B100C260D4S+68 MoonBlast ShadowBall DrainingKiss PerishSong | ability Protosynthesis
252 is the max number for the Pokemon parameter customization.
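The author uses pest for this. As a dependency-free alternative (a swapped-in technique, not the author's code), here is a minimal hand-written parser for the same grammar: a letter from 'HABCDS', an optional '+' or '-', then a number that may be wrapped in parentheses. It also applies the 252 limit with a message modeled on the one above. The 'Chunk' type and 'parse_custom' name are hypothetical, and the sketch does not interpret what the sign or the parentheses mean.

```rust
/// One `H204`, `A+196`, `A-(0)`... chunk of the customization string.
#[derive(Debug, PartialEq)]
pub struct Chunk {
    pub stat: char,         // one of H, A, B, C, D, S
    pub sign: Option<char>, // '+' or '-' if present
    pub value: u32,
    pub bracketed: bool,    // true for forms like "(0)"
}

const MAX_VALUE: u32 = 252;

pub fn parse_custom(src: &str) -> Result<Vec<Chunk>, String> {
    let mut chars = src.chars().peekable();
    let mut chunks = Vec::new();
    while let Some(stat) = chars.next() {
        if !"HABCDS".contains(stat) {
            return Err(format!("unexpected `{stat}`, expected one of H, A, B, C, D, S"));
        }
        // sign? ~ (number | "(" ~ number ~ ")")
        let sign = chars.next_if(|&c| c == '+' || c == '-');
        let bracketed = chars.next_if_eq(&'(').is_some();
        let mut digits = String::new();
        while let Some(c) = chars.next_if(|c| c.is_ascii_digit()) {
            digits.push(c);
        }
        if digits.is_empty() {
            return Err(format!("missing number after `{stat}`"));
        }
        if bracketed && chars.next_if_eq(&')').is_none() {
            return Err("missing `)`".to_string());
        }
        let value: u32 = digits.parse().map_err(|_| format!("`{digits}` is too large"))?;
        // Semantic check that the grammar alone cannot express.
        if value > MAX_VALUE {
            return Err(format!("{} is bigger than {}", value, MAX_VALUE));
        }
        chunks.push(Chunk { stat, sign, value, bracketed });
    }
    if chunks.is_empty() {
        return Err("empty customization".to_string()); // like `chunk+`
    }
    Ok(chunks)
}
```

Like the pest grammar, this sketch does not enforce the stat order or that each stat appears once.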
Custom parsers and useful error messages are both crucial for the most efficient data language.
The goal of this language is to reduce redundancy in text data as much as possible. The backing code, on the other hand, is not the simplest, but as you can see, I don't think it's very complex.
Pokémon have abilities, but some Pokémon have only one ability, so you don't need to write it down for them.
If you need optional parameters, you can use params:
typename arg1 arg2...| param_name arg | param_name2 arg2...
↑ This is the syntax of parameters in Munyo.
The sample data already uses it:
FlutterMane Fairy... | ability Protosynthesis
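As a rough illustration of the 'typename args... | name arg | ...' shape, here is a minimal std-only sketch that splits a line into positional arguments and named parameters. It assumes comments are already stripped, that '|' never appears inside an argument, and that each parameter has exactly one value; 'split_params' and 'SplitLine' are hypothetical, and lang_spec.txt remains the authority on the real grammar.

```rust
/// Positional arguments plus (name, value) parameters of one line.
#[derive(Debug, PartialEq)]
pub struct SplitLine<'a> {
    pub args: Vec<&'a str>,
    pub params: Vec<(&'a str, &'a str)>,
}

pub fn split_params(line: &str) -> Result<SplitLine<'_>, String> {
    let mut sections = line.split('|');
    // `split` always yields at least one section: the positional part.
    let args = sections.next().unwrap_or("").split_whitespace().collect();
    let mut params = Vec::new();
    // Every further section must be exactly "name value".
    for section in sections {
        let mut words = section.split_whitespace();
        match (words.next(), words.next(), words.next()) {
            (Some(name), Some(value), None) => params.push((name, value)),
            _ => return Err(format!("malformed parameter `{}`", section.trim())),
        }
    }
    Ok(SplitLine { args, params })
}
```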
The backing code is below:
#[derive(serde::Deserialize)]
enum Third {
    Pokemon(
        PokeName,
        PokeType,
        PokeItem,
        PokeValues,
        PokeMove,
        PokeMove,
        PokeMove,
        PokeMove,
        Param, // <- Structs are for parameters
        Vec<Fourth>,
    ),
}
#[derive(serde::Deserialize)]
struct Param {
    // field names are used as param-names
    ability: Option<Ability>,
}
#[derive(serde::Deserialize)]
enum Ability {
    Protosynthesis,
}
The struct can have any name; the name has no effect in Munyo. Here it is 'Param'.
The struct must implement 'serde::Deserialize', and its field names are used as parameter names. Here the only field is 'ability'.
FlutterMane Fairy... | ability Protosynthesis
|| ↑ the field name
Koraidon Fire...
|| ↑ No 'ability' parameter for this Pokemon
A field can be an Option, which makes the parameter omittable. As you can see, 'Koraidon' has no 'ability'.
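What serde does with 'Param' can be imitated by hand: look each parameter name up among the struct's field names, fill the matching 'Option', and leave the rest as 'None'. This minimal std-only sketch uses a one-variant 'Ability' enum and plain string errors; 'to_param' is hypothetical. It rejects unknown names, whereas serde's derive ignores unknown fields unless '#[serde(deny_unknown_fields)]' is set, so munyo's actual behavior may differ.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Ability {
    Protosynthesis,
}

/// Every field is optional, so a line may omit any of them.
#[derive(Debug, Default, PartialEq)]
pub struct Param {
    pub ability: Option<Ability>,
}

pub fn to_param(params: &[(&str, &str)]) -> Result<Param, String> {
    let mut out = Param::default(); // all fields start as None
    for &(name, value) in params {
        match name {
            // The field name doubles as the parameter name.
            "ability" => {
                let ability = match value {
                    "Protosynthesis" => Ability::Protosynthesis,
                    other => {
                        return Err(format!("unknown variant `{other}`, expected `Protosynthesis`"))
                    }
                };
                // `replace` returns the previous value, revealing duplicates.
                if out.ability.replace(ability).is_some() {
                    return Err("duplicate parameter `ability`".to_string());
                }
            }
            other => return Err(format!("unknown parameter `{other}`, expected `ability`")),
        }
    }
    Ok(out)
}
```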
Since this is the only omittable parameter, even the parameter name 'ability' could be omitted:
FlutterMane Fairy ChoiceSpecs H148A-(0)B100C188D4S+68 MoonBlast... Protosynthesis
|| ↑ Append only the ability name at the end if the Pokémon needs it.
I created versions that use this omitted form. Version 1 is simple, but its error messages don't include the line number: the error is returned during the conversion process, which has no information about line numbers. Version 2 implements a simple custom data structure so that the line number is output; when an error is returned during parsing, Munyo automatically attaches the line number. Check them out if you'd like.
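One way to accept the shorter form is to treat an optional last positional argument as the ability: with exactly eight arguments there is none, with nine the ninth one is the ability. This std-only sketch is an assumption about how such a variant could work, not a copy of the author's Version 1 or 2; 'split_optional_ability' is hypothetical, and it assumes the typename has already been supplied by the '>>>Pokemon' default.

```rust
/// Splits the positional arguments of a Pokemon line into the eight
/// required ones (name, type, item, values, four moves) and an optional
/// trailing ability name.
pub fn split_optional_ability(line: &str) -> Result<(Vec<&str>, Option<&str>), String> {
    let mut args: Vec<&str> = line.split_whitespace().collect();
    match args.len() {
        8 => Ok((args, None)),
        9 => {
            let ability = args.pop(); // the last argument is the ability
            Ok((args, ability))
        }
        n => Err(format!("expected 8 or 9 arguments, found {n}")),
    }
}
```

As with Version 1 as described above, an error from a helper like this carries no line number by itself; the caller would have to attach it.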
If you need even more efficient syntax, you can write a custom parser that accepts any number of arguments per line and detects the argument types automatically. See another sample if you'd like.
A Pokémon basically has four moves. I implemented this naïvely:
enum Third {
    Pokemon(
        PokeName,
        PokeType,
        PokeItem,
        PokeValues,
        PokeMove, // <- four moves
        PokeMove,
        PokeMove,
        PokeMove, // <-
        Param,
        Vec<Fourth>,
    ),
}
This is robust, but if you want an item to have an arbitrary number of subitems, you basically need to use child items (or write a custom parser).
The fourth indentation level is an example of this, although it isn't really needed for this Pokémon data.
FlutterMane Fairy ChoiceSpecs H148A-(0)B100C188D4S+68 MoonBlast ShadowBall DrainingKiss PerishSong | ability Protosynthesis
	>Item
	BoostEnergy
	FocusSash
	>Terastal
	Normal
	Ground
	Water
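While '>>>' defines the default typename for an indentation level (including lines at that level under other parents), '>' defines it only for the following lines at the current level under the same parent.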
Foo
	>>>TripledType
	A
		>SingledType
		B
		>
		Canceled
		>SingledType2
		C
	StillAffected
		HereIsNotCurrentLevel
	>>>Triple2
	D
This is equivalent to the following:
Foo
	>>>TripledType
	TripledType A
		>SingledType
		SingledType B
		>
		|| ↑ A single '>' with no name cancels the definition.
		Canceled
		|| ↑ Canceled is the typename of this line, because there's no default typename here
		>SingledType2 || ← defines a default typename again
		SingledType2 C
	TripledType StillAffected
		HereIsNotCurrentLevel
		|| ↑ Singled definitions don't affect cousin lines.
	>>>Triple2
	|| ↑ Tripled definitions can also be changed and canceled.
	Triple2 D
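The rules in this example can be captured in a small resolver. The following is a minimal std-only sketch of the behavior as inferred from the example above, not the munyo crate's implementation: it keeps one tripled default per indentation level, one singled default per level that is reset whenever a new parent line opens a block, and prepends whichever applies. That a singled default takes precedence over a tripled one is an assumption; 'resolve_defaults' and 'Resolved' are hypothetical names.

```rust
/// A data line after default-typename resolution.
#[derive(Debug, PartialEq)]
pub struct Resolved {
    pub level: usize,
    pub text: String,
}

/// Turns the text after `>>>` or `>` into a definition; empty means "cancel".
fn definition(name: &str) -> Option<String> {
    let name = name.trim();
    if name.is_empty() { None } else { Some(name.to_string()) }
}

/// Hypothetical sketch:
/// - `>>>Name` sets a default for its indentation level, cousins included;
/// - `>Name` sets a default only under the current parent;
/// - a bare `>>>` or `>` cancels the corresponding default;
/// - assumption: a singled default wins over a tripled one.
pub fn resolve_defaults(src: &str) -> Vec<Resolved> {
    let mut tripled: Vec<Option<String>> = Vec::new();
    let mut singled: Vec<Option<String>> = Vec::new();
    let mut out = Vec::new();
    for raw in src.lines() {
        let level = raw.chars().take_while(|&c| c == '\t').count();
        // Drop `||` comments (whole-line or trailing) and surrounding spaces.
        let body = &raw[level..];
        let body = body.find("||").map_or(body, |i| &body[..i]).trim();
        if body.is_empty() {
            continue;
        }
        if tripled.len() <= level {
            tripled.resize(level + 1, None);
        }
        if singled.len() <= level {
            singled.resize(level + 1, None);
        }
        if let Some(name) = body.strip_prefix(">>>") {
            tripled[level] = definition(name);
        } else if let Some(name) = body.strip_prefix('>') {
            singled[level] = definition(name);
        } else {
            let text = match singled[level].as_ref().or(tripled[level].as_ref()) {
                Some(typename) => format!("{typename} {body}"),
                None => body.to_string(),
            };
            out.push(Resolved { level, text });
            // This line opens a new block, so deeper singled defaults reset.
            singled.truncate(level + 1);
        }
    }
    out
}
```

Fed the 'Foo' example above, this sketch yields the same typenames as the expanded listing.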
>Item
BoostEnergy
FocusSash
>Terastal
Normal
Ground
Water
This means the Pokemon has 2 'Item's and 3 'Terastal's as its children.
The conversion is below:
fn third_to_pokemon(third: Third) -> Pokemon {
    match third {
        Third::Pokemon(
            name,
            poke_type,
            item,
            custom,
            move1,
            move2,
            move3,
            move4,
            param,
            children,
        ) => {
            let mut other_items: Vec<PokeItem> = vec![];
            let mut other_terastals: Vec<PokeType> = vec![];
            for v in children {
                match v {
                    Fourth::Item(item) => other_items.push(item),
                    Fourth::Terastal(t) => other_terastals.push(t),
                }
            }
            Pokemon {
                name,
                poke_type,
                item,
                custom,
                moves: vec![move1, move2, move3, move4],
                ability: param.ability,
                other_items,
                other_terastals,
            }
        }
    }
}
'let mut vec = vec![]' is not elegant, but it's powerful.
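If the mutable vectors bother you, the same split can be written with 'filter_map', at the cost of walking the children twice. This std-only sketch uses simplified, hypothetical 'PokeItem' and 'PokeType' enums, plus a 'Fourth' shaped after the 'match' arms above, so that it compiles on its own.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum PokeItem {
    BoostEnergy,
    FocusSash,
}
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum PokeType {
    Normal,
    Ground,
    Water,
}

/// Child lines of a Pokemon, shaped like the `match` arms above.
pub enum Fourth {
    Item(PokeItem),
    Terastal(PokeType),
}

/// Splits children into items and terastal types without `let mut`.
pub fn split_children(children: &[Fourth]) -> (Vec<PokeItem>, Vec<PokeType>) {
    let items: Vec<PokeItem> = children
        .iter()
        .filter_map(|c| match c {
            Fourth::Item(i) => Some(*i),
            _ => None,
        })
        .collect();
    let terastals: Vec<PokeType> = children
        .iter()
        .filter_map(|c| match c {
            Fourth::Terastal(t) => Some(*t),
            _ => None,
        })
        .collect();
    (items, terastals)
}
```

This version borrows the children and relies on the payloads being 'Copy'; with non-'Copy' payloads it would need cloning, whereas the original loop consumes the 'Vec' in a single pass and scales to any number of child types, which is part of why it is powerful.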
Other Materials
Since Munyo is a language, the API documentation alone isn't enough to learn how to use it. Other materials are available.
Motivation
The motivation is explained here
Async
This crate also contains concurrent versions of the deserialization functions, as well as a runtime-agnostic async fn that receives the deserialized data concurrently.
Usage
Add these to your Cargo.toml:
[dependencies]
munyo = "0.5"
serde = { version = "1", features = ["derive"] }
License
Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
Dependencies
~6–16MB
~243K SLoC