3 stable releases

| Version | Released |
|---|---|
| 1.2.0 (new) | Oct 31, 2024 |
| 1.1.0 | Oct 21, 2024 |
| 1.0.0 | Oct 19, 2024 |
CONL is a post-minimalist, human-centric configuration language.
It is a replacement for JSON, YAML, TOML and similar formats. It supports a JSON-like data model of values, maps and lists, but is designed to be much easier to work with.
For example:
# There are four ways to define a value:
scalar = value
list
  = value1
  = value2
map
  key1 = value1
  key2 = value2
multiline_scalar = """
  value
# For multiline scalars, you can specify a tag for syntax highlighting.
init_script = """bash
  #!/bin/bash
  echo "hello world"
# There is no quoting. Leading and trailing whitespace is ignored.
# but keys and values can contain any characters (*conditions apply)
spaced out key = value with = signs
# To make it safe to include URLs as values, # is only a comment
# at the start of a line, after whitespace, or after the first =
# sign on a line.
a = https://example.com#a # jump to section a
# But the space around the = sign is purely for readability
short=16 bits
# It is possible to nest lists and maps as needed.
# (and as in JSON, types can be mixed however you want)
json_like
  sub_map
    key = value
  sub_list
    = value
    =
      map = no problem
    =
      = a list in a list # in a map in a map
  sub_value = 5
# For things that cannot be otherwise represented, you can use escapes:
escapes
  = "" # '"'
  = "= # '=' (needed only if you want an equals in your key)
  = "# # '#' (needed if you want a literal # at the start of a key/value, or after whitespace)
  = "_ # normal space (needed only for leading/trailing whitespace)
  = "> # tab (recommended always, but only needed for leading/trailing whitespace)
  = "/ # newline
  = "\ # carriage return
  = "{1F431} # 🐱 (you can refer to any Unicode codepoint that is valid in UTF-8)
  = "{} # gives an empty string (and must stand alone)
# Variable types are not syntactically distinct.
# The app you are configuring already knows what to expect.
enabled = yes
country_code = no
# CONL also has no null, so you should comment out values you don't wish to set.
# (or use "{} as a placeholder)
overrides
  # bits_per_byte = 8
  int_size = 32
Syntax
The syntax of CONL has been designed with several priorities (in order):
- To be easy to read
- To be easy to edit
- To be easy to parse
The source must be valid UTF-8, and because CONL is indentation sensitive, this grammar assumes the synthetic indent and outdent tokens are generated as described below.
In keeping with tradition, a newline may be specified with either a newline (U+000A) or carriage return (U+000D), or both:
newline = '\r' | '\n' | '\r\n'
Within a line, you can use tabs (U+0009) or spaces (U+0020) for blanks. Other unicode spaces (e.g. U+200B) are treated as any other character (so that parsing is not dependent on Unicode version or multibyte characters).
blank = ' ' | '\t'
A comment begins with the pound sign (U+0023), and continues until the next newline. To allow for keys or values that contain a literal pound sign, comments that do not start at the beginning of a line or after an = must be preceded by a blank.
comment = '#' (^ '\r' | '\n')*
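The comment rule can be sketched as a small helper that cuts a single line at the first `#` that qualifies as a comment start. This is a minimal illustration, not the crate's implementation: the name `strip_comment` is hypothetical, multiline values are not handled, and an escaped `"=` inside a key is counted as the first `=` on the line, which a real parser would need to treat separately.

```rust
/// Returns `line` with any trailing comment (and the blanks before it) removed.
/// Hypothetical helper: ignores multiline values and escaped `"=` in keys.
fn strip_comment(line: &str) -> &str {
    let is_blank = |ch: char| ch == ' ' || ch == '\t';
    let mut prev: Option<char> = None;
    let mut equals_seen = 0usize;
    for (i, c) in line.char_indices() {
        if c == '#' {
            let starts_comment = match prev {
                // At the start of the line, or after a blank.
                None | Some(' ') | Some('\t') => true,
                // Directly after the first `=` on the line.
                Some('=') => equals_seen == 1,
                // Anywhere else `#` is an ordinary character (e.g. in a URL).
                _ => false,
            };
            if starts_comment {
                return line[..i].trim_end_matches(is_blank);
            }
        }
        if c == '=' {
            equals_seen += 1;
        }
        prev = Some(c);
    }
    line.trim_end_matches(is_blank)
}
```

With this sketch, `a = https://example.com#a # jump to section a` is cut down to `a = https://example.com#a`, because the first `#` follows `m` rather than a blank.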
An escape sequence begins with a double quote (U+0022) and is followed by either a named escape, or a hexadecimal sequence.
- `""`, `"#` and `"=` generate `"`, `#` and `=` respectively.
- `"_`, `">`, `"\` and `"/` generate space, tab, carriage return and newline.
- `"{ [0-9a-fA-F]+ }` generates the Unicode character with the specified hexadecimal value. Unpaired surrogates are disallowed to ensure that all values are valid UTF-8.
escape = '"' | '#' | '=' | '_' | '>' | '\' | '/' | ( '{' [0-9a-fA-F]+ '}' )
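As an illustration of the escape rules, here is a minimal decoder for a key or value that has already been split out and trimmed. The function name `unescape` and the error strings are assumptions made for this sketch and are not taken from the crate. It relies on `char::from_u32`, which rejects surrogate code points and values above U+10FFFF.

```rust
/// Decodes CONL escape sequences in an already-trimmed key or value.
/// Hypothetical helper; the error messages are illustrative only.
fn unescape(raw: &str) -> Result<String, String> {
    // `"{}` is the empty string, but only when it stands alone.
    if raw == "\"{}" {
        return Ok(String::new());
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '"' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('"') => out.push('"'),
            Some('#') => out.push('#'),
            Some('=') => out.push('='),
            Some('_') => out.push(' '),
            Some('>') => out.push('\t'),
            Some('\\') => out.push('\r'),
            Some('/') => out.push('\n'),
            Some('{') => {
                // Collect hexadecimal digits up to the closing brace.
                let mut hex = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(h) if h.is_ascii_hexdigit() => hex.push(h),
                        _ => return Err(format!("malformed hex escape in {:?}", raw)),
                    }
                }
                if hex.is_empty() {
                    return Err(format!("the empty escape must stand alone in {:?}", raw));
                }
                let code = u32::from_str_radix(&hex, 16).map_err(|e| e.to_string())?;
                let ch = char::from_u32(code)
                    .ok_or_else(|| format!("U+{:X} is not a valid Unicode scalar value", code))?;
                out.push(ch);
            }
            other => return Err(format!("unknown escape {:?} in {:?}", other, raw)),
        }
    }
    Ok(out)
}
```

For example, `"_padded"_` decodes to the word `padded` with one leading and one trailing space, while `"{D800}` is rejected because it names a surrogate.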
To represent the empty string, you can use `"{}`.
empty = `"{}`
A key in CONL always starts and ends with a non-blank, non-newline character. Within a key blanks are preserved. The character # may be included in a key if it is escaped, or not preceded by blanks. The character = may be included in a key if it is escaped.
key_char = (^ ' ' | '\t' | '\r' | '\n' | '"' | '#' | '=') | ('"' escape)
key = empty | ( key_char (key_char | '#' | blank+ key_char)* )
Values are the same as keys, but = characters are also allowed.
value_char = (^ ' ' | '\t' | '\r' | '\n' | '"' | '#') | ('"' escape)
value = empty | ( value_char (value_char | '#' | blank+ value_char)* )
For longer values, or values that contain newlines, you can use multiline syntax. To allow for better syntax highlighting in modern editors, multiline tokens can be tagged with the expected language. Language tags cannot start with an escape sequence to avoid ambiguity, and also may not contain quotes or blanks to help avoid accidental errors.
After parsing, multiline tokens have all initial and final blanks and newlines removed. All newlines become \n, and any trailing or leading whitespace on individual lines is preserved. This means they cannot represent values that start or end with blanks or newlines, or values containing carriage returns.
multiline_tag = (^ '"' | '#' | '=' | '_' | '>' | '\' | '/' | '{') (^ '"' | ' ' | '\t')*
multiline_value = '"""' multiline_tag? blank* comment? newline indent .* outdent
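The post-processing of a multiline value described above can be sketched as follows. This assumes the body has already been extracted and had its indentation removed by the parser. The name `normalize_multiline` is hypothetical.

```rust
/// Normalizes the body of a multiline value: every newline form becomes `\n`,
/// then blanks and newlines are trimmed from both ends of the whole value.
/// Whitespace at the edges of interior lines is left untouched.
fn normalize_multiline(body: &str) -> String {
    // Replace `\r\n` first so a lone `\r` is not turned into a second newline.
    let unified = body.replace("\r\n", "\n").replace('\r', "\n");
    unified
        .trim_matches(|c: char| c == ' ' || c == '\t' || c == '\n')
        .to_string()
}
```

Because the ends are always trimmed and carriage returns are always rewritten, a value that needs leading or trailing whitespace, or a literal carriage return, has to use the single-line form with escapes instead.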
Maps and lists are represented as indent-separated sections in the file. A section that contains no items (and for which the parser has no type hints) is considered an empty map. Keys must be unique within a map section.
section = list_section | map_section
map_section = (map_item | comment? newline)*
list_section = (comment? newline)* (list_item | comment? newline)+
Within a section any list item or map key can be set to either a single value, a multiline value, a map or a list. An = sign is allowed (but discouraged) after a map key before a nested section.
list_item = '=' blank* any_value
map_item = key blank* '=' blank* any_value
         | key blank* (blank comment)? newline indent section outdent
any_value = value blank* (blank comment)? newline
          | multiline_value
          | comment? newline indent section outdent
Indents
The level of a line is the string of tab and space characters at the start. Lines that contain no non-blank characters are assumed to have the same indentation as the previous line, though lines that contain just a comment must have the correct indentation.
Any mix of tabs and spaces is allowed in the level, and tabs and spaces are considered distinct. Within a multiline string indent/outdent tokens are not generated, so that multiline values can contain inconsistent indentation.
After a newline, there are four possibilities:
- The level of this line matches the previous one. No tokens are generated.
- The level of this line starts with the level of the previous line, and it is longer. In that case an indent token is generated.
- The level of this line is shorter than the previous one and matches an earlier line. In this case one outdent token is generated per indent token generated since that line.
- The level of this line does not match an earlier line. This is an error.
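The four cases above map naturally onto a stack of open levels. The sketch below is one possible way to generate the tokens, not the crate's implementation. The names `Token` and `tokenize_indents` are hypothetical. It assumes `\n` or `\r\n` line endings (via `str::lines`), does not special-case multiline values, and closes any open sections with outdent tokens at end of input, which is an assumption of this sketch.

```rust
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Indent,
    Outdent,
    /// The content of a line with its level stripped.
    Line(&'a str),
}

/// Turns source text into lines interleaved with synthetic indent/outdent tokens.
fn tokenize_indents(source: &str) -> Result<Vec<Token<'_>>, String> {
    let mut tokens = Vec::new();
    // Levels of the currently open sections; the top-level section has an empty level.
    let mut levels: Vec<&str> = vec![""];
    for (number, line) in source.lines().enumerate() {
        let content = line.trim_start_matches(|c: char| c == ' ' || c == '\t');
        if content.is_empty() {
            // Blank lines take the level of the previous line: no tokens.
            continue;
        }
        let level = &line[..line.len() - content.len()];
        let current = *levels.last().unwrap();
        if level != current {
            if level.starts_with(current) {
                // Longer, and extends the previous level: one indent.
                levels.push(level);
                tokens.push(Token::Indent);
            } else {
                // Must match an enclosing level exactly (tabs and spaces are distinct).
                let depth = levels.iter().position(|&l| l == level).ok_or_else(|| {
                    format!("line {}: indentation does not match any earlier line", number + 1)
                })?;
                for _ in depth + 1..levels.len() {
                    tokens.push(Token::Outdent);
                }
                levels.truncate(depth + 1);
            }
        }
        tokens.push(Token::Line(content));
    }
    // Assumption of this sketch: close any sections still open at end of input.
    for _ in 1..levels.len() {
        tokens.push(Token::Outdent);
    }
    Ok(tokens)
}
```

Keeping only the enclosing levels on the stack means a dedent to a level that was never opened, or a switch from tabs to spaces at the same depth, falls through to the error case.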
Other considerations
CONL cannot explicitly represent a null value (to avoid the unnecessary distinction between a key mapped to null and a missing key). For maps you should omit keys that have the default value, and for list items (or map keys) you can use the empty string `"{}`.
This means that you cannot distinguish between a vec![None] and a vec![Some("")] in a map or a list. (Though hopefully such a subtle distinction doesn't make an impact on your application's behaviour.)
CONL can represent maps with any key type (not just strings) by parsing the keys as you would values.
Most values can be serialized as either a single-line or a multi-line string. The exceptions are those that start or end with ' ', '\t' or '\n', or contain '\r'. Parsers should not distinguish between single-line and multi-line syntax (the indicator is purely for syntax highlighting). Serializers should choose the most convenient form (typically, if the string contains newlines and can be represented as such, a multiline string is better).
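A serializer could apply that guidance with a check along these lines. This is a hedged sketch: the names `can_be_multiline` and `prefer_multiline` are hypothetical, and treating the empty string as single-line only (written as `"{}`) is an assumption not stated above.

```rust
/// True if `value` survives a round trip through multiline syntax unchanged.
fn can_be_multiline(value: &str) -> bool {
    let edge = |c: char| c == ' ' || c == '\t' || c == '\n';
    // Assumption: the empty string is always written as `"{}` on a single line.
    !value.is_empty()
        && !value.starts_with(edge)
        && !value.ends_with(edge)
        && !value.contains('\r')
}

/// Heuristic from the text: use multiline syntax when the value contains
/// newlines and can be represented that way; otherwise stay on one line.
fn prefer_multiline(value: &str) -> bool {
    value.contains('\n') && can_be_multiline(value)
}
```

Values that fail `can_be_multiline` are still representable: the serializer falls back to a single line and uses escapes such as `"_`, `"/` or `"\` where needed.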
Why?
Why not? I was inspired to create CONL by this excellent INI critique of TOML. It reminded me that my struggles to write TOML or YAML by hand were not due to failings on my part, but due to the inherent complexity of a "minimal" format that has four syntaxes for strings, and eleven other data-types to contend with.
In my day-to-day life I spend a non-trivial amount of time editing configuration files that are either giant chunks of YAML (GitHub workflows, Kubernetes manifests...), giant chunks of JSON-with-comments (Zed's configuration files), or TOML (Rust cargo files). What if there were one format that could do it all? By removing all that is unnecessary, only the useful remains.