nightly goff

Goff configuration language and reference serde implementation

1 unstable release

0.1.0 Jan 23, 2021

MPL-2.0 license

A long time ago in a galaxy far, far away...

... JSON was the talk of the town. Its grammar fit on a business card, and everything was good.

... Everybody used JSON, for data exchange and for configuration—configuration files were small, and everything was good.

But we grew tired of using JSON. We were upset that it was unfit for a task it was not designed for. And everything wasn't so good anymore.

But we did not fix JSON. Some persevered, some moved to new languages. But these languages grew deranged. We did not just fix the flaws of JSON, but we overstepped.

I want not 63 types of strings, nor Turing-completeness; I want JSON with added convenience.

And Goff will make everything good.

/ɡɒθ/

A configuration language.

Syntax

-- This is a comment
-- It is the only type of comment.
-- Multiline comments can span multiple lines using multiple comments.

-- Goff has types:
: Network
server   = 'example.com' -- String
useProxy = no            -- Boolean
timeout  = 5             -- Integer
proxy    = Nothing       -- Nothing

: Developer
revision = 6.66 -- Real
license  = '
As long as you retain this notice you can do whatever you want with this stuff. If we meet some day,
and you think this stuff is worth it, you can buy me a beer in return.
'                                   -- String (multiline!)
workDays = [ 'Monday'
           , 'Tuesday'
           , 'Wednesday']           -- Lists
hoursWorked = ( 'Monday'    -> 8.
              , 'Tuesday'   -> 7.5
              , 'Wednesday' -> 7.5) -- Map

-- What follows is Goff's lone 'magic' feature, Functions. They are not turing complete, and more
-- play the role of templates.
-- Functions are not represented in the final deserialised data, but may be used anywhere in a Goff
-- document to reduce boilerplate.

+ smallServer                      -- The Function name
| ip, supportsIpv6, bandwidthLimit -- These are fields that must be present when invoking the
                                   -- Function
cpus = 4                           -- These fields are applied automatically
location = 'us-east-1'

+ largeServer
| ip, supportsIpv6
cpus = 8
location = 'us-east-2'

: Server Info
HTTPCache = largeServer (ip -> '100.100.100.100', supportsIpv6 -> yes)
Seedbox   = smallServer (ip -> '200.200.200.200', supportsIpv6 -> no, bandwidthLimit -> Nothing)
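
One way to read the Function rules in the sample above: an invocation must supply every field listed after |, and the Function's own fields are then merged into the result. The Rust sketch below follows that reading. The Function struct, the invoke helper, and the use of plain strings for values are hypothetical and not taken from the reference implementation. Letting invocation fields override the Function's own fields is an assumption; the text does not settle it.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory form of a `+ name` / `| fields` declaration.
struct Function {
    /// Fields that every invocation must supply (the `|` line).
    required: Vec<String>,
    /// Fields applied automatically (the `key = value` lines).
    defaults: BTreeMap<String, String>,
}

/// Expands an invocation into the fields it stands for.
/// Returns an error naming the first required field that is missing.
fn invoke(
    func: &Function,
    args: &BTreeMap<String, String>,
) -> Result<BTreeMap<String, String>, String> {
    for name in &func.required {
        if !args.contains_key(name) {
            return Err(format!("missing field `{}`", name));
        }
    }
    // Start from the automatic fields, then add the supplied ones.
    // Assumption: a supplied field replaces an automatic one of the same name.
    let mut out = func.defaults.clone();
    out.extend(args.iter().map(|(k, v)| (k.clone(), v.clone())));
    Ok(out)
}
```

Under this reading, invoking largeServer with ip and supportsIpv6 yields four fields: the two supplied ones plus cpus and location.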

Internals (Conformance)

Validity

All Goff documents that do not strictly conform to the following standards are invalid. Upon encountering an invalid file, the parser implementation should cease parsing and not return any form of deserialised data. The parser implementation should take adequate measures to report the error.

Non-Type Representation

Non-types are characterised by not being present themselves as dynamic data in the parser's completed output.

Key

A Key is composed of one or more valid, non-whitespace UTF-8 characters. Keys are used as the constant name for their assigned data. Keys in Goff are case-insensitive. If the parser is deserialising to a type with dynamic keys, like a map, keys should be normalised by lowercasing them.

Keys are always followed by zero or more spaces, an equals sign, zero or more spaces and zero or one newlines, and a value.

Struct

A Struct is represented by a line beginning with :, zero or more spaces, and one or more valid UTF-8 characters. Structs are similar to namespaces, in that

: Network
server = 'example.com'

is perhaps better understood as network.server = 'example.com'.

If Keys are present without an associated Struct, they are placed into the global namespace. Given the following contents of the file config.gf:

server = 'example.com'

: Network
server = 'example.com'

The following may be produced:

struct Config {
  server: String,
  network: Network,
}

struct Network {
  server: String,
}

...

Config {
  server: "example.com",
  network: Network {
    server: "example.com",
  },
}

Struct names may contain spaces, which are replaced with underscores in code. As with Keys, Struct names should also be normalised by lowercasing them.

Many languages have struct types that may be used to represent Structs, or may have similar constructs under names like data.
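
The normalisation rules for Keys and Struct names can be sketched in Rust as follows. The function names are hypothetical, and trimming the whitespace around a Struct name is an assumption made so that the text after the leading : can be passed in directly.

```rust
/// Keys are case-insensitive, so dynamic keys are lowercased.
fn normalise_key(key: &str) -> String {
    key.to_lowercase()
}

/// Struct names are lowercased and their spaces become underscores.
/// Assumption: whitespace around the name is not part of it.
fn normalise_struct_name(name: &str) -> String {
    name.trim().to_lowercase().replace(' ', "_")
}
```

With these helpers, the Struct header : Server Info would map to the identifier server_info.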

Comment

A comment begins when the parser encounters two consecutive hyphens outside of a String. The parser should unconditionally ignore the rest of the line.
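
A minimal Rust sketch of this rule for a single line is shown below. The strip_comment name is hypothetical, and the sketch assumes the line does not begin inside a multiline String; a full parser would carry that state from one line to the next.

```rust
/// Returns the part of `line` that precedes any comment.
/// Assumes the line does not start inside a multiline String.
fn strip_comment(line: &str) -> &str {
    let mut in_string = false;
    let mut escaped = false;
    let mut prev_hyphen = false;
    for (i, c) in line.char_indices() {
        if in_string {
            // Inside a String, hyphens are data; only track where it ends.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '\'' {
                in_string = false;
            }
            continue;
        }
        match c {
            '\'' => {
                in_string = true;
                prev_hyphen = false;
            }
            // Second consecutive hyphen: cut just before the first one.
            '-' if prev_hyphen => return &line[..i - 1],
            '-' => prev_hyphen = true,
            _ => prev_hyphen = false,
        }
    }
    line
}
```

Whitespace before the comment is kept, so callers would typically trim the result.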

Escape Sequence

Escape sequences are matched inside of Strings. They always begin with a single backslash, immediately followed by a valid UTF-8 character. If the sequence is in this list, it should be replaced in the parsed text with its equivalent value or the language's code for it.

Escape sequence   Name                            UTF-8 codepoint
\n                Newline                         0x0A
\r                Carriage Return                 0x0D
\t                Horizontal Tab                  0x09
\\                Literal backslash               0x5C
\'                Literal single quotation mark   0x27

Backslashes followed by a character not written here should be inserted literally into the output.

When \\ is encountered, a single literal \ should be inserted into the output text. When \' is encountered, a literal ' should be inserted into the output text, and the parser should continue to read the String.
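
The replacement rules above can be sketched in Rust as follows. The unescape name is hypothetical, the input is assumed to be the text between the surrounding quotes, and keeping a lone trailing backslash as-is is an assumption, since the text does not cover that case.

```rust
/// Replaces the listed escape sequences in the raw body of a String.
/// Unknown sequences are copied to the output unchanged.
fn unescape(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            Some('t') => out.push('\t'),
            Some('\\') => out.push('\\'),
            Some('\'') => out.push('\''),
            // Not in the list: insert the backslash and character literally.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            // Assumption: a backslash at the very end is kept as-is.
            None => out.push('\\'),
        }
    }
    out
}
```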

Type Representation

Types are characterised by being valid data to assign to a key. This means that a type is only in a valid position when it follows a key.

String

A String is composed of zero or more valid UTF-8 characters, surrounded by one single quote on each side.

When the parser encounters a single quote in a valid position, it should continue to read until it encounters a second unescaped single quote. This means following it through newlines.

When a backslash is encountered in a String, the parser should identify the following character and check if it is a valid Goff escape sequence. If it is, the escape sequence should be represented in the output data as that language's equivalent of the escape sequence. Otherwise, the backslash and the character after it should be left in the output as written.

The first newline of a String should be stripped.

When available, parsers should deserialise Strings to their language's string type: str, string. In languages without string types, strings are often represented by an array of characters.
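
The scanning rule for Strings (read until the next unescaped single quote, across newlines) might look like the Rust sketch below. The read_string name and its return shape are hypothetical. The sketch only locates the end of the String; it does not replace escape sequences or strip the first newline.

```rust
/// If `input` starts with a single quote, returns the raw String body
/// and the remaining input after the closing quote.
/// Returns `None` if there is no opening quote or the String never ends.
fn read_string(input: &str) -> Option<(&str, &str)> {
    let body = input.strip_prefix('\'')?;
    let mut escaped = false;
    for (i, c) in body.char_indices() {
        if escaped {
            // The character after a backslash never ends the String.
            escaped = false;
        } else if c == '\\' {
            escaped = true;
        } else if c == '\'' {
            return Some((&body[..i], &body[i + 1..]));
        }
    }
    None
}
```

An unterminated String yields None here, which a parser would report as an invalid file.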

Boolean

A boolean is in one of two states represented by one of two atoms:

  • yes
  • no

In languages with boolean types (true, false), Boolean is equivalent. In languages without boolean types, they should be represented by that language's idiom for representing true and false values, usually 1 and 0 respectively.
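
In Rust, the two atoms map directly onto bool. The parse_boolean helper below is a hypothetical sketch, and it assumes the atoms are matched exactly as written (lowercase), which the text does not state.

```rust
/// Maps the Goff atoms `yes` and `no` to a Rust `bool`.
/// Assumption: atoms are matched case-sensitively.
fn parse_boolean(atom: &str) -> Option<bool> {
    match atom {
        "yes" => Some(true),
        "no" => Some(false),
        _ => None,
    }
}
```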

Integer

An Integer is a whole number, i.e. a number lacking a fractional segment.

The capacity of the Integer is dependent on the type used to represent that key in code. An Integer whose value exceeds the capacity of its key type constitutes an invalid Goff file.

Real

A Real is a number with a fractional segment.

By definition, a Real may contain irrational numbers, though a parser may safely assume that it will never encounter a complete Goff file containing an infinite sequence of numbers.

Additionally, Reals may by mathematical definition also match whole numbers. Integer should always be prioritised over Real.

A Real must contain a period to denote its fractional segment. This means that 8.0 may be represented by either 8. or 8.0, but not 8.

The capacity of the Real is dependent on the type used to represent that key in code. A Real whose value exceeds the capacity of its key type constitutes an invalid Goff file.
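
The Integer and Real rules can be sketched together in Rust. The Number enum and parse_number are hypothetical names, and i64 and f64 stand in for whatever types the keys use in code. Signs are left out because the text does not describe them, and requiring at least one digit before the period is an assumption.

```rust
#[derive(Debug, PartialEq)]
enum Number {
    Integer(i64),
    Real(f64),
}

/// Classifies a numeric token. Returns `None` for anything that is not a
/// number or that does not fit the chosen Rust type.
fn parse_number(text: &str) -> Option<Number> {
    let all_digits = |s: &str| s.chars().all(|c| c.is_ascii_digit());
    match text.split_once('.') {
        // No period: only an Integer is possible. `parse` fails on overflow.
        None if !text.is_empty() && all_digits(text) => {
            text.parse::<i64>().ok().map(Number::Integer)
        }
        // A period makes it a Real; `8.` parses the same as `8.0`.
        Some((whole, frac))
            if !whole.is_empty() && all_digits(whole) && all_digits(frac) =>
        {
            text.parse::<f64>()
                .ok()
                .filter(|r| r.is_finite()) // reject values too large for f64
                .map(Number::Real)
        }
        _ => None,
    }
}
```

Because a token without a period is only ever tried as an Integer, 8 never becomes a Real, which matches the priority rule above.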

List

A List is a collection of zero or more values. It is represented by an opening square bracket followed by zero or more segments, and a closing square bracket.

A segment is composed of zero or more whitespace characters or newlines, a value, zero or more whitespace characters or newlines, and a comma. The trailing comma may be omitted if the given segment is the last segment of the List.

All values in a List must be the same type.

This definition means all of the following are valid Lists:

foo = ['bar',]
bar = [         


                                                                                     6




              ,

]
baz = [[[[[[[],],],]]]]

But the following is not:

bar = [, 5,] -- A comma without an associated value

Lists are often represented in programming languages by types named list, array, or vector.
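
The rule that all values in a List share one type can be checked after parsing. The Rust sketch below uses a hypothetical Value enum as the parsed form and compares only the variant of each element, so it does not look inside nested Lists.

```rust
use std::mem::discriminant;

/// Hypothetical parsed form of a Goff value (Map omitted for brevity).
#[allow(dead_code)]
#[derive(Debug)]
enum Value {
    String(String),
    Boolean(bool),
    Integer(i64),
    Real(f64),
    List(Vec<Value>),
    Nothing,
}

/// True if every element is the same variant as the first one.
/// An empty List is trivially homogeneous.
fn is_homogeneous(items: &[Value]) -> bool {
    match items.split_first() {
        None => true,
        Some((first, rest)) => rest
            .iter()
            .all(|v| discriminant(v) == discriminant(first)),
    }
}
```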

Map

A Map is a collection of zero or more keys with an associated value, represented by an opening parenthesis, zero or more segments, and a closing parenthesis.

A segment is composed of zero or more whitespace characters or newlines, a key (represented by what would be an otherwise valid value), zero or more whitespace characters or newlines, a hyphen symbol directly followed by a right angle bracket (->), zero or more whitespace characters or newlines, a value, zero or more whitespace characters or newlines, and a comma. As with Lists, the trailing comma may be omitted on the last segment.

All keys in a Map must be the same type, as must all of the values, but the key type may differ from the value type.

A Map is often represented in programming languages by types named map, dict, or hash.

Nothing

A Nothing represents a lack of data. It is technically equivalent to not providing the key at all.

A Nothing is represented in many languages by values named null or nil. In languages lacking null types, it may be represented by an enum under names like Nothing or None. In languages demanding explicitly nullable types, a member that is not explicitly marked as nullable but is assigned Nothing in the Goff file makes that file invalid.
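
Rust is one of the languages with explicitly nullable types: a field that may hold Nothing is an Option. The sketch below is hypothetical; the Network struct mirrors the sample at the top of this document, and require is an illustrative helper for fields that are not nullable.

```rust
/// `proxy` may be Nothing in the Goff file; `server` may not.
#[allow(dead_code)]
struct Network {
    server: String,
    proxy: Option<String>,
}

/// Unwraps a value for a non-nullable field.
/// A Nothing here means the Goff file is invalid for this target type.
fn require<T>(key: &str, value: Option<T>) -> Result<T, String> {
    value.ok_or_else(|| {
        format!("key `{}` is Nothing but its target type is not nullable", key)
    })
}
```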

Contributing

Please run cargo fmt before committing.

License

See license.txt.
