#markup #lua #content-tree #document-generation

bin+lib litua

Read a text document, receive its tree in Lua and manipulate it before representing it as string

3 stable releases

1.1.1 Feb 10, 2023
1.1.0 Feb 4, 2023
1.0.0 Jan 27, 2023

#246 in Text processing

MIT license

3MB
1.5K SLoC

Rust 1K SLoC // 0.1% comments Lua 445 SLoC // 0.2% comments

litua

author:: tajpulo version:: 1.1.1 badges:: image:https://github.com/typho/litua/actions/workflows/release.yml/badge.svg[state of the release process] image:https://github.com/typho/litua/actions/workflows/build.yml/badge.svg[state of the build process]

Read a text document, receive its tree in Lua and manipulate it before representing it as string.

What is it about?

Text documents can be considered as trees. LISP (e.g.) makes it very explicit. Specifically, we propose the following syntax (“Litua input syntax”):

{element[attr1=value1][attribute2=val2] text content of element}

In LISP, it might be represented as …

(element :attr1 "value1" :attribute2 "val2" "text content of element")

In XML, it might be represented as …

<element attr1="value1" attribute2="val2">text content of element</element>

After providing a text document (doc.txt) in this Litua input syntax, we can invoke the compiler:

litua doc.txt

In most cases, it will write the input document as output document doc.lit. But consider these elements as tree of nested elements. Specifically, we parse it into the following Lua tables:

[source,lua]

local node = {
    -- the string giving the node type
    ["call"] = "element",
    -- the key-value pairs of arguments.
    -- values are sequences of strings or nodes
    ["args"] = { ["attr1"] = { [1] = "value1" }, ["attribute2"] = { [1] = "val2" } },
    -- the sequence of elements occuring in the body of a node.
    -- the items of content can be strings or nodes themselves
    ["content"] = {
        [1] = "text content of element"
    },
}

Then you can use the hooks …

  • Litua.modify_final_string = function (hook) [] end where hook takes a string and returns a string
  • Litua.on_setup = function (hook) [] end where hook takes no argument and returns nil
  • Litua.on_teardown = function (hook) [] end where hook takes no argument and returns nil
  • Litua.read_new_node = function (filter, hook) [] end where hook takes a copy of the current node and the tree depth as integer and returns nil
  • Litua.modify_node = function (filter, hook) [] end where hook takes the current node, the tree depth as integer, and the filter name and returns (some node or string) and nil.
  • Litua.read_modified_node = function (filter, hook) [] end where hook takes a copy of the current node and the tree depth as integer and returns nil
  • Litua.convert_node_to_string = function (filter, hook) [] end where hook takes the current node, the tree depth as integer, and the filter name and returns a string and nil

… to modify the generation process of the output document. hook is a function, you need to provide and filter is either the call name or "" to be invoked for every call. The names should give an idea of their purpose. E.g. on_setup and on_teardown run before/after all other hooks. You can supply a hook by creating a file hooks.lua next to your input doc.txt with the content:

[source,lua]

Litua.read_new_node("element", function (node, depth)
    Litua.log("hook", "found call '" .. tostring(node.call) .. "' at depth " .. tostring(depth))
end)

Be aware that the document always lives within one invisible top-level node called document. I highly recommend to go through the examples in this order to get an idea how to use the hooks:

  1. link:examples/enumeration[enumeration] – replace a call with an incrementing counter
  2. link:examples/replacements[replacements] – first define substitution pairs and then apply them
  3. link:examples/literate-programming[literate-programming] – define documentation and code block and write them to different files
  4. link:examples/markup[markup] – serialize the tree to HTML5

Why should I use it?

Because you want to handle/modify/use the tree structure of a text document without integrating sophisticated tools like XSLT. Instead the document is parsed for you and Lua (a simple and established programming language) allows you to modify the default behavior of the program.

How to install

This is a single static executable. It only depends on basic system libraries like pthread, math and libc. I expect it to work out-of-the-box on your operating system.

How to run

Call the litua executable with -h to get information about additional arguments:

litua -h

Litua input specification

The following document defines the syntax (see also design/litua-lexer-state-diagram.jpg):

[source]

Node       = (Content | RawString | Function){0,}
Content    = (NOT the symbols "{" or "}"){1,}
RawString  = "{<" (NOT the string ">}") ">}"
           | "{<<" (NOT the string ">>}") ">>}"
           | "{<<<" (NOT the string ">>>}") ">>>}"continue up to 126 "<" characters
Function   = "{" Call "}"
           | "{" Call Whitespace "}"
           | "{" Call Whitespace Node "}"
           | "{" Call ( "[" Key "=" Node "]" ){1,} Whitespace "}"
           | "{" Call ( "[" Key "=" Node "]" ){1,} Whitespace Node "}"

Call       = (NOT the symbols "[" or "<"){1,}
Key        = (NOT the symbol "="){1,}
Whitespace = any of the 25 Unicode Whitespace characters

In essence, don't use "<" or "[" in function call names, or "=" in argument keys. Keep the number of opening and closing braces balanced.

Improvements

The following parts can be improved:

  • the .parsed.expected files are not checked in the testsuite, because rust's HashMap representation is not consistent across builds.
  • verify that error handling for user code works well
  • improved error reporting for syntax errors (tokens position, etc)

Source Code

The source code is available at link:https://github.com/typho/litua[Github].

License

See link:LICENSE[the LICENSE file] (Hint: MIT license).

Changelog

0.9:: first public release with raw strings and four examples 1.0.0:: improves stdout/stderr, improved documentation, CI builds, upload to crates.io 1.1.0:: bugfix third argument of modify-node hook, modify-hook may now also return strings 1.1.1:: bugfix: interrupted '>' sequences inside raw string content can be used again, removed hook checks from testsuite

Issues

Please report any issues on the link:https://github.com/typho/litua/issues[Github issues page].

Dependencies

~3–4.5MB
~87K SLoC