3 stable releases
1.1.1 | Feb 10, 2023 |
---|---|
1.1.0 | Feb 4, 2023 |
1.0.0 | Jan 27, 2023 |
#246 in Text processing
3MB
1.5K
SLoC
litua
author:: tajpulo version:: 1.1.1 badges:: image:https://github.com/typho/litua/actions/workflows/release.yml/badge.svg[state of the release process] image:https://github.com/typho/litua/actions/workflows/build.yml/badge.svg[state of the build process]
Read a text document, receive its tree in Lua and manipulate it before representing it as string.
What is it about?
Text documents can be considered as trees. LISP (e.g.) makes it very explicit. Specifically, we propose the following syntax (“Litua input syntax”):
{element[attr1=value1][attribute2=val2] text content of element}
In LISP, it might be represented as …
(element :attr1 "value1" :attribute2 "val2" "text content of element")
In XML, it might be represented as …
<element attr1="value1" attribute2="val2">text content of element</element>
After providing a text document (doc.txt
) in this Litua input syntax, we can invoke the compiler:
litua doc.txt
In most cases, it will write the input document as output document doc.lit
.
But consider these elements as tree of nested elements. Specifically, we parse it into the following Lua tables:
[source,lua]
local node = {
-- the string giving the node type
["call"] = "element",
-- the key-value pairs of arguments.
-- values are sequences of strings or nodes
["args"] = { ["attr1"] = { [1] = "value1" }, ["attribute2"] = { [1] = "val2" } },
-- the sequence of elements occuring in the body of a node.
-- the items of content can be strings or nodes themselves
["content"] = {
[1] = "text content of element"
},
}
Then you can use the hooks …
Litua.modify_final_string = function (hook) […] end
wherehook
takes a string and returns a stringLitua.on_setup = function (hook) […] end
wherehook
takes no argument and returns nilLitua.on_teardown = function (hook) […] end
wherehook
takes no argument and returns nilLitua.read_new_node = function (filter, hook) […] end
wherehook
takes a copy of the current node and the tree depth as integer and returns nilLitua.modify_node = function (filter, hook) […] end
wherehook
takes the current node, the tree depth as integer, and the filter name and returns (some node or string) and nil.Litua.read_modified_node = function (filter, hook) […] end
wherehook
takes a copy of the current node and the tree depth as integer and returns nilLitua.convert_node_to_string = function (filter, hook) […] end
wherehook
takes the current node, the tree depth as integer, and the filter name and returns a string and nil
… to modify the generation process of the output document. hook
is a function, you need to provide and filter
is either the call name or ""
to be invoked for every call. The names should give an idea of their purpose. E.g. on_setup
and on_teardown
run before/after all other hooks. You can supply a hook by creating a file hooks.lua
next to your input doc.txt
with the content:
[source,lua]
Litua.read_new_node("element", function (node, depth)
Litua.log("hook", "found call '" .. tostring(node.call) .. "' at depth " .. tostring(depth))
end)
Be aware that the document always lives within one invisible top-level node called document
.
I highly recommend to go through the examples in this order to get an idea how to use the hooks:
- link:examples/enumeration[enumeration] – replace a call with an incrementing counter
- link:examples/replacements[replacements] – first define substitution pairs and then apply them
- link:examples/literate-programming[literate-programming] – define documentation and code block and write them to different files
- link:examples/markup[markup] – serialize the tree to HTML5
Why should I use it?
Because you want to handle/modify/use the tree structure of a text document without integrating sophisticated tools like XSLT. Instead the document is parsed for you and Lua (a simple and established programming language) allows you to modify the default behavior of the program.
How to install
This is a single static executable. It only depends on basic system libraries like pthread, math and libc. I expect it to work out-of-the-box on your operating system.
How to run
Call the litua executable with -h
to get information about additional arguments:
litua -h
Litua input specification
The following document defines the syntax (see also design/litua-lexer-state-diagram.jpg
):
[source]
Node = (Content | RawString | Function){0,…}
Content = (NOT the symbols "{" or "}"){1,…}
RawString = "{<" (NOT the string ">}") ">}"
| "{<<" (NOT the string ">>}") ">>}"
| "{<<<" (NOT the string ">>>}") ">>>}"
… continue up to 126 "<" characters
Function = "{" Call "}"
| "{" Call Whitespace "}"
| "{" Call Whitespace Node "}"
| "{" Call ( "[" Key "=" Node "]" ){1,…} Whitespace "}"
| "{" Call ( "[" Key "=" Node "]" ){1,…} Whitespace Node "}"
Call = (NOT the symbols "[" or "<"){1,…}
Key = (NOT the symbol "="){1,…}
Whitespace = any of the 25 Unicode Whitespace characters
In essence, don't use "<" or "[" in function call names, or "=" in argument keys. Keep the number of opening and closing braces balanced.
Improvements
The following parts can be improved:
- the
.parsed.expected
files are not checked in the testsuite, because rust's HashMap representation is not consistent across builds. - verify that error handling for user code works well
- improved error reporting for syntax errors (tokens position, etc)
Source Code
The source code is available at link:https://github.com/typho/litua[Github].
License
See link:LICENSE[the LICENSE file] (Hint: MIT license).
Changelog
0.9:: first public release with raw strings and four examples 1.0.0:: improves stdout/stderr, improved documentation, CI builds, upload to crates.io 1.1.0:: bugfix third argument of modify-node hook, modify-hook may now also return strings 1.1.1:: bugfix: interrupted '>' sequences inside raw string content can be used again, removed hook checks from testsuite
Issues
Please report any issues on the link:https://github.com/typho/litua/issues[Github issues page].
Dependencies
~3–4.5MB
~87K SLoC