#awk #dsl #packrat #parser #vm

bin+lib tokay

Tokay is a programming language designed for ad-hoc parsing, inspired by awk

4 releases (2 breaking)

Uses new Rust 2021

0.5.1 May 17, 2022
0.5.0 May 17, 2022
0.4.0 Nov 15, 2021
0.3.0 Jul 7, 2021


MIT license

695KB
17K SLoC

Tokay


Tokay is a programming language designed for ad-hoc parsing.

Tokay is under development and is not yet ready for production use. There are plenty of bugs, incomplete features and planned concepts. Please help to improve it!

About

Tokay is a language for quickly implementing solutions to text processing problems. These can range from simple data extraction to parsing entire structures or parts of them, turning the information into structured parse trees or abstract syntax trees for further processing.

Tokay is therefore both a tool for simple one-liners and a foundation for code analyzers, refactoring tools, interpreters, compilers or transpilers. In fact, Tokay's own language parser is implemented in Tokay itself.

Tokay is inspired by awk, but follows its own philosophy, ideas and design principles. It can also serve as a general-purpose scripting language, but its main focus is on parsing, which is built into the language as a fundamental feature.

Tokay is still a very young project with a lot of potential. Volunteers are welcome!

Highlights

  • Interpreted, procedural and imperative scripting language
  • Concise and easy to learn syntax and object system
  • Stream-based input processing
  • Automatic parse tree construction and synthesis
  • Left-recursive parsing structures ("parselets") supported
  • Implements a memoizing packrat parsing algorithm internally
  • Robust and fast, as it is written entirely in safe Rust
  • Enables awk-style one-liners in combination with other tools
  • Generic functions and parselets (*coming soon)
  • Import system to create modularized programs (*coming soon)
  • Embedded interoperability with other programs (*coming soon)
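The highlights above mention both memoizing packrat parsing and left-recursive parselets. The following Rust sketch shows one well-known way to combine the two, the "seed growing" technique described by Warth et al.: results are memoized per input position, a left-recursive call first hits a planted failure, and the rule is then re-evaluated as long as the match keeps getting longer. The grammar, `Parser` and `eval` are hypothetical illustrations; this is not necessarily how Tokay implements it internally.

```rust
use std::collections::HashMap;

// Generic packrat parser with left-recursion support (seed growing) for:
//     Expr <- Expr '-' Num / Num
//     Num  <- [0-9]+
// Illustration of the technique only, NOT Tokay's internal code.
struct Parser<'a> {
    input: &'a [u8],
    // Memo table for Expr: start position -> Some((value, end)) or None (failure).
    memo: HashMap<usize, Option<(i64, usize)>>,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { input: src.as_bytes(), memo: HashMap::new() }
    }

    // Num <- [0-9]+
    fn num(&self, pos: usize) -> Option<(i64, usize)> {
        let mut end = pos;
        while end < self.input.len() && self.input[end].is_ascii_digit() {
            end += 1;
        }
        if end == pos {
            return None;
        }
        let text = std::str::from_utf8(&self.input[pos..end]).ok()?;
        Some((text.parse().ok()?, end))
    }

    // One pass over Expr's alternatives; the left-recursive call hits the memo.
    fn expr_body(&mut self, pos: usize) -> Option<(i64, usize)> {
        // Alternative 1: Expr '-' Num
        if let Some((lhs, p)) = self.expr(pos) {
            if self.input.get(p) == Some(&b'-') {
                if let Some((rhs, q)) = self.num(p + 1) {
                    return Some((lhs - rhs, q));
                }
            }
        }
        // Alternative 2: Num
        self.num(pos)
    }

    fn expr(&mut self, pos: usize) -> Option<(i64, usize)> {
        if let Some(result) = self.memo.get(&pos) {
            return *result; // packrat: reuse the memoized result
        }
        // Plant a failure as the seed so the left-recursive call terminates.
        self.memo.insert(pos, None);
        loop {
            let result = self.expr_body(pos);
            let prev_end = self.memo[&pos].map(|(_, end)| end);
            let new_end = result.map(|(_, end)| end);
            // Stop growing the seed once the match no longer gets longer.
            if new_end.is_none() || new_end <= prev_end {
                break;
            }
            self.memo.insert(pos, result);
        }
        self.memo[&pos]
    }
}

// Parses and evaluates `src`; the whole input must be consumed.
fn eval(src: &str) -> Option<i64> {
    let mut parser = Parser::new(src);
    match parser.expr(0) {
        Some((value, end)) if end == src.len() => Some(value),
        _ => None,
    }
}

fn main() {
    // Left-associative: (7 - 2) - 1, not 7 - (2 - 1)
    println!("{:?}", eval("7-2-1"));
}
```

Because the match is grown from the left, `7-2-1` is evaluated as `(7-2)-1 = 4`, which is the left-associative behaviour the calculator example in this README relies on.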

Examples

Tokay's version of "Hello World" is quite obvious:

print("Hello World")

Tokay can also greet any wor(l)ds that are fed to it. The next program prints "Hello Venus", "Hello Earth" or "Hello" followed by any other name parsed by the built-in Word token. Any input other than a word is automatically skipped.

world => Word   print("Hello " + $world)
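To make the "skip everything that is not a word" behaviour concrete, here is a rough Rust analogue of this one-liner. It assumes that a word is a maximal run of alphabetic characters; Tokay's built-in Word token may differ in detail, and `greetings` is a hypothetical helper name.

```rust
// Rough analogue of `world => Word print("Hello " + $world)`.
// Assumption: a "word" is a maximal run of alphabetic characters;
// everything else (digits, punctuation, whitespace) is skipped.
fn greetings(input: &str) -> Vec<String> {
    input
        .split(|c: char| !c.is_alphabetic())
        .filter(|w| !w.is_empty())
        .map(|w| format!("Hello {}", w))
        .collect()
}

fn main() {
    for line in greetings("Venus, Earth 42 !") {
        println!("{}", line);
    }
}
```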

A simple program that counts words consisting of at least three characters and prints the total can be implemented like this:

Word(min=3) ++words accept
end words
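The same counting logic, sketched in Rust for comparison. It assumes that a word is a maximal run of alphabetic characters and that `min=3` means "at least three characters" (counted as characters, not bytes); `count_words` is a hypothetical name.

```rust
// Rough analogue of `Word(min=3) ++words accept` followed by `end words`.
// Assumption: words are maximal runs of alphabetic characters, and only
// words with at least `min` characters are counted.
fn count_words(input: &str, min: usize) -> usize {
    input
        .split(|c: char| !c.is_alphabetic())
        .filter(|w| !w.is_empty() && w.chars().count() >= min)
        .count()
}

fn main() {
    // "is" and "a" are too short; "Tokay", "neat" and "gecko" are counted.
    println!("{}", count_words("Tokay is a neat gecko", 3));
}
```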

The next, extended version of the program above counts all words and, separately, all numbers (floats and integers).

Word ++words accept
{ Float ; Int } ++numbers accept
end words numbers
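A hand-written Rust scanner doing the same job shows what the two counters track. This is a simplification under stated assumptions: words are runs of ASCII letters, and a number is a run of ASCII digits with an optional fractional part (`.` followed by digits), standing in for the Float/Int alternatives; `count_words_and_numbers` is a hypothetical name.

```rust
// Rough analogue of counting `Word` and `{ Float ; Int }` matches.
// Assumptions: words are runs of ASCII letters; numbers are runs of ASCII
// digits with an optional ".digits" fraction. Everything else is skipped.
fn count_words_and_numbers(input: &str) -> (usize, usize) {
    let chars: Vec<char> = input.chars().collect();
    let (mut words, mut numbers) = (0, 0);
    let mut i = 0;
    while i < chars.len() {
        if chars[i].is_ascii_alphabetic() {
            while i < chars.len() && chars[i].is_ascii_alphabetic() {
                i += 1;
            }
            words += 1;
        } else if chars[i].is_ascii_digit() {
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            // Optional fractional part: '.' followed by at least one digit.
            if i + 1 < chars.len() && chars[i] == '.' && chars[i + 1].is_ascii_digit() {
                i += 1;
                while i < chars.len() && chars[i].is_ascii_digit() {
                    i += 1;
                }
            }
            numbers += 1;
        } else {
            i += 1; // neither word nor number: skip
        }
    }
    (words, numbers)
}

fn main() {
    let (words, numbers) = count_words_and_numbers("pi is 3.14 and e is 2.718, 42 too");
    println!("{} {}", words, numbers);
}
```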

By design, Tokay constructs syntax trees from consumed information automatically.

The next program directly implements a parser and interpreter for simple mathematical expressions like 1 + 2 + 3 or 7 * (8 + 2) / 5, and prints the result of each expression. Handling direct and indirect left recursion without falling into infinite loops is one of Tokay's core features.

_ : [ \t]+                # redefine whitespace to just tab and space

Factor : @{
    Int _                 # built-in 64-bit signed integer token
    '(' _ Expr ')' _
}

Term : @{
    Term '*' _ Factor     $1 * $4
    Term '/' _ Factor     $1 / $4
    Factor
}

Expr : @{
    Expr '+' _ Term       $1 + $4
    Expr '-' _ Term       $1 - $4
    Term
}

Expr _ print("= " + $1)   # gives some neat result output

An example run of this program looks like this:

$ tokay calc.tok
1 + 2 + 3
= 6
7 * (8 + 2) / 5
= 14
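For comparison, here is a Rust sketch of the same calculator as a hand-rolled recursive-descent evaluator. Plain recursive descent cannot call `Expr` at the very start of `Expr` without recursing forever, so the left-recursive rules are rewritten as loops, which keeps `+ - * /` left-associative. `Calc` and `eval` are hypothetical names, and unsigned integer literals with truncating integer division are assumptions that may not match Tokay's `Int` token and `/` operator exactly.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Recursive-descent version of the calculator grammar. The left-recursive
// rules (Expr : Expr '+' Term, ...) are rewritten as loops.
struct Calc<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> Calc<'a> {
    // Whitespace is just tab and space, as in the Tokay example.
    fn skip_ws(&mut self) {
        while matches!(self.chars.peek().copied(), Some(' ') | Some('\t')) {
            self.chars.next();
        }
    }

    // Factor : Int | '(' Expr ')'
    fn factor(&mut self) -> Option<i64> {
        self.skip_ws();
        if self.chars.peek() == Some(&'(') {
            self.chars.next();
            let value = self.expr()?;
            self.skip_ws();
            if self.chars.next() != Some(')') {
                return None;
            }
            return Some(value);
        }
        let mut digits = String::new();
        while let Some(c) = self.chars.peek().copied().filter(|c| c.is_ascii_digit()) {
            digits.push(c);
            self.chars.next();
        }
        digits.parse().ok() // fails on empty input
    }

    // Term : Term '*' Factor | Term '/' Factor | Factor
    fn term(&mut self) -> Option<i64> {
        let mut value = self.factor()?;
        loop {
            self.skip_ws();
            match self.chars.peek().copied() {
                Some('*') => {
                    self.chars.next();
                    value = value.checked_mul(self.factor()?)?;
                }
                Some('/') => {
                    self.chars.next();
                    // None on division by zero (assumed truncating division)
                    value = value.checked_div(self.factor()?)?;
                }
                _ => return Some(value),
            }
        }
    }

    // Expr : Expr '+' Term | Expr '-' Term | Term
    fn expr(&mut self) -> Option<i64> {
        let mut value = self.term()?;
        loop {
            self.skip_ws();
            match self.chars.peek().copied() {
                Some('+') => {
                    self.chars.next();
                    value = value.checked_add(self.term()?)?;
                }
                Some('-') => {
                    self.chars.next();
                    value = value.checked_sub(self.term()?)?;
                }
                _ => return Some(value),
            }
        }
    }
}

// Evaluates one line; the whole line must be consumed.
fn eval(line: &str) -> Option<i64> {
    let mut calc = Calc { chars: line.chars().peekable() };
    let value = calc.expr()?;
    calc.skip_ws();
    if calc.chars.peek().is_none() { Some(value) } else { None }
}

fn main() {
    for line in ["1 + 2 + 3", "7 * (8 + 2) / 5"] {
        if let Some(result) = eval(line) {
            println!("= {}", result);
        }
    }
}
```

The Tokay version expresses the same precedence and associativity directly through left-recursive parselets, without the manual rewrite into loops.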

Tokay can also be used for programs without any parsing features.
The next example is a recursive function that calculates the factorial of an integer (here 4! = 24).

faculty : @x {
    if !x return 1
    x * faculty(x - 1)
}

faculty(4)
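The same recursion written in Rust, for readers coming from the implementation side. Using `u64` is an assumption made here (it restricts the input to non-negative values and overflows for arguments above 20); the name `factorial` is chosen for this sketch.

```rust
// Recursive factorial, mirroring `faculty : @x { if !x return 1 ... }`:
// 0 yields 1, otherwise x * (x - 1)!.
fn factorial(x: u64) -> u64 {
    if x == 0 {
        return 1;
    }
    x * factorial(x - 1)
}

fn main() {
    println!("{}", factorial(4));
}
```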

Documentation

Like Tokay itself, the documentation is still a work in progress. The latest version is available on the website tokay.dev. The documentation source is maintained in a separate repository.

Repository

This repository contains all source files required to build Tokay, along with examples.

.                  # Build scripts, Cargo.toml, etc.
├── assets         # Asset files (logo)
├── examples       # Example programs
├── macros         # Crate to provide compile-time macros
├── src            # Tokay source, includes primary modules
│   ├── compiler   # Compiler
│   ├── value      # Values, objects and built-ins
│   └── vm         # Virtual stack machine
└── tests          # Examples used by the test suite

Contribute

Contributions of any kind, be it code, bug reports, bugfixes, documentation, support or advertising, are always welcome!

Take a look at the bug tracker, or search the source code for //fixme and //todo comments, to find open issues and things that need to be improved (there are plenty of them).

If you want to create a pull request, ensure that cargo run and cargo test run without errors. If you add new features, don't forget to write unit tests for them. Run cargo fmt before you commit.

Feel free to contact me directly with any questions, or file an issue here.

Logo

The Tokay programming language is named after the Tokay gecko (Gekko gecko) from Asia, which shouts "token" into the night.

The Tokay logo and icon were kindly designed by Timmytiefkuehl.
Check out the tokay-artwork repository for different versions of the logo as well.

License

Copyright © 2022 by Jan Max Meyer, Phorward Software Technologies.

Tokay is free software under the MIT license.
Please see the LICENSE file for details.
