#language #regex #cli-tool #phone #validation #conlang

bin+lib phonet

A CLI tool and library to validate phonotactic patterns for constructed languages

12 releases (3 stable)

1.0.2 Mar 13, 2023
1.0.0 Feb 28, 2023
0.9.6 Feb 28, 2023
0.8.1 Feb 7, 2023
0.8.0 Jan 31, 2023

#508 in Command line utilities


347 downloads per month

MIT license

95KB
1.5K SLoC

Phonet

Phonet is a CLI tool and library to validate phonotactic patterns for constructed languages. It is compatible with both romanization and phonetic transcription. Words can also be randomly generated (see Argument Syntax).

Syntax Highlighting Extension for VSCode

Usage

This project can be used as a Rust library crate, or as a binary executable.

Binary use

Download the latest version here

Argument Syntax

Generated by Clap

Usage: phonet.exe [OPTIONS] [TESTS]...

Arguments:
  [TESTS]...
          Custom tests (optional)

          This overrides all tests in the file

Options:
  -f, --file <FILE>
          Name and path of file to run and test

          If name ends with a period, the 'phonet' extension is implied

          Eg. `phonet -f myfile.phonet` or `phonet -f myfile.` (same result)

          If name ends with a slash, the '/phonet' file name is implied

          Eg. `phonet -f folder/phonet` or `phonet -f folder/` (same result)

          [default: phonet]

  -q, --quiet
          Don't display passes and notes, only fails

  -m, --minify
          Minify file and save

  -w, --with-tests
          Include tests in minified file

  -g, --generate [<GENERATE>]
          Generate random words

          Default count 1, specify with number

      --gmin <GENERATE_MIN_LEN>
          Set minimum length (inclusive) for generated words

          Use with the `--generate` or `-g` flag

          Note: This increases generation time exponentially

          [default: 3]

      --gmax <GENERATE_MAX_LEN>
          Set maximum length (inclusive) for generated words

          Use with the `--generate` or `-g` flag

          [default: 20]

  -n, --no-color
          Display output in default color

          Use for piping standard output to a file

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
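The two shorthand rules for `-f` amount to a small string transformation. A minimal sketch, assuming only the behavior stated in the help text above (`resolve_phonet_path` is a hypothetical name, not part of phonet's API):

```rust
/// Hypothetical helper: expands the `-f` / `--file` shorthands described above.
/// Not taken from phonet's source; it only mirrors the two documented rules.
fn resolve_phonet_path(name: &str) -> String {
    if name.ends_with('.') || name.ends_with('/') {
        // "myfile." -> "myfile.phonet"  (extension implied)
        // "folder/" -> "folder/phonet"  (file name implied)
        format!("{name}phonet")
    } else {
        // Any other name, such as "myfile.phonet", is used unchanged
        name.to_string()
    }
}
```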

Example

# Runs ./phonet
phonet

# Runs ./phonet, with tests: 'some', 'tests' (overrides the tests in file)
phonet some tests

# Runs ./myfile.phonet
phonet -f myfile.phonet
phonet -f myfile.phonet some tests

# 'phonet' extension implied
phonet -f myfile.

# 'phonet' filename implied
phonet -f src/phonet
phonet -f src/

# Runs ./phonet, only showing fails
phonet -q

# Runs ./phonet, and minifies to ./min.phonet without tests
phonet -m

# Runs ./myfile.phonet, only displaying fails, and minifies to ./myfile.min.phonet with tests
phonet -f myfile. -q -mw

# Runs ./phonet, and generates 1 random word
phonet -g

# Runs ./myfile.phonet, and generates 10 random words
phonet -g10 -f myfile.phonet

# Runs ./phonet, with no color, and writes output to ./phonet.txt
phonet -n > phonet.txt

# Runs ./myfile.phonet, only displaying fails, and generates 3 random words with length 6-8, writes output to ./phonet.txt (with no color)
phonet -f myfile. -qn -g 3 --gmin 6 --gmax 8 > ./phonet.txt
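The minify examples above follow a naming pattern: ./phonet is minified to ./min.phonet, and ./myfile.phonet to ./myfile.min.phonet. A sketch that reproduces those two cases; `min_filename` is hypothetical, and its handling of files in subfolders is an assumption (the library's own `get_min_filename` may behave differently):

```rust
/// Hypothetical sketch of the minified-file naming pattern shown above.
/// Only the two documented cases are certain; subfolder handling is assumed.
fn min_filename(path: &str) -> String {
    // Split off the directory (if any) so only the file name changes
    let (dir, file) = match path.rfind('/') {
        Some(i) => path.split_at(i + 1),
        None => ("", path),
    };
    let new_file = match file.strip_suffix(".phonet") {
        // "myfile.phonet" -> "myfile.min.phonet"
        Some(stem) => format!("{stem}.min.phonet"),
        // Bare "phonet" -> "min.phonet"
        None => format!("min.{file}"),
    };
    format!("{dir}{new_file}")
}
```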

Create Alias / Path

Replace <path_to_file> with the directory of the downloaded binary.

Bash

Add an alias to .bashrc in your home directory

# ~/.bashrc
alias phonet="<path_to_file>/phonet.exe"

Powershell

Add the directory to $env:Path (this applies to the current session only)

$env:Path = "$env:Path;<path_to_file>"

Library use

Add phonet = "1.0.2" under [dependencies] in your Cargo.toml file

Short Example

use phonet::Draft;

fn main() {
    let file = std::fs::read_to_string("phonet").unwrap();

    // Parse draft
    Draft::from(&file).unwrap()
        // Run tests
        .run()
        // Display results
        .display(Default::default(), true)
}

Long Example

use std::fs;

use phonet::{
    draft::{Message::Test, TestDraft},
    get_min_filename, DisplayLevel, Draft,
};

fn main() {
    let filename = "myfile.phonet";

    // Read file
    let file = fs::read_to_string(filename).expect("Could not read phonet file");

    // Parse file
    let mut draft = Draft::from(&file).expect("Failed to parse file");

    // Add a custom test
    draft.messages.push(Test(TestDraft {
        intent: true,
        word: "taso".to_string(),
    }));

    // Minify file
    fs::write(
        get_min_filename(filename),
        draft.minify(false).expect("Failed to minify"),
    )
    .expect("Could not write minified file");

    // Run tests and display only failed tests
    draft.run().display(DisplayLevel::OnlyFails, true);

    // Create a generator for random words
    // Each with a length between 5 and 8 (inclusive)
    // Generation is done lazily, similar to an iterator
    println!("Randomly generated words:");
    let mut words = draft
        .generator(5..=8)
        .expect("Failed to create word generator");

    // Generate 10 random words
    for _ in 0..10 {
        println!(" - {}", words.next());
    }
}
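The long example describes generation as lazy, like an iterator, and the help text warns that raising --gmin increases generation time exponentially. Rejection sampling would explain both: draw random strings from the letters of the 'any' class and keep only those that pass every rule. The sketch below assumes that approach and is not taken from phonet's source; `WordGenerator` is hypothetical, `letters` stands in for the 'any' class, and `is_valid` stands in for running all rules:

```rust
use std::ops::RangeInclusive;

/// Hypothetical lazy word generator using rejection sampling.
/// `letters` stands in for the 'any' class; `is_valid` for running every rule.
struct WordGenerator<F: Fn(&str) -> bool> {
    letters: Vec<char>,
    lengths: RangeInclusive<usize>,
    is_valid: F,
    state: u64, // xorshift64 state; must never be zero
}

impl<F: Fn(&str) -> bool> WordGenerator<F> {
    fn new(letters: &str, lengths: RangeInclusive<usize>, is_valid: F, seed: u64) -> Self {
        assert!(!letters.is_empty(), "need at least one letter");
        assert!(lengths.start() <= lengths.end(), "empty length range");
        WordGenerator {
            letters: letters.chars().collect(),
            lengths,
            is_valid,
            state: seed.max(1),
        }
    }

    /// Tiny xorshift64 PRNG, so the sketch needs no external crates
    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Random index in 0..n (slightly biased, which is fine for a sketch)
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

impl<F: Fn(&str) -> bool> Iterator for WordGenerator<F> {
    type Item = String;

    /// Draws candidates until one passes; loops forever if no word can pass
    fn next(&mut self) -> Option<String> {
        let (min, max) = (*self.lengths.start(), *self.lengths.end());
        loop {
            let len = min + self.below(max - min + 1);
            let mut word = String::with_capacity(len);
            for _ in 0..len {
                let n = self.letters.len();
                let i = self.below(n);
                word.push(self.letters[i]);
            }
            // Every extra letter is another chance to break a rule,
            // so longer candidates are rejected more often
            if (self.is_valid)(&word) {
                return Some(word);
            }
        }
    }
}
```

Because each candidate must survive every rule, the chance of acceptance typically shrinks quickly as words get longer, which is consistent with the warning about --gmin.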

File syntax

A Phonet file is used to define the rules, classes, and tests for the program.

The file should either be named phonet, or end in .phonet

Syntax Highlighting Extension for VSCode

Statements

The syntax is a list of statements, each separated by a semicolon ; or a linebreak.

Use an ampersand & to start a multi-line statement; it can then only be ended with a semicolon ;. Note that comments cannot be multiline.

Comments will end with a linebreak or a semicolon ;

All whitespace is ignored, except to separate words in tests.

Note: this also removes spaces inside regular expressions. Use \s if you need to match a space.
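A minimal sketch of these separator rules, assuming `&` may appear anywhere in a statement and is itself discarded. Comment handling and whitespace removal are left out, and `split_statements` is a hypothetical name:

```rust
/// Hypothetical sketch of statement splitting (comments are not handled).
/// `;` and line breaks end a statement, except that after `&` only `;` does.
fn split_statements(source: &str) -> Vec<String> {
    let mut statements = Vec::new();
    let mut current = String::new();
    let mut multiline = false; // set by '&', cleared by the next ';'

    for ch in source.chars() {
        match ch {
            '&' => multiline = true,
            ';' => {
                flush(&mut statements, &mut current);
                multiline = false;
            }
            '\n' if !multiline => flush(&mut statements, &mut current),
            // Inside a multi-line statement, a line break just separates words
            '\n' => current.push(' '),
            _ => current.push(ch),
        }
    }
    flush(&mut statements, &mut current);
    statements
}

/// Stores the trimmed statement, if any, and starts a new one
fn flush(statements: &mut Vec<String>, current: &mut String) {
    let trimmed = current.trim();
    if !trimmed.is_empty() {
        statements.push(trimmed.to_string());
    }
    current.clear();
}
```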

Each statement must begin with an operator:

  • # Hashtag: A whole line comment. A linebreak (not a semicolon) ends the comment
  • $ Dollar: Define a class
  • + Plus or ! Bang: Define a rule
  • * Star: Create a test note, and define a reason if a test fails
  • ? Question mark: Create a test
  • ~ Tilde: Define the mode of the file
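Because the first character decides what a statement means, classification is a single match on it. A hypothetical sketch (the `Statement` enum and `classify` are illustrative names, not phonet's API):

```rust
/// The kinds of statement listed above (names are hypothetical)
#[derive(Debug, PartialEq)]
enum Statement {
    Comment,
    Class,
    Rule { positive: bool },
    Note { quiet: bool },
    Test { positive: bool },
    Mode,
}

/// Classifies a statement by its leading operator; None if it has no valid operator
fn classify(statement: &str) -> Option<Statement> {
    let mut chars = statement.trim_start().chars();
    let kind = match chars.next()? {
        '#' => Statement::Comment,
        '$' => Statement::Class,
        '+' => Statement::Rule { positive: true },
        '!' => Statement::Rule { positive: false },
        // "*:" marks a quiet note
        '*' => Statement::Note { quiet: chars.as_str().trim_start().starts_with(':') },
        // A test needs a second operator for its intent: "?+" or "?!"
        '?' => match chars.as_str().trim_start().chars().next()? {
            '+' => Statement::Test { positive: true },
            '!' => Statement::Test { positive: false },
            _ => return None,
        },
        '~' => Statement::Mode,
        _ => return None,
    };
    Some(kind)
}
```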

Classes

Classes are used as shorthand Regular Expressions, substituted into rules at runtime.

Note: Angle brackets will not parse as class names directly after:

  • An opening round bracket and a question mark: (?
  • An opening round bracket, question mark, and letter 'P': (?P
  • A backslash and letter 'k': \k

This is the syntax used for look-behinds, named groups, and named back-references.

Syntax:

  • $ Dollar
  • Name - Must be only characters from [a-zA-Z0-9_]
  • = Equals
  • Value - Regular Expression, may contain other classes in angle brackets <> or ⟨⟩ (as with rules)

The 'any' class, defined with $_ = ..., is used for random word generation.

Example:

# Some consonants
$C = [ptksmn]

# Some vowels
$V = [iueoa]

# Only sibilant consonants
$C_s = [sz]

# Every letter
$_ = ⟨C⟩ | <V>
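Substitution can be sketched as a scan for `<Name>` or `⟨Name⟩` that skips the three regex prefixes listed above. This is a hypothetical, std-only sketch (`substitute_classes` is an illustrative name): it assumes whitespace has already been removed and that class definitions are not cyclic, and it wraps each value in a non-capturing group (?:...) so that an alternation such as ⟨C⟩ | <V> stays grouped. Whether phonet adds such a group is an assumption.

```rust
use std::collections::HashMap;

/// True if the characters just before index `i` spell `prefix`
fn preceded_by(chars: &[char], i: usize, prefix: &str) -> bool {
    let p: Vec<char> = prefix.chars().collect();
    i >= p.len() && chars[i - p.len()..i] == p[..]
}

/// Hypothetical sketch: replaces `<Name>` / `⟨Name⟩` with the class's pattern.
/// Assumes whitespace was already removed and that classes are not cyclic.
fn substitute_classes(pattern: &str, classes: &HashMap<&str, &str>) -> Result<String, String> {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        let close = match chars[i] {
            '<' => Some('>'),
            '⟨' => Some('⟩'),
            _ => None,
        };
        // "(?<", "(?P<" and "\k<" belong to regex syntax, not to classes
        let is_regex_syntax = preceded_by(&chars, i, "(?")
            || preceded_by(&chars, i, "(?P")
            || preceded_by(&chars, i, "\\k");
        if let (Some(close), false) = (close, is_regex_syntax) {
            if let Some(len) = chars[i + 1..].iter().position(|&c| c == close) {
                let name: String = chars[i + 1..i + 1 + len].iter().collect();
                if !name.is_empty() && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_') {
                    let value = classes
                        .get(name.as_str())
                        .ok_or_else(|| format!("unknown class: {name}"))?;
                    // Classes may refer to other classes, so expand recursively
                    let expanded = substitute_classes(value, classes)?;
                    out.push_str(&format!("(?:{expanded})"));
                    i += len + 2; // skip the name and both brackets
                    continue;
                }
            }
        }
        out.push(chars[i]);
        i += 1;
    }
    Ok(out)
}
```

With the classes above (spaces removed), ^(<C>?⟨V⟩)+$ expands to ^((?:[ptksmn])?(?:[iueoa]))+$, while (?<x>.)\k<x> is left untouched.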

Rules

Rules are Regular Expressions used to test if a word is valid.

Rules are defined with an intent, either + for positive, or ! for negative.

  • A positive rule must match a word for the word to be valid
  • A negative rule must not match a word for the word to be valid

To use a class, use the class name, surrounded by angle brackets <> or ⟨⟩

Syntax:

  • + Plus or ! Bang - Plus for positive rule, Bang for negative rule
  • Pattern - Regular Expression, may contain classes in angle brackets <> or ⟨⟩

Example (with predefined classes):

# Must be (C)V syllable structure
+ ^ (<C>? ⟨V⟩)+ $

# Must not have two vowels in a row
! <V>{2}
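The intent rule can be expressed as a single comparison: a rule is satisfied when whether it matches equals whether it is positive. In this sketch, closures stand in for compiled regular expressions so that it needs no external crates; `Rule` and `is_valid` are hypothetical names:

```rust
/// Hypothetical stand-in for a compiled rule. A real implementation would
/// test a regular expression; a closure keeps this sketch free of crates.
struct Rule {
    positive: bool,
    matches: Box<dyn Fn(&str) -> bool>,
}

/// A word is valid when every positive rule matches it and no negative rule does
fn is_valid(word: &str, rules: &[Rule]) -> bool {
    rules.iter().all(|rule| (rule.matches)(word) == rule.positive)
}
```

Comparing `matches` with `positive` covers both intents with one expression, so positive and negative rules need no separate code paths.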

Tests

Tests are checked against all rules, and the result is displayed in the output.

Tests are run in the order they are defined.

Like rules, tests must have a defined intent, either + for positive, or ! for negative.

  • A positive test will pass if it is valid
  • A negative test will fail if it is valid

Syntax:

  • ? Question mark
  • + Plus or ! Bang - Plus for positive test, Bang for negative test
  • Tests - A word, or multiple words separated by a space

Example (with predefined rules):

# This should match, to pass
?+ taso
# This test should NOT match, to pass
?! tax
# Each word is a test, all should match to pass
?+ taso sato tasa
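A test statement with several words expands into one check per word, and each check passes when the word's validity equals the test's intent. A hypothetical sketch (`run_test` is an illustrative name; `is_valid` stands in for checking a word against every rule):

```rust
/// Hypothetical sketch: runs a test statement such as "?+ taso sato tasa".
/// Returns the words that failed, or None if this is not a test statement.
fn run_test<'a>(statement: &'a str, is_valid: impl Fn(&str) -> bool) -> Option<Vec<&'a str>> {
    let body = statement.trim_start().strip_prefix('?')?.trim_start();
    let intent = match body.chars().next()? {
        '+' => true,
        '!' => false,
        _ => return None,
    };
    // Each word is its own test; it passes when its validity equals the intent
    let failed = body[1..]
        .split_whitespace()
        .filter(|&word| is_valid(word) != intent)
        .collect();
    Some(failed)
}
```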

Notes

Notes are printed to the terminal output, alongside tests.

They serve as the reason for any following rules, explaining why a test failed.

Syntax:

  • * Star
  • : Colon (Optional) - Define a 'quiet' note
  • Text to print, which is also used as the reason

Example:

* Syllable structure
+ ^ (<C>? <V>)+ $

# This test will NOT match, however it SHOULD (due to the Plus), so it will FAIL, with the above note as the reason
?+ tasto

# This is a 'quiet' note, it will not display, but it will be used as the reason for the following rule
*: Must not have two vowels in a row
! <V>{2}

?+ taso
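One reading of the note rules is that each rule takes the most recent note written above it as its reason. A minimal sketch under that assumption, operating on statements that are already split (`attach_reasons` is a hypothetical name):

```rust
/// Hypothetical sketch: pairs each rule with the most recent note before it,
/// which would serve as the reason shown when that rule makes a test fail.
fn attach_reasons<'a>(statements: &[&'a str]) -> Vec<(&'a str, Option<&'a str>)> {
    let mut current_note: Option<&'a str> = None;
    let mut rules = Vec::new();
    for &statement in statements {
        let statement = statement.trim();
        if let Some(note) = statement.strip_prefix('*') {
            // A quiet note ("*: text") gives a reason just like a normal note
            let note = note.trim_start();
            current_note = Some(note.strip_prefix(':').unwrap_or(note).trim());
        } else if statement.starts_with('+') || statement.starts_with('!') {
            rules.push((statement, current_note));
        }
    }
    rules
}
```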

Mode

The mode of a Phonet file may be one of these:

  • Romanized: Using <> (not ⟨⟩)
  • Broad transcription: Using //
  • Narrow transcription: Using []

This may optionally be specified in a file, although it does not add any functionality.

Syntax:

  • ~ Tilde
  • <.>, /./, or [.] - Mode identifier, where . is any string, or blank

Examples:

# Specify romanized mode (fish icon)
~<>
# Specify broad transcription, with a given name
~ / My Language /
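Since the mode is only a delimiter pair with an optional name, parsing it is short. A hypothetical sketch (`Mode` and `parse_mode` are illustrative names); it assumes the name is trimmed of surrounding spaces:

```rust
/// The three modes listed above (names are hypothetical)
#[derive(Debug, PartialEq)]
enum Mode {
    Romanized, // <...>
    Broad,     // /.../
    Narrow,    // [...]
}

/// Hypothetical sketch: parses `~<>`, `~ / My Language /`, `~[]`, and so on
fn parse_mode(statement: &str) -> Option<(Mode, String)> {
    let inner = statement.trim().strip_prefix('~')?.trim();
    let mut chars = inner.chars();
    let (open, close) = (chars.next()?, chars.next_back()?);
    let mode = match (open, close) {
        ('<', '>') => Mode::Romanized,
        ('/', '/') => Mode::Broad,
        ('[', ']') => Mode::Narrow,
        // Anything else, including ⟨⟩, is not a valid mode
        _ => return None,
    };
    // Whatever sits between the delimiters is the optional name
    Some((mode, chars.as_str().trim().to_string()))
}
```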

Examples

See the examples folder for Phonet file examples.

These formatting tips are not required, but recommended to make the file easier to read.

  1. Specify the mode at the very top of the file
  2. Define all classes at the top of the file
    • Also define an 'any' class first, for word generation
  3. Group related rules and tests, using a note
    • Define rules first, then positive tests, then negative tests
  4. Indent rules and tests under note
    • Rules should use 1 indent, tests use 2

Example File

Example (this is from example.phonet):

~<> ;# Mode (optional) - This file uses romanized letters

# Class definitions
$_ = ⟨C⟩ | ⟨V⟩        ;# Any / all letters (required for generating words)
$C = [ptkmnswjl]      ;# Consonants
$V = [aeiou]          ;# Vowels

* Invalid letters     ;# Note - Prints to standard output, and used as reason if test fails
  + ^ ⟨_⟩+ $          ;# Check that every letter is in the 'any' class
    ?+ taso
    ?! tyxo

* Examples of failing tests
    ?+ tyxo           ;# This test will fail - with the reason 'Invalid letters' (above)
    ?! taso           ;# This test will fail, as a false positive

* Syllable structure
  + ^ ⟨V⟩? ( ⟨C⟩ ⟨V⟩ )+ $  ;# Check that word is Consonant + Vowel, repeating at least once
    ?+ taso kili ano atoso
    ?! taaso an

* Some more tests
    ?+ silo tila
    ?! akka axe

# This is a 'quiet' note - It will not display, unless any following rules fail
*: No repeated letters
  ! (.)\1             ;# This is an unnamed back-reference
  ! (?<x> .) \k<x>    ;# (Alternative) This is a named back-reference (NOT a class)
    ?+ taso           ;# An example of multi-line statements on next line (comments cannot be on same line)
    ?! &
      taaso
      ttaso
    ;

# Comments cannot be multiline, even using '&'

* 2 tests *should* have failed!


Dependencies

~4.5–6.5MB
~114K SLoC