3 unstable releases
Version | Released
---|---
0.1.1 | Oct 29, 2024
0.1.0 | May 30, 2024
0.0.0 | Sep 1, 2023
vidyut-prakriya
A Paninian word generator
(Published as "A fast prakriyā generator" at ISCLS 2024: https://iscls.github.io/assets/files/proceedings/2024.iscls.7.pdf)
vidyut-prakriya generates Sanskrit words with their prakriyās (derivations) according to the rules of Paninian grammar and currently implements around 2,000 rules. Our long-term goal is to provide a complete implementation of the Ashtadhyayi.
This crate is under active development as part of the Ambuda project. If you enjoy our work and wish to contribute to it, please see the Contributing section below. We also encourage you to join our Discord server, where you can meet other Sanskrit programmers and enthusiasts.
An online demo is available here.
Overview
vidyut-prakriya has three distinguishing qualities:
- Fidelity. We follow the rules of Paninian grammar as closely as possible. Each word we return can optionally include a prakriyā that lists each rule that was used as well as its result.
- Speed. On my laptop (a 2.4GHz 8-core CPU with 64 GB of DDR4 RAM), this crate generates almost 50,000 words per second on a single thread. All else equal, a fast program is easier to run and test, which means that we can produce a larger word list at a higher standard of quality.
- Portability. This crate compiles to fast native code and can be bound to most other programming languages with a bit of effort. In particular, this crate can be compiled to WebAssembly, which means that it can run in a modern web browser.
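A words-per-second figure like the one in the Speed item could be measured roughly as sketched below. This is only an illustration: generate_word is a hypothetical stand-in workload, not part of this crate's API, and the numbers it produces say nothing about vidyut-prakriya itself.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical stand-in for a word generator; it is NOT the crate's API.
/// It only builds a short string so that there is some work to time.
fn generate_word(i: usize) -> String {
    format!("word{i}")
}

/// Calls `generate_word` `n` times on the current thread and returns
/// (words generated, words per second).
fn measure_throughput(n: usize) -> (usize, f64) {
    let start = Instant::now();
    let mut count = 0;
    for i in 0..n {
        // black_box keeps the optimizer from discarding the unused result.
        black_box(generate_word(i));
        count += 1;
    }
    let secs = start.elapsed().as_secs_f64();
    (count, count as f64 / secs)
}

fn main() {
    let (count, rate) = measure_throughput(100_000);
    println!("{count} words at {rate:.0} words/second");
}
```

To time real derivations, the body of generate_word would be replaced with a call into the crate, and the program should be built in release mode so that the measurement reflects optimized code.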
vidyut-prakriya has excellent support for Sanskrit's basic word types, including subantas, tiṅantas, kṛdantas, and taddhitāntas. It has moderate support for samāsas and weak support for accent rules.
Usage
To generate all basic tinantas in kartari prayoga, run:
$ make create_tinantas > output.csv
The first run of make create_tinantas will be slow since your machine must first compile vidyut-prakriya. After this initial compilation step, however, subsequent runs will be much faster, and make create_tinantas will likely compile and complete within a few seconds.
To generate prakriyas programmatically, you can use the starter code below:
use vidyut_prakriya::Vyakarana;
use vidyut_prakriya::args::*;

let v = Vyakarana::new();

let dhatu = Dhatu::mula("BU", Gana::Bhvadi);
let args = Tinanta::builder()
    .dhatu(dhatu)
    .lakara(Lakara::Lat)
    .prayoga(Prayoga::Kartari)
    .purusha(Purusha::Prathama)
    .vacana(Vacana::Eka)
    .build()
    .unwrap();

let prakriyas = v.derive_tinantas(&args);

for p in prakriyas {
    println!("{}", p.text());
    println!("---------------------------");
    for step in p.history() {
        let terms: Vec<_> = step
            .result()
            .iter()
            .map(|x| x.text())
            .filter(|x| !x.is_empty())
            .collect();
        let result = terms.join(" + ");
        println!("{:<10} | {}", step.rule().code(), result);
    }
    println!("---------------------------");
    println!("\n");
}
Output of the code above:
Bavati
---------------------------
1.3.1      | BU
3.2.123    | BU + la~w
1.3.2      | BU + la~w
1.3.3      | BU + la~w
1.3.9      | BU + l
1.3.78     | BU + l
3.4.78     | BU + tip
1.3.3      | BU + tip
1.3.9      | BU + ti
3.4.113    | BU + ti
3.1.68     | BU + Sap + ti
1.3.3      | BU + Sap + ti
1.3.8      | BU + Sap + ti
1.3.9      | BU + a + ti
3.4.113    | BU + a + ti
7.3.84     | Bo + a + ti
1.4.14     | Bo + a + ti
6.1.78     | Bav + a + ti
8.4.68     | Bav + a + ti
---------------------------
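If you save this printed output and want to process it later, each row can be split back into a rule label and its terms. The sketch below assumes only the row layout shown above (a label, a `|`, then terms joined by ` + `); StepRow and parse_step_row are hypothetical names for illustration and are not part of the crate.

```rust
/// One row of the printed history: a rule label and the non-empty terms.
/// (Hypothetical helper type, not part of vidyut-prakriya.)
#[derive(Debug, PartialEq)]
struct StepRow {
    rule: String,
    terms: Vec<String>,
}

/// Parses a row such as "3.1.68     | BU + Sap + ti".
/// Returns None for separator lines and other rows without a `|`.
fn parse_step_row(line: &str) -> Option<StepRow> {
    let (rule, result) = line.split_once('|')?;
    let rule = rule.trim();
    if rule.is_empty() {
        return None;
    }
    // Terms are separated by '+'; surrounding padding is trimmed away.
    let terms = result
        .split('+')
        .map(|t| t.trim().to_string())
        .filter(|t| !t.is_empty())
        .collect();
    Some(StepRow { rule: rule.to_string(), terms })
}

fn main() {
    for line in ["3.1.68     | BU + Sap + ti", "---------------------------"] {
        println!("{:?}", parse_step_row(line));
    }
}
```

When working inside Rust, reading the steps directly from the history, as the starter code does, avoids this round trip through text.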
The left column shows a simple string label for each rule that was applied during the derivation, and you can find details about what values these labels can take in the comments on the Rule type. We suggest using ashtadhyayi.com to learn more about these rules.
The right column shows the in-progress prakriya. We use an output convention that is common on other Ashtadhyayi websites. The encoding format for this text is SLP1, which is the encoding format we use throughout the crate.
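For readers who are new to SLP1, the sketch below converts SLP1 text to IAST so that output such as Bavati is easier to read. It is a minimal, dependency-free illustration and not part of this crate: slp1_to_iast is a hypothetical helper, it covers only the basic vowels, anusvāra, visarga, and consonants, and it passes every other character (such as the `~` in la~w) through unchanged.

```rust
/// Maps one SLP1 character to its IAST spelling.
/// Accent marks and other special symbols are not handled.
fn slp1_char_to_iast(c: char) -> Option<&'static str> {
    Some(match c {
        'a' => "a", 'A' => "ā", 'i' => "i", 'I' => "ī", 'u' => "u", 'U' => "ū",
        'f' => "ṛ", 'F' => "ṝ", 'x' => "ḷ", 'X' => "ḹ",
        'e' => "e", 'E' => "ai", 'o' => "o", 'O' => "au",
        'M' => "ṃ", 'H' => "ḥ",
        'k' => "k", 'K' => "kh", 'g' => "g", 'G' => "gh", 'N' => "ṅ",
        'c' => "c", 'C' => "ch", 'j' => "j", 'J' => "jh", 'Y' => "ñ",
        'w' => "ṭ", 'W' => "ṭh", 'q' => "ḍ", 'Q' => "ḍh", 'R' => "ṇ",
        't' => "t", 'T' => "th", 'd' => "d", 'D' => "dh", 'n' => "n",
        'p' => "p", 'P' => "ph", 'b' => "b", 'B' => "bh", 'm' => "m",
        'y' => "y", 'r' => "r", 'l' => "l", 'v' => "v",
        'S' => "ś", 'z' => "ṣ", 's' => "s", 'h' => "h",
        _ => return None,
    })
}

/// Converts a whole SLP1 string, leaving unknown characters as they are.
fn slp1_to_iast(text: &str) -> String {
    let mut out = String::new();
    for c in text.chars() {
        match slp1_char_to_iast(c) {
            Some(s) => out.push_str(s),
            None => out.push(c),
        }
    }
    out
}

fn main() {
    for word in ["Bavati", "BU", "Sap"] {
        println!("{word} -> {}", slp1_to_iast(word));
    }
}
```

Because SLP1 uses exactly one ASCII character per sound, a character-by-character lookup like this is enough in this direction; a dedicated transliteration library is the better choice for anything beyond quick inspection.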
For more details, see the following methods on the Vyakarana struct:
- derive_tinantas (for verbs)
- derive_subantas (for nominals)
- derive_krdantas (for verbal suffixes)
- derive_taddhitantas (for nominal suffixes)
Contributing
vidyut-prakriya is an ambitious project, and your contributions can help it grow.
Reporting errors
The easiest way to help is to file a GitHub issue if you notice an error. Please let us know what form you expected to see. We would also greatly appreciate relevant citations from the grammatical literature so that we can better understand and resolve the issue.
Modifying the code
First, see if you can run our existing code on your machine. We suggest that you start by running our integration tests:
$ make create_test_files
$ make test_all
Next, try using our prakriya debugger, which shows exactly how a given word was derived:
$ make debugger
Once you've confirmed that your setup works, we suggest that you read through the documentation for Term (in the term module) and Prakriya (in the prakriya module). Almost every part of the code touches these two structs.
To get familiar with our rules, we suggest that you skim through the ashtadhyayi module, which defines our high-level API and wraps all of the rules that we use in the system. We encourage you to read our extensive comments and explore the smaller modules that we use within ashtadhyayi.
Now you're ready to make changes to the code. After you make your changes, run make test_all to verify the impact of your code.
If you are satisfied with your changes, you will need to update our integration test file. This process has three steps. First, run the steps below and confirm that your tests fail:
$ make create_test_files
$ make test_all
make test_all should fail on a hash comparison error. Second, copy the new hash code and use it to replace the existing hash code in the test_all target of our Makefile. Third, run make test_all again and confirm that all tests pass.
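The idea behind a hash comparison like this can be sketched as follows. This is an illustration under stated assumptions, not the project's test code: the hash function that make test_all actually uses is not described here, so a hand-written 64-bit FNV-1a is swapped in to keep the example free of dependencies, and check_output is a hypothetical name.

```rust
/// 64-bit FNV-1a, used here only as a dependency-free stand-in hash.
fn fnv1a_64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for byte in data {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Compares the hash of freshly generated output against a stored value.
fn check_output(generated: &str, expected_hash: u64) -> Result<(), String> {
    let actual = fnv1a_64(generated.as_bytes());
    if actual == expected_hash {
        Ok(())
    } else {
        Err(format!(
            "hash mismatch: expected {expected_hash:016x}, got {actual:016x}"
        ))
    }
}

fn main() {
    // Pretend this is the previously accepted test file and its stored hash.
    let old_output = "Bavati\n";
    let stored = fnv1a_64(old_output.as_bytes());

    // A code change that alters any generated form changes the hash.
    let new_output = "Bavati\nBavataH\n";
    println!("{:?}", check_output(old_output, stored));
    println!("{:?}", check_output(new_output, stored));
}
```

Storing one hash instead of the full expected output keeps the repository small, at the cost that a mismatch only tells you that something changed, not what. The prakriya debugger mentioned above is the tool for finding out which forms differ.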
Data
This crate includes a Dhatupatha sourced from ashtadhyayi.com, and the author of ashtadhyayi.com has graciously agreed to share this file with us under an MIT license.
For details on the lineage of this Dhatupatha, see our separate data README.
Design
See ARCHITECTURE.md for details.