3 stable releases

1.0.2 Apr 28, 2024
1.0.1 Apr 27, 2024

MIT license

Santoka

arukeba kimpouge suwareba kimpouge
if i walk the buttercups if i sit the buttercups

Translations of 668 of Taneda Santōka's free-verse haiku, including excellent translations by Hiroaki Satō, Scott Watson, and Cid Corman.

Available as JSON for easy parsing, or in the Leaflet.md format for viewing in Markdown-friendly apps like Obsidian.

You can explore the poems in the dataset at lucaaurelia.com/santoka.

Source

Many of these poems were originally compiled and digitized by Gábor Terebess for Terebess Asia Online. I've added metadata like publication URLs and converted the collection to a structured format for easy parsing.

Poems

See ./poems.json. Poems are in this format:

{
  "id": 68,
  "publicationId": 3,
  "englishText": "Absolutely no cloud I take off my hat",
  "japaneseText": "Mattaku kumo ga nai kasa o nugi"
}

The japaneseText field is missing for some poems. When present, it can contain either romaji or kana/kanji.
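Because japaneseText is sometimes absent and can be written in different scripts, it helps to model it as optional and check which script you got before displaying it. Below is a minimal std-only sketch. The Poem struct, the Script enum, and the classify heuristic are hypothetical (they are not the santoka crate's types), and JSON deserialization (for example with serde_json) is left out. The heuristic assumes that any hiragana, katakana, or common CJK ideograph means kana/kanji, and treats everything else, including romaji with macrons like "Santōka", as romaji.

```rust
/// Hypothetical mirror of one entry in poems.json. Field names are
/// snake_case versions of the JSON keys (publicationId -> publication_id).
#[derive(Debug)]
pub struct Poem {
    pub id: u32,
    pub publication_id: u32,
    pub english_text: String,
    /// None when the poem has no japaneseText.
    pub japanese_text: Option<String>,
}

/// How a poem's Japanese text is written.
#[derive(Debug, PartialEq)]
pub enum Script {
    Romaji,
    KanaOrKanji,
}

/// Heuristic (an assumption, not part of the dataset): any character in the
/// hiragana, katakana, or CJK unified ideograph blocks means kana/kanji;
/// anything else is treated as romaji.
pub fn classify(text: &str) -> Script {
    let is_japanese_char = |c: char| {
        matches!(
            c,
            '\u{3040}'..='\u{309F}'   // hiragana
            | '\u{30A0}'..='\u{30FF}' // katakana
            | '\u{4E00}'..='\u{9FFF}' // CJK unified ideographs (common kanji)
        )
    };
    if text.chars().any(is_japanese_char) {
        Script::KanaOrKanji
    } else {
        Script::Romaji
    }
}

/// Returns the script of a poem's Japanese text, or None if it has none.
pub fn japanese_script(poem: &Poem) -> Option<Script> {
    poem.japanese_text.as_deref().map(classify)
}

fn main() {
    // The example poem from poems.json above.
    let poem = Poem {
        id: 68,
        publication_id: 3,
        english_text: "Absolutely no cloud I take off my hat".to_string(),
        japanese_text: Some("Mattaku kumo ga nai kasa o nugi".to_string()),
    };
    println!("{}: {:?}", poem.english_text, japanese_script(&poem));
}
```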

Publications

See ./publications.json. Publications are in this format:

{
  "id": 12,
  "name": "Santôka",
  "translatorIds": [11, 16],
  "year": 2006,
  "description": "Santôka: A Translation with Photographic Images. Photographs by Hakudô Inoue; book and cover design by Kazuya Takaoka; English text by Emiko Miyashita and Paul Watsky. (PIE Books, Tokyo, 2006). 400 pages",
  "url": "https://thehaikufoundation.org/omeka/items/show/2643",
  "lucaRanking": 5
}
| Field | Description |
| --- | --- |
| name | My best attempt to identify a single name for the publication. It's occasionally a judgment call, since the data includes personal web pages and other informal sources. |
| translatorIds | An array, because some publications have more than one person working on the English text, like the example above. Publications with a single translator (most of them) use a one-element array: translatorIds: [8]. |
| year | A number, or null if I couldn't determine a publication year. |
| description | A free-form text description of the publication. |
| url | A somewhat authoritative URL for the publication, or null if I couldn't find one. |
| lucaRanking | A subjective ranking based on where I want the publication to show up on lucaaurelia.com/santoka. |
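Since year and url can both be null, code that displays publications needs a fallback for each. Here is a minimal sketch, assuming a hypothetical Publication struct that maps JSON null to Option. It is not the santoka crate's own type, and using i32 for lucaRanking is a guess based on the example value.

```rust
/// Hypothetical mirror of one entry in publications.json. JSON null maps to None.
#[derive(Debug)]
pub struct Publication {
    pub id: u32,
    pub name: String,
    /// Always an array, even when there is only one translator.
    pub translator_ids: Vec<u32>,
    /// None when no publication year could be determined.
    pub year: Option<u16>,
    pub description: String,
    /// None when no authoritative URL was found.
    pub url: Option<String>,
    /// Assumed to be an integer; the field's range is not documented.
    pub luca_ranking: i32,
}

/// Builds a one-line citation, spelling out missing fields
/// instead of printing "null".
pub fn citation(p: &Publication) -> String {
    let year = match p.year {
        Some(y) => y.to_string(),
        None => "year unknown".to_string(),
    };
    match &p.url {
        Some(url) => format!("{} ({}) <{}>", p.name, year, url),
        None => format!("{} ({})", p.name, year),
    }
}

fn main() {
    // The example publication from above; description shortened for brevity.
    let publication = Publication {
        id: 12,
        name: "Santôka".to_string(),
        translator_ids: vec![11, 16],
        year: Some(2006),
        description: "Santôka: A Translation with Photographic Images.".to_string(),
        url: Some("https://thehaikufoundation.org/omeka/items/show/2643".to_string()),
        luca_ranking: 5,
    };
    println!("{}", citation(&publication));
}
```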

Translators

See ./translators.json. Translators are in this format:

{
  "id": 10,
  "name": "Scott Watson"
}
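Publications refer to translators only by id, so showing translator names means joining translatorIds against translators.json. A minimal sketch follows, with a hypothetical Translator struct and helper functions (not the santoka crate's API). It builds a HashMap index once, then resolves ids in order. Skipping ids with no matching translator, rather than panicking, is a design choice of this sketch, not documented dataset behavior.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of one entry in translators.json.
#[derive(Debug)]
pub struct Translator {
    pub id: u32,
    pub name: String,
}

/// Indexes translators by id so each lookup is a hash lookup
/// instead of a linear scan over the whole list.
pub fn index_by_id(translators: &[Translator]) -> HashMap<u32, &Translator> {
    translators.iter().map(|t| (t.id, t)).collect()
}

/// Resolves a publication's translatorIds to names, preserving their order.
/// Ids with no matching translator are skipped.
pub fn translator_names<'a>(ids: &[u32], index: &HashMap<u32, &'a Translator>) -> Vec<&'a str> {
    ids.iter()
        .filter_map(|id| index.get(id).copied())
        .map(|t| t.name.as_str())
        .collect()
}

fn main() {
    // Scott Watson's id comes from the example above; the second entry is made up.
    let translators = vec![
        Translator { id: 10, name: "Scott Watson".to_string() },
        Translator { id: 99, name: "Hypothetical Translator".to_string() },
    ];
    let index = index_by_id(&translators);
    // 42 has no matching translator in this list, so it is skipped.
    println!("{:?}", translator_names(&[10, 42, 99], &index));
}
```

Building the index once keeps the per-poem lookups cheap when resolving translators for every poem in the dataset.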

Installation

If you're using Rust, you can install this dataset from crates.io.

cargo add santoka

Example usage

Parsing JSON is straightforward in most languages. Here's a JavaScript example:

import fs from "fs";

// Read the file as UTF-8 text, then parse it into an array of poem objects.
const poemsJson = fs.readFileSync("./santoka/poems.json", "utf8");
const poems = JSON.parse(poemsJson);
console.log(poems);

If you're using Rust, the santoka crate takes care of parsing for you:

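fn main() {
    // Load the santoka dataset.
    let dataset = santoka::Dataset::new();

    for poem in &dataset.poems {
        dbg!(&poem);

        // Look up the publication this poem appeared in.
        let publication = dataset.publication(poem.publication_id);
        dbg!(&publication);

        // Resolve the publication's translator ids to translator records.
        let translators = dataset.translators(publication.translator_ids);
        dbg!(&translators);
    }
}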

Dependencies