7 releases
0.2.0 | Sep 23, 2023 |
---|---|
0.1.5 | Jul 26, 2021 |
0.1.2 | Jun 14, 2021 |
#741 in Text processing
23 downloads per month
23KB
365 lines
Video subtitle generation
A command line programme to generate SRT format subtitles for use in video players including YouTube.
The subtitles are generated from the full transcript text of a video. By default, each subtitle will have a maximum of 10 words.
Current status
Actively-developed.
Usage
CLI
$ subtitles --transcript INPUT-FILENAME.txt --length LENGTH-IN-SECONDS --abbr ABBREVIATION-MAP-FILE.csv > OUTPUT-FILENAME.srt
$ subtitles --transcript transcript.txt --length 300 --abbr abbreviations.csv > subtitles.srt # True example
Webassembly
import init, * as subtitles from 'path/to/subtitles.js';
(async () => {
await init();
let transcript_text = 'Lots of text goes here; newline is acceptable.';
let audio_length_in_seconds = 99;
let abbreviation_map = new Map([["UNGA", "United Nations General Assembly"], ["MDN", "Mozilla Developer Network"]]); // Optional.
const text_for_srt_file = subtitles.wasm_prepare_srt_content(transcript_text, audio_length_in_seconds, abbreviation_map);
})()
Build instruction
CLI
$ cargo build --release
Webassembly
Install wasm-pack. Then...
$ wasm-pack build --target web
$ cp pkg/{subtitles_bg.wasm,subtitles.js} /path/to/website/js/
Abbreviation map
The abbreviation map is an optional feature. This is provided as a CSV file (with no header row) from the command line or as a Javascript Map object when using the Wasm version. Its purpose is to map those character or character sequences who are pronounced very differently from how they appear. For example, the transcript may use "(UNGA)" whereas the narrator may say "United Nations General Assembly" instead. Another example is the Unicode codepoint U+FDFD which is spoken as a full sentence. In all these cases, the written form in the transcript do not hint how long it may take to say it. By providing a mapping between the written form and the spoken form, we get more accurate timing for each subtitle.
Example CSV
(UNGA),(United Nations General Assembly)
(MDN),(Mozilla Developer Network)
﷽,Bi-smi llāhi r-raḥmāni r-raḥīm
Licence
Dependencies
~4–5MB
~76K SLoC