
tex-glyphs

A crate for dealing with glyphs in TeX/pdfTeX fonts as unicode characters

2 releases

0.1.1 Feb 2, 2024
0.1.0 Jan 28, 2024


GPL-3.0-or-later

125KB
1.5K SLoC

A crate for dealing with codepoints in TeX fonts.



This crate provides a way to access glyphs from TeX fonts. It is primarily intended for crates that build on tex_engine.

TeX deals with fonts by parsing font metric files (.tfm files), which contain information about the dimensions of each glyph in the font. So from the point of view of (the core of) TeX, a glyph is just an index $0 \leq i \leq 255$ into the font metric file.

To find out what a glyph actually looks like, we ideally want to know its corresponding Unicode codepoint. This crate attempts to determine exactly that.

Usage

This crate attempts to associate a TeX font (identified by the file name stem of its .tfm file) with:

  1. A list of FontModifiers (e.g. bold, italic, sans-serif, etc.)
  2. A GlyphList, i.e. an array [Glyph; 256]

A Glyph then is either undefined (i.e. the glyph is not present in the font, or the crate couldn't figure out what exactly it is) or presentable as a string.
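To make this data model concrete, here is a minimal sketch of how such types could look. The names Glyph, GlyphList, empty, set and get below are hypothetical stand-ins, not the crate's actual definitions; the point is that a glyph is either undefined or carries a name and a string, and that indexing a 256-slot array with a u8 can never go out of bounds.

```rust
/// Hypothetical glyph model (illustrative only, not the crate's type):
/// either unknown, or known by name and presentable as a string.
#[derive(Clone, Debug, PartialEq)]
enum Glyph {
    Undefined,
    Defined { name: String, text: String },
}

/// Hypothetical glyph list: exactly one slot per byte index 0..=255.
struct GlyphList([Glyph; 256]);

impl GlyphList {
    /// Creates a list in which every slot is undefined.
    fn empty() -> Self {
        GlyphList(std::array::from_fn(|_| Glyph::Undefined))
    }

    /// Records the glyph found at byte index `i`.
    fn set(&mut self, i: u8, name: &str, text: &str) {
        self.0[i as usize] = Glyph::Defined {
            name: name.to_string(),
            text: text.to_string(),
        };
    }

    /// Every u8 is a valid index into 256 slots, so this cannot panic.
    fn get(&self, i: u8) -> &Glyph {
        &self.0[i as usize]
    }
}
```

Filling slots 0, 20 and 96 with Gamma/Γ, kappa/κ and lscript/ℓ would model the cmmib10 example that follows.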

Consider e.g. \mathbf{\mathit{\Gamma^\kappa_\ell}} (i.e. $\mathbf{\mathit{\Gamma^\kappa_\ell}}$). From the point of view of TeX, this is a sequence of 3 glyphs, represented as indices into the font cmmib10, namely 0, 20, and 96.

Here's how to use this crate to obtain the corresponding unicode characters, i.e. 𝜞, 𝜿 and ℓ:

Instantiation

First, we instantiate a FontInfoStore with a function that allows it to find files. This function takes a file name (e.g. cmmib10.tfm) and returns its full path (e.g. /usr/share/texmf-dist/fonts/tfm/public/cm/cmmib10.tfm). This could be done by calling kpsewhich, for example, but frequent repeated calls to kpsewhich are slow, so a more efficient alternative is recommended.

use tex_glyphs::encodings::FontInfoStore;
// Resolve a file name (e.g. "cmmib10.tfm") to its full path by calling kpsewhich.
let mut store = FontInfoStore::new(|s| {
    let output = std::process::Command::new("kpsewhich")
        .arg(s)
        .output()
        .expect("kpsewhich not found!");
    std::str::from_utf8(&output.stdout).unwrap().trim().to_string()
});
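One cheap improvement over calling kpsewhich for every lookup is to cache its results. The following is a minimal sketch of a hypothetical memoize wrapper (not part of tex_glyphs). It assumes that FontInfoStore::new accepts any Fn(&str) -> String and does not require the closure to be Send or Sync; that is an assumption, and since RefCell is not thread-safe, a Mutex would be needed otherwise.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical helper: wraps a slow file lookup (e.g. one that shells out
/// to kpsewhich) so that each file name is resolved at most once.
fn memoize<F>(lookup: F) -> impl Fn(&str) -> String
where
    F: Fn(&str) -> String,
{
    // Cache from file name to resolved path, shared across calls.
    let cache: RefCell<HashMap<String, String>> = RefCell::new(HashMap::new());
    move |name: &str| {
        if let Some(path) = cache.borrow().get(name) {
            return path.clone(); // cache hit: no external call
        }
        let path = lookup(name); // cache miss: do the expensive lookup once
        cache.borrow_mut().insert(name.to_string(), path.clone());
        path
    }
}
```

You would then pass something like memoize(|s: &str| ...) to the store instead of the bare closure. A faster but more involved alternative would be to scan the TeX tree once and build a name-to-path index up front.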

This store will now use the provided function to find your pdftex.map file, which lists all the fonts that are available to TeX and associates them with .enc, .pfa and .pfb files.
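For orientation: each non-comment line of pdftex.map starts with the TFM name, optionally followed by the PostScript font name, numeric flags, quoted PostScript instructions, and file names prefixed with `<` (a line such as `cmmib10 CMMIB10 <cmmib10.pfb`). The sketch below parses one such line into a hypothetical MapEntry. It is a simplification of the format, not the crate's parser, and it ignores corner cases such as a `<` separated from its file name by a space.

```rust
/// Hypothetical, simplified view of one pdftex.map entry.
#[derive(Debug, PartialEq)]
struct MapEntry {
    tfm_name: String,
    ps_name: Option<String>,
    enc_file: Option<String>,
    font_file: Option<String>,
}

/// Parses a single pdftex.map line (simplified sketch).
fn parse_map_line(line: &str) -> Option<MapEntry> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('%') || line.starts_with('#') {
        return None; // blank line or comment
    }
    // Split into whitespace-separated tokens, skipping "quoted instructions".
    let mut tokens = Vec::new();
    let mut rest = line;
    loop {
        rest = rest.trim_start();
        if rest.is_empty() {
            break;
        }
        if let Some(quoted) = rest.strip_prefix('"') {
            // Skip everything up to and including the closing quote.
            match quoted.find('"') {
                Some(end) => rest = &quoted[end + 1..],
                None => break,
            }
            continue;
        }
        let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
        tokens.push(&rest[..end]);
        rest = &rest[end..];
    }
    let mut tokens = tokens.into_iter();
    let mut entry = MapEntry {
        tfm_name: tokens.next()?.to_string(),
        ps_name: None,
        enc_file: None,
        font_file: None,
    };
    for tok in tokens {
        if tok.starts_with('<') {
            // `<`, `<<` and `<[` all introduce a file name.
            let file = tok.trim_start_matches(|c: char| c == '<' || c == '[');
            if file.ends_with(".enc") {
                entry.enc_file = Some(file.to_string());
            } else {
                entry.font_file = Some(file.to_string());
            }
        } else if tok.parse::<u32>().is_err() && entry.ps_name.is_none() {
            // First non-numeric, non-file token is the PostScript name.
            entry.ps_name = Some(tok.to_string());
        }
    }
    Some(entry)
}
```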

Obtaining Glyphs

If we now query the store for the GlyphList of some font, e.g. cmmib10, like so:

let ls = store.get_glyphlist("cmmib10");

...it will attempt to parse the .enc file associated with cmmib10, if one exists. If not, or if this fails, it will try to parse the .pfa or .pfb file. If neither works, it will search for a .vf file and try to parse that. If that too fails, it will return an empty GlyphList.
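The fallback order described above amounts to "take the first source that succeeds, otherwise return an empty result". A minimal sketch of that pattern, using a hypothetical first_available helper in which each source (.enc, .pfa/.pfb, .vf) is modelled as a closure returning an Option; this is not the crate's internal code:

```rust
/// Hypothetical helper: tries each source in order and returns the first
/// successful result, or `T::default()` (e.g. an empty list) if all fail.
fn first_available<T: Default>(font: &str, sources: &[&dyn Fn(&str) -> Option<T>]) -> T {
    sources
        .iter()
        .find_map(|source| source(font)) // stops at the first Some
        .unwrap_or_default()
}
```

Because find_map stops at the first success, later (and possibly more expensive) sources are never consulted once an earlier one has produced a result.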

From whichever of these three sources succeeds, it then attempts to associate each byte index with a Glyph:

let zero = ls.get(0);
let twenty = ls.get(20);
let ninety_six = ls.get(96);
println!(
    "0={}={}, 20={}={}, and 96={}={}",
    zero.name(), zero,
    twenty.name(), twenty,
    ninety_six.name(), ninety_six
);
0=Gamma=Γ, 20=kappa=κ, and 96=lscript=ℓ
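Glyph names such as Gamma, kappa and lscript come from the .enc or Type 1 files and still have to be translated into Unicode; the crate keeps this name-to-Unicode map in glyphmap.txt. As a rough sketch of such a lookup, assuming a tiny hand-written table in place of that file, plus the Adobe Glyph List convention that a name of the form uniXXXX (four uppercase hex digits) denotes the code point U+XXXX. The function name is hypothetical.

```rust
/// Hypothetical lookup from a PostScript glyph name to text.
fn glyph_name_to_text(name: &str) -> Option<String> {
    // Tiny illustrative excerpt standing in for a full name table.
    const TABLE: &[(&str, char)] = &[("Gamma", 'Γ'), ("kappa", 'κ'), ("lscript", 'ℓ')];
    if let Some(&(_, c)) = TABLE.iter().find(|(n, _)| *n == name) {
        return Some(c.to_string());
    }
    // Fallback: "uni" + exactly four uppercase hex digits, e.g. "uni2113".
    let hex = name.strip_prefix("uni")?;
    if hex.len() == 4 && hex.chars().all(|c| c.is_ascii_hexdigit() && !c.is_ascii_lowercase()) {
        let code = u32::from_str_radix(hex, 16).ok()?;
        // char::from_u32 rejects surrogates, which are not valid here either.
        return char::from_u32(code).map(|c| c.to_string());
    }
    None
}
```

Names that match neither rule yield None, which corresponds to an undefined Glyph.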

Font Modifiers

So far, so good, but these glyphs are neither bold nor italic, whereas in cmmib10 they are both. So let's check which properties cmmib10 has:

let font_info = store.get_info("cmmib10").unwrap();
println!("{:?}",font_info.styles);
println!("{:?}",font_info.weblink);
ModifierSeq { blackboard: false, fraktur: false, script: false, bold: true, capitals: false, monospaced: false, italic: true, oblique: false, sans_serif: false }
Some(("Latin Modern Math", "https://fonts.cdnfonts.com/css/latin-modern-math"))

...so this tells us that the font is bold and italic, but not sans-serif, monospaced, etc. It also tells us that the publicly available, web-compatible equivalent of this font is called "Latin Modern Math" and that we can find it at the provided URL, if we want to use it in e.g. HTML :)
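These modifiers are determined heuristically. As a purely illustrative sketch of one possible heuristic, assuming Computer Modern naming conventions (b for bold, ti/mi for italic, sl for slanted, ss for sans serif, tt for typewriter, csc for caps and small caps), the following guesses a subset of the flags printed above from the font name. The Styles type and guess_cm_styles function are hypothetical and not necessarily what tex_glyphs does.

```rust
/// Hypothetical subset of the style flags shown in the ModifierSeq output.
#[derive(Debug, Default, PartialEq)]
struct Styles {
    bold: bool,
    italic: bool,
    oblique: bool,
    sans_serif: bool,
    monospaced: bool,
    capitals: bool,
}

/// Guesses styles from a Computer Modern font name such as "cmmib10".
/// Returns None for names outside the "cm" family.
fn guess_cm_styles(font: &str) -> Option<Styles> {
    // Strip the "cm" family prefix and the trailing design size digits.
    let code = font
        .strip_prefix("cm")?
        .trim_end_matches(|c: char| c.is_ascii_digit());
    Some(Styles {
        bold: code.contains('b'),                           // cmbx, cmmib, cmbsy
        italic: code.contains("ti") || code.contains("mi"), // cmti, cmmi, cmmib
        oblique: code.contains("sl"),                       // cmsl, cmbxsl
        sans_serif: code.contains("ss"),                    // cmss
        monospaced: code.contains("tt"),                    // cmtt
        capitals: code.contains("csc"),                     // cmcsc
    })
}
```

Substring rules like these are easy to fool (cmitt, italic typewriter, is not detected as italic), which is one reason a way to patch mistakes is useful.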

Now we only need to apply the modifiers to the glyphs:

use tex_glyphs::fontstyles::FontModifiable;
println!(
    "{}, {}, and {}",
    zero.to_string().apply(font_info.styles),
    twenty.to_string().apply(font_info.styles),
    ninety_six.to_string().apply(font_info.styles)
);
𝜞, 𝜿, and ℓ

The apply method comes from the trait FontModifiable, which is implemented for every type that implements AsRef<str>, including &str and String. The trait also provides more direct methods, e.g. make_bold, make_italic, make_sans, etc.
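Conceptually, making text bold and italic amounts to replacing each character with its counterpart in Unicode's Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF). The sketch below uses hypothetical function names and covers only Latin and Greek letters; it is not the crate's implementation. It shows the mapping via fixed offsets, and characters without a bold italic counterpart, such as ℓ, pass through unchanged, which is consistent with the output above.

```rust
/// Maps a letter to its "Mathematical Bold Italic" code point, if any.
fn bold_italic_char(c: char) -> char {
    // Shift `c` relative to the first letter of its run onto `base`.
    let offset = |base: u32, first: char| char::from_u32(base + (c as u32 - first as u32));
    let mapped = match c {
        'A'..='Z' => offset(0x1D468, 'A'),
        'a'..='z' => offset(0x1D482, 'a'),
        // Capital Greek Alpha..Omega; the unassigned U+03A2 slot lines up
        // with the mathematical THETA SYMBOL, so a plain offset works.
        '\u{391}'..='\u{3A9}' => offset(0x1D71C, '\u{391}'),
        // Small Greek alpha..omega, including final sigma.
        '\u{3B1}'..='\u{3C9}' => offset(0x1D736, '\u{3B1}'),
        _ => None,
    };
    mapped.unwrap_or(c)
}

/// Applies the mapping to every character of a string.
fn make_bold_italic(s: &str) -> String {
    s.chars().map(bold_italic_char).collect()
}
```

For example, Γ (U+0393) maps to U+1D71E and κ (U+03BA) to U+1D73F, i.e. 𝜞 and 𝜿.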

Fixing Mistakes

The procedure above for determining glyphs and font modifiers is certainly not perfect: not only may .enc and .pfa/.pfb files contain incorrect or unknown glyph names, but font modifiers are also determined heuristically. For that reason, we provide a way to fix mistakes:

  1. The map from glyph names to Unicode is stored in the file glyphmap.txt
  2. Font modifiers, web font names and links, or even full glyph lists can be added to the markdown file patches.md, which additionally serves as a how-to guide for patching any mistakes you might find.

Both files are parsed during compilation.

If you notice any mistakes, feel free to open a pull request for these files.

Dependencies

~1.3–2MB
~37K SLoC