2 releases
0.1.1 | Feb 2, 2024 |
---|---|
0.1.0 | Jan 28, 2024 |
#1030 in Encoding
125KB
1.5K SLoC
A crate for dealing with codepoints in TeX fonts.
This crate provides a way to access glyphs from TeX fonts. It is intended to be used by crates using tex_engine.
TeX deals with fonts by parsing font metric files (.tfm files), which contain information
about the dimensions of each glyph in the font. So from the point of view of (the core of) TeX,
a glyph is just an index $0 \leq i \leq 255$ into the font metric file.
To find out what the glyph actually looks like, we ideally want to know the corresponding Unicode codepoint. This crate attempts to do exactly that.
Usage
This crate attempts to associate a TeX font (identified by the file name stem of its .tfm file) with:
- A list of FontModifiers (e.g. bold, italic, sans-serif, etc.)
- A GlyphList, being an array [Glyph; 256]
A Glyph is then either undefined (i.e. the glyph is not present in the font, or the crate couldn't
figure out what exactly it is) or presentable as a string.
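To make this data model concrete, here is a minimal sketch using only the standard library. The names SketchGlyph and SketchGlyphList are hypothetical stand-ins invented for illustration; they are not the crate's actual Glyph and GlyphList types, whose internals may differ.

```rust
/// Hypothetical stand-in for a glyph: either unknown or a printable string.
#[derive(Clone, Debug, PartialEq)]
enum SketchGlyph {
    Undefined,
    Text(String),
}

/// Hypothetical stand-in for a glyph list: one slot per possible index 0..=255.
struct SketchGlyphList([SketchGlyph; 256]);

impl SketchGlyphList {
    /// Starts with every slot undefined.
    fn empty() -> Self {
        SketchGlyphList(std::array::from_fn(|_| SketchGlyph::Undefined))
    }

    /// Records the string for one index.
    fn set(&mut self, index: u8, text: &str) {
        self.0[index as usize] = SketchGlyph::Text(text.to_string());
    }

    /// A u8 index can never be out of bounds for a 256-entry array.
    fn get(&self, index: u8) -> &SketchGlyph {
        &self.0[index as usize]
    }
}
```

Using u8 as the index type mirrors the fact that TeX only ever sees indices between 0 and 255, so lookups need no bounds handling.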
Consider e.g. \mathbf{\mathit{\Gamma^\kappa_\ell}} (i.e. $\mathbf{\mathit{\Gamma^\kappa_\ell}}$).
From the point of view of TeX, this is a sequence of 3 glyphs, represented as indices into the font cmmib10, namely 0, 20, and 96.
Here's how to use this crate to obtain the corresponding Unicode characters, i.e. 𝜞, 𝜿 and ℓ:
Instantiation
First, we instantiate a FontInfoStore with a function that allows it to find files. This function should take a string (e.g. cmmib10.tfm) and return a string (e.g. /usr/share/texmf-dist/fonts/tfm/public/cm/cmmib10.tfm). This could be done by calling kpsewhich, for example, but repeated and frequent calls to kpsewhich are slow, so more efficient alternatives are recommended.
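use tex_glyphs::encodings::FontInfoStore;

// Resolve file names to full paths by asking kpsewhich.
let mut store = FontInfoStore::new(|s| {
    let output = std::process::Command::new("kpsewhich")
        .arg(s)
        .output()
        .expect("kpsewhich not found!");
    // kpsewhich prints the path followed by a newline, so trim it.
    std::str::from_utf8(&output.stdout).unwrap().trim().to_string()
});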
This store will now use the provided function to find your pdftex.map file, which lists all the fonts that are available to TeX and associates them with .enc, .pfa and .pfb files.
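Since repeated kpsewhich calls are slow, one possible alternative is to memoize lookups. The following is a minimal sketch under stated assumptions: CachedLookup is a hypothetical helper that is not part of tex_glyphs, and whether the closure passed to FontInfoStore::new may capture such a cache depends on the closure bounds the crate requires, which are not stated here.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical helper (not part of tex_glyphs): remembers earlier lookups.
struct CachedLookup<F: Fn(&str) -> String> {
    lookup: F,
    cache: RefCell<HashMap<String, String>>,
}

impl<F: Fn(&str) -> String> CachedLookup<F> {
    fn new(lookup: F) -> Self {
        Self { lookup, cache: RefCell::new(HashMap::new()) }
    }

    /// Returns the cached path if present; otherwise calls the slow
    /// lookup once and stores its result.
    fn get(&self, name: &str) -> String {
        let cached = self.cache.borrow().get(name).cloned();
        if let Some(path) = cached {
            return path;
        }
        let path = (self.lookup)(name);
        self.cache.borrow_mut().insert(name.to_string(), path.clone());
        path
    }
}
```

The RefCell lets get take &self, so the cache could be used from a closure that does not need mutable access to its captures. The slow lookup passed to new could be the kpsewhich call, or any other resolver.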
Obtaining Glyphs
If we now query the store for the GlyphList of some font, e.g. cmmib10, like so:
let ls = store.get_glyphlist("cmmib10");
...it will attempt to parse the .enc file associated with cmmib10, if one exists. If there is none, or if parsing fails, it will try to parse the .pfa or .pfb file. If neither works, it will search for a .vf file and try to parse that. If that fails too, it will return an empty GlyphList.
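The fallback order described above can be sketched as a chain of optional parsers. This is only an illustration of the control flow: the function name and the closure parameters are hypothetical, the glyph names are plain strings here, and the crate's real parsing code is not shown.

```rust
/// Hypothetical sketch of the fallback order: each parser is only run if
/// all earlier ones returned None. An empty list is the last resort.
fn glyph_names_with_fallback(
    parse_enc: impl FnOnce() -> Option<Vec<String>>,
    parse_pfa_or_pfb: impl FnOnce() -> Option<Vec<String>>,
    parse_vf: impl FnOnce() -> Option<Vec<String>>,
) -> Vec<String> {
    parse_enc()
        .or_else(parse_pfa_or_pfb)
        .or_else(parse_vf)
        .unwrap_or_default()
}
```

Because or_else is lazy, a later source is never touched once an earlier one succeeds.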
From any of these three sources, it will then attempt to associate each byte index with a Glyph:
let zero = ls.get(0);
let twenty = ls.get(20);
let ninety_six = ls.get(96);
println!("0={}={}, 20={}={}, and 96={}={}",
    zero.name(), zero,
    twenty.name(), twenty,
    ninety_six.name(), ninety_six
);
0=Gamma=Γ, 20=kappa=κ, and 96=lscript=ℓ
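Conceptually, the step from a glyph name to a printable string is a lookup from names to Unicode. The sketch below covers only the three names from this example; the function is hypothetical and does not reflect how the crate stores or queries its glyph map.

```rust
/// Hypothetical lookup from glyph names to Unicode strings.
/// Only the three names used in the example are covered.
fn glyph_name_to_unicode(name: &str) -> Option<&'static str> {
    match name {
        "Gamma" => Some("\u{0393}"),   // GREEK CAPITAL LETTER GAMMA
        "kappa" => Some("\u{03BA}"),   // GREEK SMALL LETTER KAPPA
        "lscript" => Some("\u{2113}"), // SCRIPT SMALL L
        _ => None,                     // unknown names stay undefined
    }
}
```

Returning None for unknown names corresponds to the undefined case of a glyph.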
Font Modifiers
So far, so good, but the glyphs we obtained are neither bold nor italic, whereas in cmmib10 they are.
So let's check what properties cmmib10 has:
let font_info = store.get_info("cmmib10").unwrap();
println!("{:?}",font_info.styles);
println!("{:?}",font_info.weblink);
ModifierSeq { blackboard: false, fraktur: false, script: false, bold: true, capitals: false, monospaced: false, italic: true, oblique: false, sans_serif: false }
Some(("Latin Modern Math", "https://fonts.cdnfonts.com/css/latin-modern-math"))
...so this tells us that the font is bold and italic, but not sans-serif, monospaced, etc. It also tells us that the publicly available web-compatible equivalent of this font is called "Latin Modern Math" and that we can find it at the provided URL, if we want to use it in e.g. HTML :)
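As an illustration of how the web font information might be used in HTML, here is a minimal sketch. The helper html_with_web_font is hypothetical and not part of the crate; it assumes the URL points to a CSS stylesheet, takes the name and URL as a plain tuple of string slices, and performs no HTML escaping.

```rust
/// Hypothetical helper: builds an HTML fragment that loads a web font
/// (if one is known) and renders the text in it. No escaping is done,
/// so the inputs are assumed to be trusted.
fn html_with_web_font(text: &str, weblink: Option<(&str, &str)>) -> String {
    match weblink {
        Some((family, css_url)) => format!(
            "<link rel=\"stylesheet\" href=\"{}\">\n<span style=\"font-family: '{}'\">{}</span>",
            css_url, family, text
        ),
        // Without a known web font, fall back to the page's default font.
        None => format!("<span>{}</span>", text),
    }
}
```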
Now we only need to apply the modifiers to the glyphs:
use tex_glyphs::fontstyles::FontModifiable;
println!("{}, {}, and {}",
    zero.to_string().apply(font_info.styles),
    twenty.to_string().apply(font_info.styles),
    ninety_six.to_string().apply(font_info.styles)
);
𝜞, 𝜿, and ℓ
The apply method stems from the trait FontModifiable, which is implemented for any type that implements AsRef<str>, including &str and String.
It also provides more direct methods, e.g. make_bold, make_italic, make_sans, etc.
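To give an idea of what applying a bold italic modifier to a string can mean at the Unicode level, here is a minimal sketch that shifts Latin and Greek letters into the Mathematical Alphanumeric Symbols block. The functions bold_italic_char and bold_italic are hypothetical and are not the crate's implementation, which may cover more characters and more styles.

```rust
/// Hypothetical helper: maps one character to its mathematical bold italic
/// counterpart, or returns it unchanged if this sketch knows none.
fn bold_italic_char(c: char) -> char {
    let cp = c as u32;
    let mapped = match c {
        'A'..='Z' => 0x1D468 + (cp - 'A' as u32),
        'a'..='z' => 0x1D482 + (cp - 'a' as u32),
        // U+03A2 is unassigned in the Greek block, so it is skipped here.
        '\u{0391}'..='\u{03A1}' | '\u{03A3}'..='\u{03A9}' => 0x1D71C + (cp - 0x0391),
        '\u{03B1}'..='\u{03C9}' => 0x1D736 + (cp - 0x03B1),
        _ => cp,
    };
    char::from_u32(mapped).unwrap_or(c)
}

/// Applies the mapping to every character of a string.
fn bold_italic(s: &str) -> String {
    s.chars().map(bold_italic_char).collect()
}
```

Characters outside the covered ranges, such as the script small l (U+2113), pass through unchanged in this sketch.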
Fixing Mistakes
The procedure above for determining glyphs and font modifiers is certainly not perfect; not just
because .enc and .pfa/.pfb files might contain wrong or unknown glyph names, but also because
font modifiers are determined heuristically. For that reason, we provide a way to fix mistakes:
- The map from glyph names to Unicode is stored in the file glyphmap.txt.
- Font modifiers, web font names and links, or even full glyph lists can be added to the markdown file patches.md, which additionally serves as a how-to guide for patching any mistakes you might find.
Both files are parsed during compilation.
If you notice any mistakes, feel free to open a pull request for these files.
Dependencies
~1.3–2MB
~37K SLoC