#parser #markdown-parser #pulldown-cmark #tags #extended #events #footnotes

extended_pulldown

Wrapper around pulldown-cmark with extended features for book publication

1 unstable release

0.1.0 Oct 23, 2020


Used in 4 crates (3 directly)

MIT/Apache

87KB
2K SLoC

An extended definition and parser of markdown, based on pulldown_cmark, which allows for some further events and typographic niceties like super- and subscript.


lib.rs:

This crate extends pulldown_cmark to do the following:

  • smarten quotes according to a more complex but substantially slower algorithm than that used in pulldown_cmark versions 0.8.0 and later

  • substitute Unicode en-dashes, em-dashes and ellipses for --, --- and ... respectively

  • allow multiple-paragraph footnotes by interpreting an indented and unlabelled code block within a footnote as text to be parsed again.

  • allow several new tags:

    • Sans
    • Centred
    • Right-aligned
    • Small caps
    • Subscript
    • Superscript
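
The dash and ellipsis substitution listed above can be pictured with a small standalone sketch. It uses only the standard library; `smarten_punctuation` is a hypothetical helper written for illustration, not a function exported by extended_pulldown, and it makes no attempt to skip code spans or to smarten quotes as a real implementation would need to.

```rust
/// Replace ASCII dash and dot runs with their Unicode equivalents.
/// Hypothetical helper for illustration only; not part of extended_pulldown.
fn smarten_punctuation(input: &str) -> String {
    // Longest pattern first, so that "---" is not consumed as "--" plus "-".
    input
        .replace("---", "\u{2014}") // em-dash
        .replace("--", "\u{2013}") // en-dash
        .replace("...", "\u{2026}") // horizontal ellipsis
}
```

Under these assumptions, calling `smarten_punctuation("1914--1918")` yields the same text with a single en-dash between the years.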

It also provides a function, flatten_footnotes, which replaces footnote references and definitions with a single group of tagged text; this allows rendering to targets like LaTeX which need a footnote to be defined at the point to which it refers. It inserts empty footnotes where a definition is missing.

In general, this crate mimics the structs and methods of pulldown_cmark. However, its more complex conception of markdown comes at the cost of much slower parsing. Using it in place of pulldown_cmark is therefore not recommended unless this complexity is required.

The markdown syntax to use is otherwise essentially that of CommonMark together with pulldown_cmark's extensions.

Examples

Inline Spans

These are parsed preferentially from HTML spans:

use extended_pulldown::Parser;
use extended_pulldown::Event::*;
use extended_pulldown::Tag::*;

let text = concat!(
    r#"<span class="sans">Sans text</span>"#,
    r#"<span class="centred">Centred text</span>"#,
    r#"<span class="right-aligned">Right-aligned text</span>"#,
    r#"<span class="smallcaps">Small caps text</span>"#,
    r#"<span class="subscript">Subscript text</span>"#,
    r#"<span class="superscript">Superscript text</span>"#
);

let parsed = Parser::new(text).collect::<Vec<_>>();
let expected = vec![
    Start(Paragraph),
    Start(Sans),
    Text("Sans text".into()),
    End(Sans),
    Start(Centred),
    Text("Centred text".into()),
    End(Centred),
    Start(RightAligned),
    Text("Right-aligned text".into()),
    End(RightAligned),
    Start(SmallCaps),
    Text("Small caps text".into()),
    End(SmallCaps),
    Start(Subscript),
    Text("Subscript text".into()),
    End(Subscript),
    Start(Superscript),
    Text("Superscript text".into()),
    End(Superscript),
    End(Paragraph)
];
assert_eq!(parsed, expected);

Markdown syntax is also extended slightly: wrapping a span of alphanumeric text in ^ indicates superscript, and wrapping it in ~ indicates subscript, for example 25^th^ July and H~2~O.
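
To make the ^ and ~ rule concrete, here is a rough standalone sketch of how such spans could be split out of a string. The `Span` enum and `split_scripts` function are hypothetical names invented for this illustration; they are not the crate's types, and the crate's actual parsing rules may differ in edge cases.

```rust
/// Simplified stand-in for parser events; hypothetical, not the crate's `Event`.
#[derive(Debug, PartialEq)]
enum Span {
    Text(String),
    Superscript(String),
    Subscript(String),
}

/// Split `input` into plain text and ^super^ / ~sub~ spans.
/// A marker pair only counts when it encloses non-empty alphanumeric text.
fn split_scripts(input: &str) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut plain = String::new();
    let mut rest = input;

    while let Some(first) = rest.chars().next() {
        if first == '^' || first == '~' {
            // Both markers are one byte long, so slicing at 1 is safe.
            let after = &rest[1..];
            if let Some(end) = after.find(first) {
                let inner = &after[..end];
                if !inner.is_empty() && inner.chars().all(char::is_alphanumeric) {
                    // Flush any pending plain text before the script span.
                    if !plain.is_empty() {
                        spans.push(Span::Text(std::mem::take(&mut plain)));
                    }
                    spans.push(if first == '^' {
                        Span::Superscript(inner.to_string())
                    } else {
                        Span::Subscript(inner.to_string())
                    });
                    rest = &after[end + 1..];
                    continue;
                }
            }
        }
        // Not a valid script span: keep the character as plain text.
        plain.push(first);
        rest = &rest[first.len_utf8()..];
    }
    if !plain.is_empty() {
        spans.push(Span::Text(plain));
    }
    spans
}
```

With this sketch, `split_scripts("H~2~O")` gives a text span `H`, a subscript span `2` and a text span `O`, while a marker pair enclosing spaces, as in `a ^ b ^ c`, is left as plain text.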

Multipara footnotes

use extended_pulldown::Parser;
use extended_pulldown::Event::*;
use extended_pulldown::Tag::*;
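use pulldown_cmark::CodeBlockKind::Indented;

let text = "Hello World[^footnote]\n\n[^footnote]:\n\tA footnote\n\n\tIn *multiple* pieces";
let output = Parser::new(text).collect::<Vec<_>>();

// What a parse without the footnote extension would look like:
// the indented text stays an unparsed code block.
let pulldown_output = vec![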
    Start(Paragraph),
    Text("Hello World".into()),
    FootnoteReference("footnote".into()),
    End(Paragraph),
    Start(FootnoteDefinition("footnote".into())),
    Start(CodeBlock(Indented)),
    Text("A footnote\n\n".into()),
    Text("In *multiple* pieces".into()),
    End(CodeBlock(Indented)),
    End(FootnoteDefinition("footnote".into()))
];
// With this crate, the indented block is parsed again as markdown paragraphs.
let extended_pulldown_output = vec![
    Start(Paragraph),
    Text("Hello World".into()),
    FootnoteReference("footnote".into()),
    End(Paragraph),
    Start(FootnoteDefinition("footnote".into())),
    Start(Paragraph),
    Text("A footnote".into()),
    End(Paragraph),
    Start(Paragraph),
    Text("In ".into()),
    Start(Emphasis),
    Text("multiple".into()),
    End(Emphasis),
    Text(" pieces".into()),
    End(Paragraph),
    End(FootnoteDefinition("footnote".into()))
];
assert!(output != pulldown_output);
assert_eq!(output, extended_pulldown_output);

Flattening footnotes

use extended_pulldown::Event::*;
use extended_pulldown::Tag;

let events = vec![
    Start(Tag::Paragraph),
    Text("Hello".into()),
    FootnoteReference("1".into()),
    End(Tag::Paragraph),
    Start(Tag::FootnoteDefinition("1".into())),
    Start(Tag::Paragraph),
    Text("World".into()),
    End(Tag::Paragraph),
    End(Tag::FootnoteDefinition("1".into())),
];

let flattened = extended_pulldown::flatten_footnotes(events);
let expected = vec![
    Start(Tag::Paragraph),
    Text("Hello".into()),
    Start(Tag::FlattenedFootnote),
    Text("World".into()),
    End(Tag::FlattenedFootnote),
    End(Tag::Paragraph)
];

assert_eq!(flattened, expected);
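
The general idea of flattening, including the empty footnote inserted when a definition is missing, can be sketched in two passes over a simplified event type. Everything here is an assumption made for illustration: `Ev` and `flatten` are hypothetical stand-ins rather than the crate's `Event` and `flatten_footnotes`, paragraphs are not modelled, and nested definitions are not handled.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the crate's events; hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum Ev {
    Text(String),
    FootnoteReference(String),
    StartDefinition(String),
    EndDefinition,
    StartFlattened,
    EndFlattened,
}

/// Replace each footnote reference with the body of its definition.
fn flatten(events: Vec<Ev>) -> Vec<Ev> {
    // Pass 1: move each definition's contents out of the main stream.
    let mut definitions: HashMap<String, Vec<Ev>> = HashMap::new();
    let mut body = Vec::new();
    let mut current: Option<String> = None;
    for ev in events {
        match ev {
            Ev::StartDefinition(label) => current = Some(label),
            Ev::EndDefinition => current = None,
            other => match &current {
                Some(label) => definitions.entry(label.clone()).or_default().push(other),
                None => body.push(other),
            },
        }
    }

    // Pass 2: inline the stored contents at every reference.
    let mut out = Vec::new();
    for ev in body {
        match ev {
            Ev::FootnoteReference(label) => {
                out.push(Ev::StartFlattened);
                // A missing definition simply yields an empty footnote.
                if let Some(contents) = definitions.get(&label) {
                    out.extend(contents.iter().cloned());
                }
                out.push(Ev::EndFlattened);
            }
            other => out.push(other),
        }
    }
    out
}
```

Collecting definitions first means a reference can appear before its definition in the input, which is the usual order in markdown source.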
