1 unstable release: 0.1.0 (Aug 2, 2024)
count-md
A simple, configurable command-line tool and Rust library for Unicode-aware, Markdown-aware, HTML-aware word counting in Markdown documents.
That is: this tool will correctly count words in a Unicode-aware way, without incorrectly including Markdown syntax or HTML tags. It can include or exclude content like blockquotes, footnotes, code blocks, and so on, and ships with reasonable defaults out of the box for each!
Example
You might have a file with content like this:
    # Title

    This is some text!

    > Here is a quote from someone else.

    Here is more text.
If you wanted to know the number of non-quoted words, including the title but not including the blockquote, you would simply run `count-md <path to the file>`, and it will helpfully report that there are 9 words total. By contrast, `wc -w` will report that there are 18 words: it includes the blockquote, of course, but it also includes the `#` for the title and the `>` for the blockquote, neither of which is desirable!
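To make the difference concrete, here is a minimal, standard-library-only sketch of the two counting strategies applied to the sample file above. It is not how count-md is implemented: the function names are hypothetical, and the line-based filter only handles `#` heading markers and `>` blockquote lines, whereas a real Markdown-aware counter would rely on a proper parser.

```rust
/// Counts whitespace-separated tokens, roughly what `wc -w` does.
fn naive_word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Simplified Markdown-aware count (hypothetical helper, for illustration only):
/// skips lines that start a blockquote and strips leading `#` heading markers.
fn simple_markdown_word_count(text: &str) -> usize {
    text.lines()
        .map(str::trim_start)
        // Drop blockquote lines entirely.
        .filter(|line| !line.starts_with('>'))
        // Remove heading markers so `#` is not counted as a word.
        .map(|line| line.trim_start_matches('#'))
        .flat_map(str::split_whitespace)
        .count()
}

/// The sample document from the example above.
const SAMPLE: &str = "# Title\n\nThis is some text!\n\n> Here is a quote from someone else.\n\nHere is more text.\n";

fn main() {
    println!("naive count: {}", naive_word_count(SAMPLE));
    println!("markdown-aware sketch: {}", simple_markdown_word_count(SAMPLE));
}
```

The naive count treats `#` and `>` as words and includes the quoted sentence, which is where the extra nine tokens come from.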
Status
Support for including or excluding the following Markdown features:
- Headings
- Blockquotes
- Nested blockquotes
- Admonitions[^admonitions]
- Code blocks
- Inline code
- Block HTML 🚧 Partial
- Footnotes
- Tables
- Math
[^admonitions]: Admonitions are not blockquotes, but they are listed here because that is how they work syntactically.
Library
The core functionality here can be used as a Rust library.[^c] There are two main entry points:
- `count`: accepts a `&str` and counts it with the default set of options, equivalent to running `count-md` with zero options on the command line.
- `count_with_options`: accepts a `&str` and an `Options` value (a bitmask), which allows you to configure each option directly. For the equivalent to running `count-md` with some option, use `Options::DEFAULT` and combine it with other flags:
    - With bitmasking directly: `Options::DEFAULT | Options::IncludeBlockquotes`
    - With the methods supplied by the `bitflags` library, `insert` and `remove`: `let mut options = Options::DEFAULT; options.insert(Options::IncludeBlockquotes); options.remove(Options::IncludeHeadings);`
See the documentation for more!
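For readers unfamiliar with bitmask-style options, the following self-contained sketch shows how combining, inserting, and removing flags behaves. The `Options` type here is a hand-written stand-in, not the crate's actual definition (which is generated by `bitflags`). The bit values, the `contains` method, the `blockquotes_only` helper, and the choice of which flags make up `DEFAULT` are all assumptions made for illustration.

```rust
use std::ops::BitOr;

/// Hypothetical stand-in for a bitmask options type; NOT count-md's real `Options`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Options(u8);

#[allow(non_upper_case_globals)]
impl Options {
    // Bit positions are arbitrary choices for this sketch.
    const IncludeHeadings: Options = Options(0b01);
    const IncludeBlockquotes: Options = Options(0b10);
    // Assumption: only headings are enabled by default in this sketch.
    const DEFAULT: Options = Options::IncludeHeadings;

    /// Turns on every flag set in `other`.
    fn insert(&mut self, other: Options) {
        self.0 |= other.0;
    }

    /// Turns off every flag set in `other`.
    fn remove(&mut self, other: Options) {
        self.0 &= !other.0;
    }

    /// Returns true if every flag in `other` is set.
    fn contains(self, other: Options) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for Options {
    type Output = Options;
    fn bitor(self, rhs: Options) -> Options {
        Options(self.0 | rhs.0)
    }
}

/// Starts from the defaults, then counts blockquotes but not headings.
fn blockquotes_only() -> Options {
    let mut options = Options::DEFAULT;
    options.insert(Options::IncludeBlockquotes);
    options.remove(Options::IncludeHeadings);
    options
}

fn main() {
    let combined = Options::DEFAULT | Options::IncludeBlockquotes;
    println!("combined includes blockquotes: {}", combined.contains(Options::IncludeBlockquotes));
    println!("blockquotes_only: {:?}", blockquotes_only());
}
```

Starting from `Options::DEFAULT` rather than an empty value keeps every other default intact, so only the flags you explicitly insert or remove differ from the command-line tool's zero-option behavior.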
[^c]: In the future, I may also supply C bindings, but those need quite a bit of vetting before I am comfortable doing that!