1 unstable release: 0.0.1 (Aug 12, 2024)
ubase64
Lightweight, simple, and configurable base64 encode and decode.
⚠️ WARNING: this is alpha-stage software and has not been extensively tested. Use at your own risk.
Contributions (tests and benchmarks especially) are very welcome!
Overview
Another one?
Yes, indeed. There's a wide array of base64 implementations for Rust out there. Some likely work very well, but none of them are perfect. Neither is this one, I'm sure: as much as I tried to address the issues I saw, there's always room for improvement (PRs welcome! 😜). Still, it has done quite nicely for my needs so far.
Design goals
I had four main goals in designing ubase64:
- Intuitive: The base64 specification and usages in the wild include a good deal more variation than one might expect. Different alphabet options, whether to use padding, line wrapping options, etc. Thing is, most users neither know nor care about these options, and in my opinion forcing them to is bad design. They want the default, without arcane setups and invocations; and odds are the default will work just fine.
- Configurable: Given the wide array of options mentioned in the first point, the second goal of ubase64 is to provide enough options to support as many potential base64 use cases as possible while keeping the "basic" API clean and simple.
- Performance: ubase64 tries to strike a balance between the best possible performance and the smallest possible binary size.
- Versatility: Finally, ubase64 should work in a wide variety of contexts. In addition to library usage, it provides a command-line interface so you can benefit from fast encoding and decoding in the terminal too!
Usage
You can use ubase64 in your project as a library, or install it as a command-line binary.
Library
Add ubase64 to your project's dependencies:
[dependencies]
ubase64 = { version = "0.0.1", default-features = false }
Please note: the package's default features include dependencies only needed for a binary install. If you don't need these, disable default features as shown above (an empty features array on its own does not turn them off). You can run cargo add --no-default-features ubase64 to do this automatically.
Example library usage:
use ubase64::encode_std;
let encoded = encode_std(b"Hello, world!");
assert_eq!(encoded, "SGVsbG8sIHdvcmxkIQ==");
Binary
Install ubase64 as a binary with Cargo:
cargo install ubase64
Example binary usage:
$ ubase64 encode "Hello, world!"
SGVsbG8sIHdvcmxkIQ==
$ echo -n "Hello, world!" > test.txt
$ ubase64 encode -f test.txt
SGVsbG8sIHdvcmxkIQ==
Architecture
References
- RFC 4648: Base64 data encoding standard. Describes the most common variants of base64.
- feross/base64-js: Base64 in pure JavaScript. Fast and clean; this was highly influential in the design.
- uhmarcel/rbase64: A solid Rust implementation with a relatively nice API, but not actively maintained.
Considerations
- Minimal dependencies: To keep builds small and performance predictable, ubase64 installed as a library includes only two dependencies (thiserror and miette, both for error handling).
- Clean API: The standard wrappers encode_std and decode_std are provided to make the most common use case as painless as possible, and the more complex configurations should still be relatively intuitive.
- Easy CLI: Some popular implementations include no CLI, and many try to match the interface of the classic GNU base64 utility. I found that interface confusing (why does it encode by default and require a flag to decode?) and it's always a process for me to figure out how to give it a string input, so I took a more opinionated approach.
Implementation
Alphabets
Two alphabet options are provided, standard (STD_ALPHABET) and URL-safe (URL_ALPHABET). These are represented as 64-byte const arrays.
For decoding, the base64_reverse_alphabet function is used to create a reverse lookup table from the alphabet array at compile time. This is implemented as a 256-byte const array which maps the byte value of each character in the alphabet to its index in the array. Only 64 of the 256 values will ever be filled, but this is an insignificant amount of memory and should incur less overhead than an equivalent map type.
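To make the reverse lookup idea concrete, here is a minimal sketch of how such a table could be built at compile time. It is not the crate's actual code: the function name reverse_alphabet_sketch, the INVALID marker value, and the STD_REVERSE / URL_REVERSE constants are assumptions made for illustration. Only the two alphabets themselves come straight from RFC 4648.

```rust
/// Standard base64 alphabet from RFC 4648, as a 64-byte const array.
const STD_ALPHABET: [u8; 64] =
    *b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// URL-safe alphabet from RFC 4648: '-' and '_' replace '+' and '/'.
const URL_ALPHABET: [u8; 64] =
    *b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/// Marker for bytes that are not part of the alphabet.
/// (An assumption of this sketch; the crate may mark unused slots differently.)
const INVALID: u8 = 0xFF;

/// Hypothetical stand-in for the crate's `base64_reverse_alphabet`:
/// maps each alphabet character's byte value to its index in the alphabet.
const fn reverse_alphabet_sketch(alphabet: &[u8; 64]) -> [u8; 256] {
    let mut table = [INVALID; 256];
    let mut i = 0;
    // `for` loops are not available in `const fn`, so iterate with `while`.
    while i < 64 {
        table[alphabet[i] as usize] = i as u8;
        i += 1;
    }
    table
}

// Evaluated at compile time; only 64 of the 256 slots hold a real index.
const STD_REVERSE: [u8; 256] = reverse_alphabet_sketch(&STD_ALPHABET);
const URL_REVERSE: [u8; 256] = reverse_alphabet_sketch(&URL_ALPHABET);
```

With a table like this, decoding a character is a single array index (for example, STD_REVERSE[b'/' as usize] yields 63), with no hashing or searching involved.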
Data types
Decoded values are arbitrary data and thus are represented as &[u8] (as input) or Vec<u8> (as output).
Encoded values are strings represented as &str (input) or String (output).
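The sketch below shows how those types line up in practice: bytes in and a String out for encoding, a &str in and bytes out for decoding. The names encode_sketch and decode_sketch are hypothetical and are not part of the ubase64 API; the Option return type is also an assumption (the crate reports errors through thiserror and miette). For brevity the decoder uses a linear search instead of a reverse lookup table and does not reject non-canonical trailing bits.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Raw bytes in (`&[u8]`), text out (`String`). Always pads with '='.
fn encode_sketch(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group; missing bytes are zero.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Text in (`&str`), raw bytes out (`Vec<u8>`); `None` on malformed input.
fn decode_sketch(input: &str) -> Option<Vec<u8>> {
    let bytes = input.as_bytes();
    if bytes.len() % 4 != 0 {
        return None;
    }
    let groups = bytes.len() / 4;
    let mut out = Vec::with_capacity(groups * 3);
    for (i, chunk) in bytes.chunks(4).enumerate() {
        // Padding may only appear at the end of the final group.
        let pad = chunk.iter().rev().take_while(|&&b| b == b'=').count();
        if pad > 2 || (pad > 0 && i + 1 != groups) {
            return None;
        }
        let mut n = 0u32;
        for &b in &chunk[..4 - pad] {
            // Linear search keeps the sketch short; a lookup table is faster.
            let idx = ALPHABET.iter().position(|&a| a == b)? as u32;
            n = (n << 6) | idx;
        }
        // Shift so the decoded bits sit in the same 24-bit layout as encoding.
        n <<= 6 * pad as u32;
        out.push((n >> 16) as u8);
        if pad < 2 {
            out.push((n >> 8) as u8);
        }
        if pad < 1 {
            out.push(n as u8);
        }
    }
    Some(out)
}
```

Taking borrowed slices as input means callers never have to give up ownership of their data, while returning owned String and Vec<u8> values keeps lifetimes out of the basic API.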