6 releases
Version | Released |
---|---|
0.2.3 (new) | Apr 29, 2025 |
0.2.2 | Apr 29, 2025 |
0.1.1 | Apr 15, 2025 |
#267 in Filesystem
486 downloads per month
1MB, 40K SLoC
tika-magic
tika-magic is a Rust library that determines the MIME type of a file or byte array. It is meant to be API-compatible with the fantastic tree_magic_mini crate, but without a dependency on the system magic file database (which is GPL).
tika-magic uses the Apache Tika mimetypes library to provide an Apache-2.0-licensed MIME detection library.
About tika-magic
tika-magic was created because differences between systems' magic databases cause inconsistent behavior in downstream software. Unfortunately, the libmagic magic database is also licensed under the GPL, which prevents many developers from using it or distributing software that uses it. It's not a great UX to require your users to keep their magic file updated just to keep your application working smoothly!
Several other projects have gone down this route. Most famously, the Ruby on Rails project had to remove and rewrite its MIME type handling code because of the license conflict; it created the Marcel library, also based on Apache Tika's rule definitions, to replace the dependency on libmagic. Go has a similar MIME detection library called go-mimetype. I've taken some design inspiration from both, as well as their test inputs.
Using tika-magic
API Examples
tika-magic provides several ways to detect MIME types from files or byte arrays:
use std::fs::File;
use std::path::Path;
use tika_magic;
// Detect MIME type from a byte array
let data = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]; // PNG file signature
let mime_type = tika_magic::from_u8(&data);
assert_eq!(mime_type, "image/png");
// Check if bytes match a specific MIME type
let is_png = tika_magic::match_u8("image/png", &data);
assert!(is_png);
// Get all possible MIME types (ordered by confidence)
let mime_types = tika_magic::from_u8_exhaustive(&data);
println!("Possible MIME types: {:?}", mime_types);
// File-based detection
let file = File::open("example.png").unwrap();
let mime_type = tika_magic::from_file(&file).unwrap();
assert_eq!(mime_type, "image/png");
// Path-based detection
let mime_type = tika_magic::from_filepath(Path::new("example.pdf")).unwrap();
assert_eq!(mime_type, "application/pdf");
// Check if a file matches a specific MIME type
let is_pdf = tika_magic::match_filepath("application/pdf", Path::new("example.pdf"));
assert!(is_pdf);
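To give a feel for what byte-based detection involves, here is a minimal, standard-library-only sketch of matching a buffer against a table of leading-byte signatures. It is only an illustration of the general idea, not tika-magic's implementation: the `SIGNATURES` table and the `sniff` function are hypothetical names, and the real Tika rules cover far more types and also match at offsets, with masks and with priorities.

```rust
/// Hypothetical table of (MIME type, leading-bytes signature) pairs.
/// Real rule sets are much larger; this is for illustration only.
const SIGNATURES: &[(&str, &[u8])] = &[
    ("image/png", &[0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]),
    ("image/gif", b"GIF8"),
    ("application/pdf", b"%PDF-"),
    ("application/zip", &[0x50, 0x4B, 0x03, 0x04]),
];

/// Returns the first MIME type whose signature prefixes `data`,
/// falling back to a generic binary type when nothing matches.
fn sniff(data: &[u8]) -> &'static str {
    SIGNATURES
        .iter()
        .find(|(_, magic)| data.starts_with(magic))
        .map(|(mime, _)| *mime)
        .unwrap_or("application/octet-stream")
}
```

Calling `sniff` on the PNG signature from the example above would select the `image/png` entry, while unrecognized input falls through to `application/octet-stream`.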
Installation
Add tika-magic to your Cargo.toml:
cargo add tika-magic -F open_zips -F open_ole
Then include it in your Rust project:
use tika_magic;
The library has minimal dependencies and doesn't require any system libraries or external resources to work. There are two optional features which add the zip and ole dependencies. If you enable the open_zips feature, tika-magic will open zip files and try to determine what file type they are. For example, without open_zips an Android APK file is reported as application/zip, but with it, tika-magic returns application/vnd.android.package-archive. With open_ole enabled, it will differentiate between common OLE formats such as application/vnd.ms-excel.
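As a rough picture of why looking inside the archive helps, the sketch below walks the local file headers of a ZIP buffer and treats the presence of an `AndroidManifest.xml` entry as a sign of an APK. This is an assumption-laden illustration and not the rule tika-magic actually applies: `entry_names`, `refine_zip`, and `stored_entry` are hypothetical helpers, the walker assumes sizes are stored in each local header (no data descriptors, no ZIP64), and it ignores the central directory.

```rust
/// Walks ZIP local file headers from the start of `bytes` and collects entry names.
/// Simplified: stops at the first thing that is not a local file header.
fn entry_names(bytes: &[u8]) -> Vec<String> {
    const LOCAL_HEADER_SIG: [u8; 4] = [0x50, 0x4B, 0x03, 0x04];
    const HEADER_LEN: usize = 30; // fixed-size part of a local file header
    let mut names = Vec::new();
    let mut pos = 0;
    while let Some(header) = bytes.get(pos..pos + HEADER_LEN) {
        if !header.starts_with(&LOCAL_HEADER_SIG) {
            break;
        }
        // Little-endian fields: compressed size at 18, name length at 26, extra length at 28.
        let compressed =
            u32::from_le_bytes([header[18], header[19], header[20], header[21]]) as usize;
        let name_len = u16::from_le_bytes([header[26], header[27]]) as usize;
        let extra_len = u16::from_le_bytes([header[28], header[29]]) as usize;
        let name_start = pos + HEADER_LEN;
        let Some(name) = bytes.get(name_start..name_start + name_len) else {
            break;
        };
        names.push(String::from_utf8_lossy(name).into_owned());
        // Skip the name, the extra field, and the entry data to reach the next header.
        pos = name_start + name_len + extra_len + compressed;
    }
    names
}

/// Refines a buffer already known to be a ZIP, using a hypothetical APK heuristic.
fn refine_zip(bytes: &[u8]) -> &'static str {
    if entry_names(bytes).iter().any(|n| n == "AndroidManifest.xml") {
        "application/vnd.android.package-archive"
    } else {
        "application/zip"
    }
}

/// Builds one uncompressed ("stored") local file entry, for demonstration only.
/// The CRC and timestamps are zeroed, so this is not a valid archive for real tools.
fn stored_entry(name: &str, data: &[u8]) -> Vec<u8> {
    let mut out = vec![0x50, 0x4B, 0x03, 0x04];
    out.extend_from_slice(&[0u8; 14]); // version, flags, method, time, date, CRC
    out.extend_from_slice(&(data.len() as u32).to_le_bytes()); // compressed size
    out.extend_from_slice(&(data.len() as u32).to_le_bytes()); // uncompressed size
    out.extend_from_slice(&(name.len() as u16).to_le_bytes()); // name length
    out.extend_from_slice(&0u16.to_le_bytes()); // extra field length
    out.extend_from_slice(name.as_bytes());
    out.extend_from_slice(data);
    out
}
```

A production detector would more likely read the central directory through a proper ZIP library, which is presumably why this behavior sits behind an optional feature with its own dependency.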
License
tika-magic is licensed under the Apache License, Version 2.0. See the LICENSE file for the full license text.
Copyright 2025 Ryan Stortz
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
The MIME type detection rules are derived from the Apache Tika project, which is also licensed under the Apache License 2.0.
Speed
tika-magic is slower in the general case than tree_magic_mini, as tree_magic_mini is specifically optimized for quick parsing. Both projects are optimized for a few code paths and have fairly similar results on those paths. Anything other than application/zip, image/gif, image/png, or application/pdf will be faster in tree_magic_mini. In the benchmarks below, for example, from_u8 on text/plain takes about 5.9 ms per call in tika-magic versus about 27.5 µs in tree_magic_mini.
test tika-magic::from_u8::application_zip ... bench: 1,918 ns/iter (+/- 85)
test tika-magic::from_u8::image_gif ... bench: 20 ns/iter (+/- 1)
test tika-magic::from_u8::image_png ... bench: 11 ns/iter (+/- 10)
test tika-magic::from_u8::text_plain ... bench: 5,933,460 ns/iter (+/- 269,395)
test tika-magic::match_u8::application_zip ... bench: 14 ns/iter (+/- 2)
test tika-magic::match_u8::image_gif ... bench: 14 ns/iter (+/- 1)
test tika-magic::match_u8::image_png ... bench: 14 ns/iter (+/- 0)
test tika-magic::match_u8::text_plain ... bench: 15 ns/iter (+/- 0)
test tree_magic_mini::from_u8::application_zip ... bench: 5,364 ns/iter (+/- 524)
test tree_magic_mini::from_u8::image_gif ... bench: 1,567 ns/iter (+/- 90)
test tree_magic_mini::from_u8::image_png ... bench: 1,848 ns/iter (+/- 73)
test tree_magic_mini::from_u8::text_plain ... bench: 27,507 ns/iter (+/- 2,296)
test tree_magic_mini::match_u8::application_zip ... bench: 37 ns/iter (+/- 2)
test tree_magic_mini::match_u8::image_gif ... bench: 28 ns/iter (+/- 1)
test tree_magic_mini::match_u8::image_png ... bench: 27 ns/iter (+/- 1)
test tree_magic_mini::match_u8::text_plain ... bench: 16 ns/iter (+/- 1)
If you can afford to use the system magic database or to distribute GPL software, tree_magic_mini is significantly faster. Something for tika-magic to improve on!
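If you want a rough per-call figure for your own inputs, a small timing loop is often enough to see which side of the trade-off you are on. The sketch below uses only the standard library; `ns_per_iter` is a hypothetical helper rather than part of either crate, and it reports a simple mean with no warm-up or variance estimate, so treat its output as a ballpark and not as a replacement for a real benchmark harness.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Calls `f` on `input` `iters` times and returns the mean wall-clock
/// time per call in nanoseconds. `black_box` keeps the optimizer from
/// removing the calls or hoisting them out of the loop.
fn ns_per_iter<R>(iters: u32, input: &[u8], f: impl Fn(&[u8]) -> R) -> f64 {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f(black_box(input)));
    }
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}
```

In a project that depends on tika-magic, the closure passed as `f` would wrap the detection call you care about, for example `|d| tika_magic::from_u8(d)`.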
Dependencies
~2.8–5.5MB
~95K SLoC