3 releases (breaking)
Uses old Rust 2015
Version | Released |
---|---|
0.3.0 | May 7, 2018 |
0.2.0 | Mar 30, 2018 |
0.1.0 | Feb 22, 2018 |
#1155 in Text processing
28 downloads per month
Used in dedup
13KB
252 lines
A better deduplicator written in Rust.
Basic usage: dedup <INPUT> [-o <OUTPUTFILE>]
Run dedup --help to see:
USAGE:
    dedup.exe [FLAGS] [OPTIONS] [INPUT]

FLAGS:
    -l, --count-lines        If flag is set only print the number of unique entries found.
        --mmap               Enables use of memory mapped files. This is enabled by default.
        --no-mmap            Prohibits usage of memory mapped files. This will slow down the deduplication process significantly!
    -z, --zero-terminated    Specifies that entries should be interpreted as being separated by a null byte rather than a newline.
    -h, --help               Prints help information
    -V, --version            Prints version information

OPTIONS:
    -o, --output <OUTPUT>
        --terminator <TERMINATOR>    Specifies the single-byte pattern to separate entries by. Default is newline. [default: \n]

ARGS:
    <INPUT>    Specifies the input file to read from. Omit or supply '-' to read from stdin.
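To make the options above concrete, here is a minimal sketch of terminator-based deduplication. It is not taken from the dedup source: the function name dedup_entries is hypothetical, and the choices to keep entries in order of first appearance and to ignore a single trailing terminator are assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not part of dedup): splits `input` on `terminator`
/// and returns each distinct entry once, in order of first appearance.
/// Assumption: one trailing terminator does not create an empty final entry.
fn dedup_entries(input: &[u8], terminator: u8) -> Vec<&[u8]> {
    if input.is_empty() {
        return Vec::new();
    }

    // Drop a single trailing terminator so "a\nb\n" yields two entries, not three.
    let body = match input.split_last() {
        Some((&last, rest)) if last == terminator => rest,
        _ => input,
    };

    let mut seen = HashSet::new();
    let mut unique = Vec::new();
    for entry in body.split(|&b| b == terminator) {
        // `insert` returns true only the first time an entry is seen.
        if seen.insert(entry) {
            unique.push(entry);
        }
    }
    unique
}
```

Under these assumptions, passing b'\n' corresponds to the default terminator and passing 0 to -z, while the length of the returned vector is the kind of count that -l reports.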
To run the benchmark, run python benchsuite/benchrunner. This will download a large (400MB+) text file to use as a benchmark case.
Feature requests and bug reports are always welcome! Please raise them as an issue in this GitHub repository.
lib.rs:
This crate provides one function: fastchr, which very quickly finds the first occurrence of a given byte in a slice.
fastchr is implemented using SIMD intrinsics and runtime CPU feature detection so it will always use the fastest method available on a platform. If SIMD features are not available, fastchr falls back to using memchr.
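The dispatch pattern described above can be sketched roughly as follows. This is an illustration, not the crate's code: the names find_byte, find_byte_avx2 and find_byte_scalar are hypothetical, the (needle, haystack) argument order and Option<usize> return type are assumed by analogy with memchr, AVX2 is an assumed choice of instruction set, and a plain scalar scan stands in for the memchr fallback so the example needs no dependencies.

```rust
/// Portable fallback: a byte-by-byte scan (stand-in for memchr in this sketch).
fn find_byte_scalar(needle: u8, haystack: &[u8]) -> Option<usize> {
    haystack.iter().position(|&b| b == needle)
}

/// AVX2 path: compares 32 bytes per iteration.
/// Safety: the caller must ensure the CPU supports AVX2.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn find_byte_avx2(needle: u8, haystack: &[u8]) -> Option<usize> {
    use std::arch::x86_64::*;

    // Broadcast the needle into all 32 lanes.
    let splat = _mm256_set1_epi8(needle as i8);
    let mut offset = 0;
    while offset + 32 <= haystack.len() {
        // Unaligned load of the next 32 bytes; the loop condition keeps it in bounds.
        let chunk = _mm256_loadu_si256(haystack.as_ptr().add(offset) as *const __m256i);
        // Bit i of the mask is set when byte i of the chunk equals the needle.
        let mask = _mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, splat)) as u32;
        if mask != 0 {
            return Some(offset + mask.trailing_zeros() as usize);
        }
        offset += 32;
    }
    // Fewer than 32 bytes remain: finish with the scalar scan.
    find_byte_scalar(needle, &haystack[offset..]).map(|i| offset + i)
}

/// Chooses an implementation at runtime based on detected CPU features.
fn find_byte(needle: u8, haystack: &[u8]) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound because AVX2 support was just verified.
            return unsafe { find_byte_avx2(needle, haystack) };
        }
    }
    find_byte_scalar(needle, haystack)
}
```

Checking the feature at runtime rather than at compile time lets a single binary run on older CPUs while still using the wider instructions where they exist, which is the property the description above attributes to fastchr.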
Dependencies
~170–315KB