2 releases
Version | Released |
---|---|
0.1.1 | Sep 20, 2024 |
0.1.0 | Sep 4, 2024 |
ethan-rs-wc (erwc)
ethan-rs-wc (erwc) counts words, lines, characters, and bytes. It works like the wc command, and aims to be more accurate and faster. Text can also be read from standard input.
Getting Started
Manual Build
Requirements
- git
  - You'll know you did it right if you can run git --version and see a response like git version x.x.x.
- rust
  - Install Rust according to the official documentation. You'll know the Rust toolchain is installed if you can open a shell, run rustc --version and see a response like rustc x.y.z (abcabcabc yyyy-mm-dd), and run cargo --version and see a response like cargo x.y.z (abcabc yyyy-mm-dd).
$ git clone https://github.com/ethancws/ethan-rs-wc.git
$ cd ethan-rs-wc
$ cargo build --release
The built binary is at target/release/erwc. You can copy or move it somewhere else, or just use cargo run --release -- <args>.
To run the erwc binary directly, give its relative or absolute path followed by the arguments, or add its directory to PATH.
Release
You can also download a prebuilt binary from the project's Release page.
How to use
$ erwc tests/data/test.txt
You'll see output like 9 5 38 42 tests/data/test.txt. In order, these are the number of words, lines, characters, and bytes, followed by the file path.
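To make the four default columns concrete, here is a minimal Rust sketch that computes them for an in-memory string. It is not erwc's actual implementation: the Counts struct and the count function are hypothetical names, and it assumes words are separated by Unicode whitespace, characters are Unicode scalar values, and a final line without a trailing newline still counts as a line. erwc may define any of these differently.

```rust
/// Counts for one input, in the order erwc prints them by default.
/// Hypothetical type for illustration; not taken from erwc's source.
#[derive(Debug, PartialEq)]
struct Counts {
    words: usize,
    lines: usize,
    chars: usize,
    bytes: usize,
}

/// Hypothetical helper that counts words, lines, characters, and bytes.
fn count(text: &str) -> Counts {
    Counts {
        // Assumption: a word is a run of non-whitespace characters.
        words: text.split_whitespace().count(),
        // Assumption: an unterminated last line is still counted.
        lines: text.lines().count(),
        // Characters are Unicode scalar values, so "é" is one character.
        chars: text.chars().count(),
        // Bytes are the UTF-8 encoded length, so "é" is two bytes.
        bytes: text.len(),
    }
}
```

For example, count("hello world\nfoo\n") gives 3 words, 2 lines, 16 characters, and 16 bytes. For text containing non-ASCII characters, the character and byte counts differ, which is why the two columns are reported separately.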
$ erwc -l tests/data/test.txt
You'll see output like 5 tests/data/test.txt: the number of lines and the file path. This is the same as running erwc --lines tests/data/test.txt.
$ erwc -lwcbL tests/data/test.txt
You'll see output like 9 5 38 42 tests/data/test.txt 30@5. In order, these are the number of words, lines, characters, and bytes, then the file path, then the byte length of the longest line, an @ sign, and the number of that line.
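The length@line value can be illustrated with a short Rust sketch. This is only one way to compute it, not erwc's code: longest_line is a hypothetical name, the line terminator is assumed not to count toward the length, and on a tie the earliest line is assumed to win.

```rust
/// Returns (byte length, 1-based line number) of the longest line,
/// or None when the input has no lines.
/// Hypothetical helper for illustration; not taken from erwc's source.
fn longest_line(text: &str) -> Option<(usize, usize)> {
    let mut best: Option<(usize, usize)> = None;
    for (idx, line) in text.lines().enumerate() {
        // Assumption: the line terminator is excluded from the length.
        let len = line.len();
        // Strictly greater, so the first of several equal lines is kept.
        if best.map_or(true, |(longest, _)| len > longest) {
            best = Some((len, idx + 1));
        }
    }
    best
}
```

Formatting the result with format!("{}@{}", len, line_no) yields a value shaped like the 30@5 shown above.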
Run erwc -h or erwc --help for more information.
Quick examples comparison
These examples compare statistics for individual files ranging in size from a few dozen bytes to about 1.2 GB, and for several files at once. Timings were collected on a system with an Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz (4 cores) and 16 GB of memory.
To make the benchmark as accurate and complete as possible, erwc is compared not only with wc: other commands are also used to count the lines and bytes of each file, as an independent check on the counts that erwc and wc report.
To construct a large file of about 1 GB, I used two commands to create tests/data/splitfile.txt and tests/data/largefile.txt.
$ echo `base64 -i /dev/urandom | head -c 1000000000`| fold -w $((RANDOM % 50 + 50)) > tests/data/splitfile.txt
$ awk '{
    output = "";
    while (length($0) > 0) {
        len = int(rand() * 7) + 1;
        part = substr($0, 1, len);
        output = output (output ? " " : "") part;
        $0 = substr($0, len + 1);
    }
    print output;
}' tests/data/splitfile.txt > tests/data/largefile.txt
Single file statistics
Command | File | Time | Size |
---|---|---|---|
erwc tests/data/test.txt | tests/data/test.txt | 0.00s user 0.00s system 93% cpu 0.006 total | 46 |
wc tests/data/test.txt | tests/data/test.txt | 0.00s user 0.00s system 76% cpu 0.004 total | 46 |
erwc tests/data/sherlock.txt | tests/data/sherlock.txt | 0.00s user 0.00s system 106% cpu 0.007 total | 90314 |
wc tests/data/sherlock.txt | tests/data/sherlock.txt | 0.00s user 0.00s system 88% cpu 0.006 total | 90314 |
erwc /var/logs/keybagd.log.1 | /var/logs/keybagd.log.1 | 0.01s user 0.00s system 76% cpu 0.016 total | 1049061 |
wc /var/logs/keybagd.log.1 | /var/logs/keybagd.log.1 | 0.01s user 0.00s system 89% cpu 0.012 total | 1049061 |
erwc /var/log/install.log | /var/log/install.log | 0.18s user 0.02s system 97% cpu 0.212 total | 48244124 |
wc /var/log/install.log | /var/log/install.log | 0.20s user 0.01s system 98% cpu 0.216 total | 48244124 |
erwc tests/data/splitfile.txt | tests/data/splitfile.txt | 3.60s user 0.91s system 86% cpu 5.210 total | 1000000015 |
wc tests/data/splitfile.txt | tests/data/splitfile.txt | 5.86s user 0.18s system 99% cpu 6.060 total | 1000000015 |
erwc tests/data/largefile.txt | tests/data/largefile.txt | 5.53s user 0.35s system 99% cpu 5.901 total | 1244571298 |
wc tests/data/largefile.txt | tests/data/largefile.txt | 5.67s user 0.19s system 99% cpu 5.872 total | 1244571298 |
Multi-file statistics
Command | Time |
---|---|
erwc tests/data/largefile.txt tests/data/splitfile.txt tests/data/sherlock.txt tests/data/test.txt | 9.04s user 0.60s system 156% cpu 6.166 total |
wc tests/data/largefile.txt tests/data/splitfile.txt tests/data/sherlock.txt tests/data/test.txt | 9.34s user 0.33s system 99% cpu 9.694 total |
Count file lines
largefile.txt
Tool | Command | Number |
---|---|---|
awk | awk 'END {print NR}' tests/data/largefile.txt | 10869566 |
sed | sed -n '$=' tests/data/largefile.txt | 10869566 |
grep | grep -c '' tests/data/largefile.txt | 10869566 |
cat | cat -n tests/data/largefile.txt \| tail -n 1 | 10869566 |
splitfile.txt
Tool | Command | Number |
---|---|---|
awk | awk 'END {print NR}' tests/data/splitfile.txt | 10869566 |
sed | sed -n '$=' tests/data/splitfile.txt | 10869566 |
grep | grep -c '' tests/data/splitfile.txt | 10869566 |
cat | cat -n tests/data/splitfile.txt \| tail -n 1 | 10869566 |
sherlock.txt
Tool | Command | Number |
---|---|---|
awk | awk 'END {print NR}' tests/data/sherlock.txt | 2133 |
sed | sed -n '$=' tests/data/sherlock.txt | 2133 |
grep | grep -c '' tests/data/sherlock.txt | 2133 |
cat | cat -n tests/data/sherlock.txt \| tail -n 1 | 2133 |
test.txt
Tool | Command | Number |
---|---|---|
awk | awk 'END {print NR}' tests/data/test.txt | 5 |
sed | sed -n '$=' tests/data/test.txt | 5 |
grep | grep -c '' tests/data/test.txt | 5 |
cat | cat -n tests/data/test.txt \| tail -n 1 | 5 |
Count file bytes
largefile.txt
Tool | Command | Size |
---|---|---|
stat | stat -f %z tests/data/largefile.txt | 1255440864 |
du | du -k tests/data/largefile.txt | 1229796k |
ls | ls -l tests/data/largefile.txt | 1255440864 |
splitfile.txt
Tool | Command | Size |
---|---|---|
stat | stat -f %z tests/data/splitfile.txt | 1010869565 |
du | du -k tests/data/splitfile.txt | 999712k |
ls | ls -l tests/data/splitfile.txt | 1010869565 |
sherlock.txt
Tool | Command | Size |
---|---|---|
stat | stat -f %z tests/data/sherlock.txt | 90314 |
du | du -k tests/data/sherlock.txt | 92k |
ls | ls -l tests/data/sherlock.txt | 90314 |
test.txt
Tool | Command | Size |
---|---|---|
stat | stat -f %z tests/data/test.txt | 46 |
du | du -k tests/data/test.txt | 4? |
ls | ls -l tests/data/test.txt | 46 |
Running tests
$ cargo test
Thank You!
If you appreciated this, feel free to follow me or donate!
Solana Address: 3gArMnKUHkZ1eEry4dD8zdMpJH385HKrUdnG9ig6S5Zy