2 unstable releases
| Version | Released |
|---|---|
| 0.2.0 | Dec 18, 2023 |
| 0.1.0 | Dec 18, 2023 |
grader
This CLI tool performs a binary sort of large text files, splitting their lines into two bins according to a user-defined criterion. It streams each line to a child process (such as grep) and routes the line by whether the child echoes it back: echoed lines go to 'bin1', which should be configured for the more common case, and lines that are not echoed go to 'bin2'.
Because the child produces no output for a line it drops, grader can only conclude that a line belongs in 'bin2' after the child echoes a later line or exits. Until then the line must be held in memory, so configuring 'bin1' for the more frequent case keeps this buffering small; in the opposite setup, every run of unechoed lines accumulates until the next echo. The tool is particularly useful for tasks such as splitting log files, or any large dataset where a two-way categorization helps organization and analysis.
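The binning rule can be sketched as a pure function over the input lines and the lines the child wrote back. This is a hypothetical Rust sketch, not grader's source: `partition`, `Split`, and `max_held` are illustrative names, and it assumes the child echoes lines unchanged and in input order (as grep does). `max_held` counts the most lines that had to wait for a decision, assuming the child answers each line immediately; a real child may batch its output, so actual buffering can be higher.

```rust
// Hypothetical sketch of the binning rule; not grader's actual source.
use std::collections::VecDeque;

/// Lines echoed by the child (bin1), lines it dropped (bin2), and the
/// largest number of lines held at once before they could be binned.
#[derive(Debug, PartialEq)]
struct Split {
    bin1: Vec<String>,
    bin2: Vec<String>,
    max_held: usize,
}

/// Assigns every input line to a bin, given the lines the child echoed.
/// Assumes echoes are unchanged and in input order (true of grep).
/// Returns `None` if the child echoed something that was never sent.
fn partition(input: &[&str], echoed: &[&str]) -> Option<Split> {
    let mut pending: VecDeque<&str> = input.iter().copied().collect();
    let mut split = Split { bin1: Vec::new(), bin2: Vec::new(), max_held: 0 };

    for &echo in echoed {
        // Lines ahead of the echoed one were dropped by the child, but that
        // is only known once this echo arrives, so they are held until now.
        let mut held = 0;
        loop {
            let line = pending.pop_front()?;
            held += 1;
            if line == echo {
                split.bin1.push(line.to_string());
                break;
            }
            split.bin2.push(line.to_string());
        }
        split.max_held = split.max_held.max(held);
    }

    // After the child exits, everything never echoed belongs in bin2.
    split.max_held = split.max_held.max(pending.len());
    split.bin2.extend(pending.into_iter().map(String::from));
    Some(split)
}

fn main() {
    let log = ["ok 1", "ok 2", "ok 3", "err 4", "ok 5", "ok 6"];
    // Like `grep -v err`: the common case is echoed.
    let common = partition(&log, &["ok 1", "ok 2", "ok 3", "ok 5", "ok 6"]).unwrap();
    // Like `grep err`: only the rare case is echoed.
    let rare = partition(&log, &["err 4"]).unwrap();
    println!("{common:?}");
    println!("{rare:?}");
}
```

With this sample, echoing the common case never holds more than two lines at once, while echoing the rare case holds every ok line before the error until the error is echoed.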
Install
cargo install grader
Usage
Binary sorter for text files. Lines are sorted into two bins based on child process response
Usage: grader <BIN1> <BIN2> <COMMAND> [ARGS]...
Arguments:
<BIN1> Path for output bin 1 (for echoed lines)
<BIN2> Path for output bin 2 (for non-echoed lines)
<COMMAND> Command to execute for processing lines
[ARGS]... Arguments for the command
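The process plumbing behind these arguments might look roughly like the following. It is a hypothetical sketch, not grader's implementation: `split_with_child` is an illustrative name, argument parsing is hand-rolled rather than whatever the crate actually uses, and both bins are collected in memory before being written, whereas a real tool would stream them to BIN1 and BIN2. A writer thread records each line as pending before feeding it to COMMAND, so the reader never sees an echo for a line it has not recorded yet.

```rust
// Hypothetical sketch of grader-style plumbing; not the crate's actual source.
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::process::{Command, Stdio};
use std::sync::mpsc;
use std::thread;

/// Streams every line of `input` through `cmd args...` and returns
/// (bin1 = lines the child echoed, bin2 = lines it dropped).
/// Assumes the child echoes lines unchanged and in input order, like grep.
fn split_with_child<R>(input: R, cmd: &str, args: &[&str]) -> io::Result<(Vec<String>, Vec<String>)>
where
    R: BufRead + Send + 'static,
{
    let mut child = Command::new(cmd)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    let child_stdin = child.stdin.take().expect("stdin is piped");
    let child_stdout = child.stdout.take().expect("stdout is piped");

    // Queue of lines sent to the child but not yet binned.
    let (tx, rx) = mpsc::channel::<String>();

    // Writer: record each line as pending *before* the child can echo it.
    let writer = thread::spawn(move || {
        let mut to_child = BufWriter::new(child_stdin);
        let mut child_open = true;
        for line in input.lines() {
            let Ok(line) = line else { break };
            if tx.send(line.clone()).is_err() {
                break; // reader stopped early
            }
            if child_open && writeln!(to_child, "{line}").is_err() {
                child_open = false; // child stopped reading; the rest end up in bin2
            }
        }
        // Dropping `to_child` flushes it and closes the pipe, so the child sees EOF.
    });

    let (mut bin1, mut bin2) = (Vec::new(), Vec::new());
    for echoed in BufReader::new(child_stdout).lines() {
        let echoed = echoed?;
        // Pending lines ahead of the echoed one were dropped by the child.
        loop {
            match rx.recv() {
                Ok(line) if line == echoed => {
                    bin1.push(line);
                    break;
                }
                Ok(line) => bin2.push(line),
                Err(_) => {
                    return Err(io::Error::new(
                        io::ErrorKind::InvalidData,
                        "child echoed a line that was never sent",
                    ))
                }
            }
        }
    }

    writer.join().expect("writer thread panicked");
    // Anything still pending after the child closed stdout was never echoed.
    bin2.extend(rx.iter());
    child.wait()?; // exit status ignored: grep exits 1 when it selects nothing
    Ok((bin1, bin2))
}

fn main() -> io::Result<()> {
    // Expected form: grader <BIN1> <BIN2> [--] <COMMAND> [ARGS]...
    let mut argv: Vec<String> = std::env::args().skip(1).collect();
    if argv.get(2).map(String::as_str) == Some("--") {
        argv.remove(2);
    }
    if argv.len() < 3 {
        eprintln!("usage: grader <BIN1> <BIN2> <COMMAND> [ARGS]...");
        std::process::exit(2);
    }
    let args: Vec<&str> = argv[3..].iter().map(String::as_str).collect();
    let (bin1, bin2) = split_with_child(BufReader::new(io::stdin()), &argv[2], &args)?;
    for (path, lines) in [(&argv[0], bin1), (&argv[1], bin2)] {
        let mut out = BufWriter::new(File::create(path)?);
        for line in lines {
            writeln!(out, "{line}")?;
        }
        out.flush()?;
    }
    Ok(())
}
```

Invoked as `grader ok.log err.log -- grep -v -E "500|404"` with input on stdin, this sketch writes the lines grep echoes to ok.log and the rest to err.log.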
Example
$ cat http.log
192.168.1.1 - - [16/Dec/2023:10:31:45 -0500] "GET /index.html HTTP/1.1" 200 4523
192.168.1.2 - - [16/Dec/2023:10:32:10 -0500] "GET /about.html HTTP/1.1" 200 3498
192.168.1.3 - - [16/Dec/2023:10:33:30 -0500] "POST /login HTTP/1.1" 500 1287 **(Error)**
192.168.1.4 - - [16/Dec/2023:10:34:22 -0500] "GET /contact.html HTTP/1.1" 200 2310
192.168.1.5 - - [16/Dec/2023:10:35:14 -0500] "GET /products.html HTTP/1.1" 200 4981
192.168.1.6 - - [16/Dec/2023:10:36:03 -0500] "GET / HTTP/1.1" 404 1748 **(Error)**
192.168.1.7 - - [16/Dec/2023:10:37:45 -0500] "GET /blog.html HTTP/1.1" 200 3250
192.168.1.8 - - [16/Dec/2023:10:38:52 -0500] "GET /news.html HTTP/1.1" 200 2891
192.168.1.9 - - [16/Dec/2023:10:39:17 -0500] "POST /api/data HTTP/1.1" 500 902 **(Error)**
192.168.1.10 - - [16/Dec/2023:10:40:05 -0500] "GET /terms.html HTTP/1.1" 200 4076
$ cat http.log | grader ok.log err.log -- grep -v -E "HTTP/1.1\" (500|404)"
$ cat ok.log
192.168.1.1 - - [16/Dec/2023:10:31:45 -0500] "GET /index.html HTTP/1.1" 200 4523
192.168.1.2 - - [16/Dec/2023:10:32:10 -0500] "GET /about.html HTTP/1.1" 200 3498
192.168.1.4 - - [16/Dec/2023:10:34:22 -0500] "GET /contact.html HTTP/1.1" 200 2310
192.168.1.5 - - [16/Dec/2023:10:35:14 -0500] "GET /products.html HTTP/1.1" 200 4981
192.168.1.7 - - [16/Dec/2023:10:37:45 -0500] "GET /blog.html HTTP/1.1" 200 3250
192.168.1.8 - - [16/Dec/2023:10:38:52 -0500] "GET /news.html HTTP/1.1" 200 2891
192.168.1.10 - - [16/Dec/2023:10:40:05 -0500] "GET /terms.html HTTP/1.1" 200 4076
$ cat err.log
192.168.1.3 - - [16/Dec/2023:10:33:30 -0500] "POST /login HTTP/1.1" 500 1287 **(Error)**
192.168.1.6 - - [16/Dec/2023:10:36:03 -0500] "GET / HTTP/1.1" 404 1748 **(Error)**
192.168.1.9 - - [16/Dec/2023:10:39:17 -0500] "POST /api/data HTTP/1.1" 500 902 **(Error)**