27 releases

Version | Released |
---|---|
0.8.0 | Sep 11, 2024 |
0.7.3 | Apr 12, 2023 |
0.7.2 | Dec 26, 2022 |
0.7.0 | Sep 26, 2022 |
0.5.5 | Mar 3, 2020 |
alog
alog is a simple log file anonymizer.
About
In fact, by default alog just replaces the first word on every line of any input stream with a customizable string.
So "log file anonymizer" might be a bit of an overstatement, but alog can be used to (very efficiently) replace the $remote_addr part in many access log formats, e.g. Nginx's default combined log format:
log_format combined '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent"';
By default any parseable $remote_addr is replaced by its localhost representation:
- any valid IPv4 address is replaced by 127.0.0.1,
- any valid IPv6 address is replaced by ::1 and
- any other string (which might be a domain name) is replaced by localhost.
Lines without a $remote_addr part will remain unchanged (but can be skipped).
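The replacement rules above can be illustrated with a minimal, standard-library-only sketch. This is not alog's actual implementation: the helper names `replacement_for` and `anonymize_line` are hypothetical, and treating a line without a space separator as "a line without a $remote_addr part" is an assumption made for the example.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Hypothetical helper (not part of alog's API): picks the default
/// replacement for the first word of a line.
fn replacement_for(first_word: &str) -> &'static str {
    if first_word.parse::<Ipv4Addr>().is_ok() {
        "127.0.0.1" // valid IPv4 address
    } else if first_word.parse::<Ipv6Addr>().is_ok() {
        "::1" // valid IPv6 address
    } else {
        "localhost" // anything else, e.g. a domain name
    }
}

/// Hypothetical helper: replaces the first word of `line`.
/// Assumption: a line with no space-separated first word is left unchanged.
fn anonymize_line(line: &str) -> String {
    match line.split_once(' ') {
        Some((first, rest)) if !first.is_empty() => {
            format!("{} {}", replacement_for(first), rest)
        }
        _ => line.to_string(),
    }
}

fn main() {
    for line in ["8.8.8.8 test line", "2001:db8::1 test line", "example.com test line"] {
        println!("{}", anonymize_line(line));
    }
}
```

The sketch works on `&str` and allocates a new `String` per line for readability; a tool aiming at high throughput would more likely work on raw bytes and write directly to the output.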
Changes
With version 0.7
- All ASCII whitespace characters are removed from the beginning of each line by default.
- The run() and run_raw() functions now return a Result instead of exiting on failure.
With version 0.6
- You can (at a substantial cost of CPU cycles) replace the $remote_user with '-' as well.
- By default any leading spaces or tabs will be removed from every line before replacing any $remote_addr.
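To make the difference between the two trimming behaviors concrete, here is a small sketch using only the standard library. Both function names are hypothetical and the code is an illustration of the described behavior, not alog's own trimming code.

```rust
/// Hypothetical helper: strips only leading spaces and tabs
/// (the behavior described for version 0.6).
fn trim_spaces_and_tabs(line: &[u8]) -> &[u8] {
    let start = line
        .iter()
        .position(|b| *b != b' ' && *b != b'\t')
        .unwrap_or(line.len());
    &line[start..]
}

/// Hypothetical helper: strips all leading ASCII whitespace
/// (the behavior described for version 0.7).
fn trim_ascii_whitespace(line: &[u8]) -> &[u8] {
    let start = line
        .iter()
        .position(|b| !b.is_ascii_whitespace())
        .unwrap_or(line.len());
    &line[start..]
}

fn main() {
    let line = b"\r \t8.8.8.8 test line";
    // The leading carriage return stops the spaces-and-tabs variant immediately.
    println!("{:?}", String::from_utf8_lossy(trim_spaces_and_tabs(line)));
    println!("{:?}", String::from_utf8_lossy(trim_ascii_whitespace(line)));
}
```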
Building alog
With version 0.3 [features] were added, so that the library crate no longer pulls in unneeded dependencies.
Commandline Tool
To build the alog commandline tool you now have to explicitly add --features.
cargo build --features alog-cli
or
cargo build --all-features
Usage
Commandline tool
Run the CLI tool with --help.
./target/release/alog --help
Library
Calling run()
fn main() {
    let mut io_conf = alog::IOConfig::default();
    let mut conf = alog::Config::default();

    // Read from this log file.
    io_conf.push_input("/tmp/test.log");
    // Replace IPv4 addresses with 0.0.0.0 instead of the default.
    conf.set_ipv4_value("0.0.0.0");

    if let Err(e) = alog::run(&conf, &io_conf) {
        eprintln!("{}", e);
    }
}
or run_raw()
use std::io::Cursor;

fn main() {
    // The anonymized output is written into this buffer.
    let mut buffer = vec![];

    if let Err(e) = alog::run_raw(
        &alog::Config {
            ipv4: "XXX",
            ..Default::default()
        },
        // Cursor lets an in-memory byte string act as the input reader.
        Cursor::new(b"8.8.8.8 test line"),
        &mut buffer,
    ) {
        eprintln!("{}", e);
    }

    assert_eq!(buffer, b"XXX test line");
}
About Config::authuser
With version 0.6 alog can be used to replace the $remote_user field with '-', but this feature comes with a couple of peculiarities.
This feature should work fine with standard Common / Combined Log formatted files, but...
- There will be a significant hit on performance (synthetic benchmarking suggests ~625MB/s instead of ~1100MB/s on my machine, but still better than Perl's ~115MB/s ;)
- Used with Config::trim set to false and malformed files, the performance hit will be even worse, and removal of the $remote_user field will fail altogether if no $time_local field is found.
- The $time_local field is expected to start with '[' followed by a decimal number, e.g. "[10/Oct/2000:13:55:36 -0700]".
- There is an optimization in place to reduce the performance hit with real-life log files, but this leads to $remote_user fields starting with "- [" not being replaced! So in "8.8.8.8 - - [frank] [10/Oct/2000:13:55:36 -0700] GET /apache_pb.gif HTTP/1.0 200 2326" "frank" will still be "frank". This optimization can be disabled.
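The expectation about the $time_local field can be sketched as a simple check. The function name `looks_like_time_local` is hypothetical and this is only an illustration of the rule "'[' followed by a decimal digit", not the code alog uses internally.

```rust
/// Hypothetical check (not alog's code): does `field` start like a
/// `$time_local` value, i.e. with '[' followed by an ASCII decimal digit?
fn looks_like_time_local(field: &[u8]) -> bool {
    matches!(field, [b'[', d, ..] if d.is_ascii_digit())
}

fn main() {
    // A regular timestamp starts with '[' and a digit.
    println!("{}", looks_like_time_local(b"[10/Oct/2000:13:55:36 -0700]"));
    // A bracketed user name does not, so it is not mistaken for a timestamp.
    println!("{}", looks_like_time_local(b"[frank]"));
}
```

Under this rule, a bracketed value such as "[frank]" is told apart from a timestamp only by the character that follows the bracket.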
Project status
alog started as a replacement for a <10 line Perl script running on an old backup host.
So nothing shiny... but it helped me learn some Rust (and crates.io) basics.
With version 0.6 alog is feature complete. It doesn't do much, but it does it quite well.
At some point I might re-use this crate and try harder to actually anonymize data. But for now, this is it.
I will still fix bugs when (and if) I find them, so alog is now passively maintained.