26 releases

0.7.3 Apr 12, 2023
0.7.2 Dec 26, 2022
0.7.0 Sep 26, 2022
0.6.5 May 9, 2022
0.5.5 Mar 3, 2020

#398 in Parser implementations

MIT license

35KB
540 lines

alog


alog is a simple log file anonymizer.

About

In fact, by default alog simply replaces the first word on every line of an input stream with a customizable string.

So "log file anonymizer" might be a bit of an overstatement, but alog can be used to (very efficiently) replace the $remote_addr part in many access log formats, e.g. Nginx's default combined log format:

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

By default any parseable $remote_addr is replaced by its localhost representation:

  • any valid IPv4 address is replaced by 127.0.0.1,
  • any valid IPv6 address is replaced by ::1 and
  • any other string (which might be a domain name) is replaced by localhost.

Lines without a $remote_addr part will remain unchanged (but can be skipped).
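
The default replacement rule above can be sketched with nothing but the standard library. This is only an illustration, not alog's actual implementation: the function names default_replacement and anonymize_line are hypothetical, and treating a line without any space as "no $remote_addr" is an assumption.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Pick the default replacement for a first word.
/// Hypothetical helper for illustration, not part of alog's API.
fn default_replacement(first_word: &str) -> &'static str {
    if first_word.parse::<Ipv4Addr>().is_ok() {
        "127.0.0.1"
    } else if first_word.parse::<Ipv6Addr>().is_ok() {
        "::1"
    } else {
        // Anything else might be a domain name.
        "localhost"
    }
}

/// Replace the first space-delimited word of `line`.
/// Assumption: a line without any space (or with an empty first word)
/// has no $remote_addr part and is returned unchanged.
fn anonymize_line(line: &str) -> String {
    match line.split_once(' ') {
        Some((first, rest)) if !first.is_empty() => {
            format!("{} {}", default_replacement(first), rest)
        }
        _ => line.to_string(),
    }
}
```

Calling anonymize_line("8.8.8.8 test line") under these assumptions yields "127.0.0.1 test line". The real crate works on byte streams and is considerably more careful about performance.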

Changes

With version 0.7

  • All ASCII whitespace characters are removed from the beginning of each line by default.
  • The run() and run_raw() functions now return a Result instead of exiting on failure.

With version 0.6

  • You can (at a substantial cost of CPU cycles) replace the $remote_user with '-' as well and
  • by default any leading spaces or tabs are removed from every line before any $remote_addr is replaced.

Building alog

With version 0.3 [features] were added, so that the library crate no longer pulls in unneeded dependencies.

Commandline Tool

To build the alog commandline tool you now have to explicitly add --features:

cargo build --features alog-cli

or

cargo build --all-features

Usage

Commandline tool

Run the CLI tool with --help:

./target/release/alog --help

Library

Calling run()

fn main() {
    let mut io_conf = alog::IOConfig::default();
    let mut conf = alog::Config::default();

    io_conf.push_input("/tmp/test.log");
    conf.set_ipv4_value("0.0.0.0");

    if let Err(e) = alog::run(&conf, &io_conf) {
        eprintln!("{}", e);
    }
}

or run_raw()

use std::io::Cursor;

fn main() {
    let mut buffer = vec![];

    if let Err(e) = alog::run_raw(
        &alog::Config {
            ipv4: "XXX",
            ..Default::default()
        },
        Cursor::new(b"8.8.8.8 test line"),
        &mut buffer,
    ) {
        eprintln!("{}", e);
    }

    assert_eq!(buffer, b"XXX test line");
}

About Config::authuser

With version 0.6 alog can be used to replace the $remote_user field with '-', but this feature comes with a couple of peculiarities.

This feature should work fine with standard Common / Combined Log formatted files, but...

  • There will be a significant hit on performance (synthetic benchmarking suggests ~625MB/s instead of ~1100MB/s on my machine, but still better than Perl's ~115MB/s ;)

  • Used with Config::trim set to false on malformed files, the performance hit will be even worse, and removal of the $remote_user field will fail altogether if no $time_local field is found.

  • The $time_local field is expected to start with '[' followed by a decimal number. E.g.: "[10/Oct/2000:13:55:36 -0700]"

  • There is an optimization in place to reduce the performance hit with real-life log files, but this leads to $remote_user fields starting with "- [" not being replaced! So in

    "8.8.8.8 - - [frank] [10/Oct/2000:13:55:36 -0700] GET /apache_pb.gif HTTP/1.0 200 2326"

    "frank" will still be "frank". This optimization can be disabled.
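
To make the $time_local expectation concrete, here is a minimal sketch of such a check: a field counts as $time_local only if it starts with '[' followed by a decimal digit. The helper name looks_like_time_local is hypothetical and this is not taken from alog's source; it only illustrates why "[frank]" is not mistaken for a timestamp.

```rust
/// Returns true if `field` starts with '[' immediately followed by an
/// ASCII decimal digit, e.g. "[10/Oct/2000:13:55:36 -0700]".
/// Hypothetical helper for illustration, not part of alog's API.
fn looks_like_time_local(field: &str) -> bool {
    let mut bytes = field.bytes();
    bytes.next() == Some(b'[') && bytes.next().map_or(false, |b| b.is_ascii_digit())
}
```

With this check, "[10/Oct/2000:13:55:36 -0700]" is accepted while "[frank]" and a lone "[" are rejected.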

Project status

alog started as a replacement for a <10 line Perl script running on an old backup host. So nothing shiny... but it helped me learn some Rust (and crates.io) basics.

With version 0.6 alog is feature complete. It doesn't do much, but it does it quite well. At some point I might re-use this crate and try harder to actually anonymize data. But for now, this is it.

I will still fix bugs when (and if) I find them, so alog is now passively maintained.

Dependencies

~2.8–4MB
~69K SLoC