bin+lib fakelogs

5 releases

0.1.10-75501d4 Feb 22, 2020
0.1.9-bb1f021 Feb 3, 2020
0.1.7-35bfaa1 Feb 1, 2020
0.1.6-a180978 Jan 5, 2020
0.1.5-6e5ec74 Jan 5, 2020

MIT license

32KB
566 lines

Fakelogs

Fakelogs is a random log generator. It can be used for load testing of log parsers.

It is written in Rust and is mostly a toy project for ramping up on the language. It might, however, be useful. Use at your own risk.

Status

Current version is 0.1.10.

Install

There is no install target yet. Copy the fakelogs binary somewhere in your $PATH if you wish; that's all.

A few commands which may prove useful:

cargo build             # build debug binary in ./target/debug/
cargo build --release   # build release binary in ./target/release/
cargo test              # launch tests
rustfmt src/*.rs        # format code
./docker-build.sh       # build Docker image with version tag
./bump-version.sh       # bump minor version number

Usage

Simply launch:

cargo run

Or just run the binary directly:

./target/debug/fakelogs
./target/release/fakelogs

Alternatively, using docker:

docker run ufoot/fakelogs

To pass options:

cargo run -- --csv -100

By default, the generated lines follow the Apache Common Log Format, so they look like:

127.0.0.1 - james [09/May/2018:16:00:39 +0000] "GET /report HTTP/1.0" 200 123
127.0.0.1 - jill [09/May/2018:16:00:41 +0000] "GET /api/user HTTP/1.0" 200 234
127.0.0.1 - frank [09/May/2018:16:00:42 +0000] "POST /api/user HTTP/1.0" 200 34
127.0.0.1 - mary [09/May/2018:16:00:42 +0000] "POST /api/user HTTP/1.0" 503 12
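
Since the point of fakelogs is to feed log parsers, here is a minimal sketch of a parser for the default line format shown above. It uses only the standard library. The names `LogEntry` and `parse_common_log` are hypothetical and not part of fakelogs; the sketch also assumes the size field is always numeric, as in the samples.

```rust
/// One parsed line of the default (Apache Common Log Format) output.
/// Hypothetical type for illustration; not part of fakelogs.
#[derive(Debug, PartialEq)]
pub struct LogEntry<'a> {
    pub host: &'a str,
    pub ident: &'a str,
    pub user: &'a str,
    pub timestamp: &'a str, // raw text, e.g. "09/May/2018:16:00:39 +0000"
    pub request: &'a str,   // e.g. "GET /report HTTP/1.0"
    pub status: u16,
    pub size: u64,
}

/// Parses a single line. Returns None for anything that does not match,
/// such as a junk line.
pub fn parse_common_log(line: &str) -> Option<LogEntry<'_>> {
    // host, ident and user are the first three space-separated tokens.
    let (host, rest) = line.split_once(' ')?;
    let (ident, rest) = rest.split_once(' ')?;
    let (user, rest) = rest.split_once(' ')?;

    // The timestamp is enclosed in square brackets.
    let rest = rest.strip_prefix('[')?;
    let (timestamp, rest) = rest.split_once("] ")?;

    // The request is enclosed in double quotes.
    let rest = rest.strip_prefix('"')?;
    let (request, rest) = rest.split_once("\" ")?;

    // Status code and response size close the line.
    let (status, size) = rest.trim_end().split_once(' ')?;

    Some(LogEntry {
        host,
        ident,
        user,
        timestamp,
        request,
        status: status.parse().ok()?,
        size: size.parse().ok()?,
    })
}
```

Calling `parse_common_log` on the first sample line yields host 127.0.0.1, user james, status 200 and size 123. A line that does not follow the format makes it return None, which lets a caller count rejected lines instead of crashing.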

There's also a -c or --csv option. If you call fakelogs -c, you get an alternate custom CSV format:

"10.0.0.4","-","apache",1549573860,"GET /api/user HTTP/1.0",200,1234
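
The CSV line above can be consumed with a small hand-written splitter. The following is a sketch under assumptions: the column meanings (host, ident, user, timestamp, request, status, size) are inferred by analogy with the default format, the timestamp is treated as a plain integer, and fields are assumed not to contain escaped quotes. `CsvEntry`, `split_csv_line` and `parse_csv_line` are hypothetical names.

```rust
/// One parsed CSV line. Hypothetical type; column names are assumptions.
#[derive(Debug, PartialEq)]
pub struct CsvEntry {
    pub host: String,
    pub ident: String,
    pub user: String,
    pub timestamp: u64,
    pub request: String,
    pub status: u16,
    pub size: u64,
}

/// Splits one CSV line into fields, honoring double-quoted fields.
/// Assumption: no escaped quotes ("") inside a field.
pub fn split_csv_line(line: &str) -> Vec<String> {
    let mut fields = Vec::new();
    let mut current = String::new();
    let mut in_quotes = false;
    for c in line.trim_end().chars() {
        match c {
            // A quote toggles quoted mode and is not part of the value.
            '"' => in_quotes = !in_quotes,
            // A comma outside quotes ends the current field.
            ',' if !in_quotes => fields.push(std::mem::take(&mut current)),
            _ => current.push(c),
        }
    }
    fields.push(current);
    fields
}

/// Parses one line into a CsvEntry. Returns None when the field count
/// is not 7 or a numeric column does not parse.
pub fn parse_csv_line(line: &str) -> Option<CsvEntry> {
    let mut it = split_csv_line(line).into_iter();
    let host = it.next()?;
    let ident = it.next()?;
    let user = it.next()?;
    let timestamp = it.next()?.parse().ok()?;
    let request = it.next()?;
    let status = it.next()?.parse().ok()?;
    let size = it.next()?.parse().ok()?;
    if it.next().is_some() {
        return None; // more columns than expected
    }
    Some(CsvEntry { host, ident, user, timestamp, request, status, size })
}
```

Because numeric columns must parse, any non-data line (for example a line of column names) is rejected with None rather than misread as an entry.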

If you pass an integer after a dash, it sets the average number of lines per second. The default is 1000 and the maximum is 1000000. For example, to change the output to 10000 lines per second:

fakelogs -10000

Other standard options include:

  • -h, --help: display a short help.
  • -v, --version: display version.
  • --no-high-card: disable high cardinality; the random 4-letter sections are replaced by xxxx.
  • --no-time-skew: disable time skewing; on average, logs look as if they are from now rather than 30 minutes old.
  • --no-time-jitter: disable time jittering; all logs have strictly increasing timestamps.
  • --no-header: skip the header line.
  • --no-junk: no random junk lines.
  • --no-burst: no random burst behavior, which allows output at a constant rate.

Logs content

The logs may look random, but they follow a few patterns:

  • IPs are chosen from a constant, finite list
  • users are chosen from a constant, finite list
  • HTTP codes are distributed with:
    • 50% of 2XXs
    • 25% of 3XXs
    • 20% of 4XXs
    • 5% of 5XXs
  • request methods are distributed with:
    • 60% of GETs
    • 20% of POSTs
    • 20% of HEADs
  • URLs are of the form /section/XXXX-file.ext or /XXXX/file.txt, where XXXX is a random 4-character string and section is distributed with:
    • 50% of yolo (eg: /yolo/wE5d-index.html)
    • 15% of foo/bar
    • 15% of bar/foo
    • 15% of "no section" (so URL of the form /w3QL/secret.txt)
    • 5% of pizzapino
  • size is uniformly distributed between 100 bytes and 19.9k (average is 10k).
  • generally, timestamps are generated to match the generation time minus 30 minutes, so logs appear, on average, to be from half an hour ago.
  • however, 10% of the time the timestamp is shifted into the past or the future by up to 2 minutes, with an average of 1 minute. This means timestamps are not always increasing, so lines may appear out of order.
  • every 5 seconds, the rate may change: it is either just one line per second (slow output) or 2500 lines per second (fast output). The ratio is:
    • 40% of fast output
    • 60% of slow output
    • on average (including the slow output), the throughput should be slightly above 1000 lines per second.
  • when the default output of 1000 lines per second is changed, all the numbers above are scaled, but the slow output is always one line per second.
  • about one line in 1000 is an invalid line containing Your attention please, this is a hack!
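
The percentages and rates in the list above can be checked with a little arithmetic. The sketch below is not the fakelogs implementation. It maps a roll in 0..100 to the documented shares and computes the long-run average of the two-state burst model. The function names are hypothetical, and the roll is passed in by the caller so that the example needs no random number crate.

```rust
/// Maps a roll in 0..100 to an HTTP status class (200, 300, 400 or 500)
/// using the documented shares: 50% / 25% / 20% / 5%.
pub fn status_class(roll: u32) -> u16 {
    match roll % 100 {
        0..=49 => 200,
        50..=74 => 300,
        75..=94 => 400,
        _ => 500,
    }
}

/// Maps a roll in 0..100 to a request method using the documented
/// shares: 60% GET, 20% POST, 20% HEAD.
pub fn request_method(roll: u32) -> &'static str {
    match roll % 100 {
        0..=59 => "GET",
        60..=79 => "POST",
        _ => "HEAD",
    }
}

/// Long-run average rate, in lines per second, of a generator that runs
/// at `fast_rate` for a share `fast_share` of the time and at
/// `slow_rate` for the rest.
pub fn average_rate(fast_rate: f64, slow_rate: f64, fast_share: f64) -> f64 {
    fast_share * fast_rate + (1.0 - fast_share) * slow_rate
}
```

With the documented defaults, `average_rate(2500.0, 1.0, 0.4)` gives 0.4 × 2500 + 0.6 × 1 = 1000.6 lines per second, which matches the "slightly above 1000" figure. A parser under test can use the same shares to sanity-check the histogram of status codes and methods it extracted.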

License

Fakelogs is licensed under the MIT license.

Dependencies

~1.5MB
~28K SLoC