Super Speedy Syslog Searcher! (`s4`)
Speedily search and merge log messages by datetime.
Super Speedy Syslog Searcher (`s4`) is a command-line tool to search
and merge varying log messages from varying log files, sorted by datetime.
This includes log files that are compressed (`.gz`, `.xz`) or archived (`.tar`),
binary format logs including acct, lastlog, and utmp accounting records
(`acct`, `pacct`, `lastlog`, `utmp`, `utmpx`, `wtmp`),
systemd journal logs (`.journal`), and Microsoft Event Logs (`.evtx`).
It will parse a variety of formal and ad-hoc log message datetime formats.
Datetime filters may be passed to narrow the search to a datetime range.
The first goal of `s4` is speedy searching and printing.
- Use
- About
- More
- logging chaos: the problem `s4` solves
- Stargazers
- Further Reading
Use
Install super_speedy_syslog_searcher
Assuming Rust is installed, run

```shell
cargo install --locked super_speedy_syslog_searcher
```
Run `s4`
For example, print all the log messages in syslog files under /var/log/

```shell
s4 /var/log
```

On Windows, print the ad-hoc logs under C:\Windows\Logs

```shell
s4.exe C:\Windows\Logs
```

Or the Windows Event Logs

```shell
s4.exe C:\Windows\System32\winevt\Logs
```
Print the log messages after January 1, 2022 at 00:00:00

```shell
s4 /var/log -a 20220101
```

Print the log messages from January 1, 2022 00:00:00 to January 2, 2022

```shell
s4 /var/log -a 20220101 -b 20220102
```

or

```shell
s4 /var/log -a 20220101 -b @+1d
```

Print the log messages on January 1, 2022, from 12:00:00 to 16:00:00

```shell
s4 /var/log -a 20220101T120000 -b 20220101T160000
```

Print only the log messages since yesterday at this time

```shell
s4 /var/log -a=-1d
```

Print only the log messages that occurred two days ago
(with the help of GNU `date`)

```shell
s4 /var/log -a $(date -d "2 days ago" '+%Y%m%d') -b @+1d
```

Print only the log messages that occurred two days ago during the noon hour
(with the help of GNU `date`)

```shell
s4 /var/log -a $(date -d "2 days ago 12" '+%Y%m%dT%H%M%S') -b @+1h
```

Print only the log messages that occurred two days ago during the noon hour in
Bengaluru, India (timezone offset +05:30), prepended with the equivalent UTC
datetime (with the help of GNU `date`)

```shell
s4 /var/log -u -a $(date -d "2 days ago 12" '+%Y%m%dT%H%M%S+05:30') -b @+1h
```
`--help`
Speedily search and merge log messages by datetime.
DateTime filters may be passed to narrow the search.
It aims to be very fast.
Usage: s4 [OPTIONS] <PATHS>...
Arguments:
<PATHS>... Path(s) of log files or directories.
Directories will be recursed. Symlinks will be followed.
Paths may also be passed via STDIN, one per line. The user must
supply argument "-" to signify PATHS are available from STDIN.
Options:
-a, --dt-after <DT_AFTER>
DateTime Filter After: print syslog lines with a datetime that is at
or after this datetime.
For example, "20200102T120000" or "-5d".
-b, --dt-before <DT_BEFORE>
DateTime Filter Before: print syslog lines with a datetime that is at
or before this datetime.
For example, "2020-01-03T23:00:00.321-05:30" or "@+1d+11h"
-t, --tz-offset <TZ_OFFSET>
Default timezone offset for datetimes without a timezone.
For example, log message "[20200102T120000] Starting service" has a
datetime substring "20200102T120000".
The datetime substring does not have a timezone offset
so the TZ_OFFSET value would be used.
Example values, "+12", "-0800", "+02:00", or "EDT".
To pass a value with leading "-" use "=" notation, e.g. "-t=-0800".
If not passed then the local system timezone offset is used.
[default: -07:00]
-z, --prepend-tz <PREPEND_TZ>
Prepend a DateTime in the timezone PREPEND_TZ for every line.
Used in PREPEND_DT_FORMAT.
-u, --prepend-utc
Prepend a DateTime in the UTC timezone offset for every line.
This is the same as "--prepend-tz Z".
Used in PREPEND_DT_FORMAT.
-l, --prepend-local
Prepend DateTime in the local system timezone offset for every line.
This is the same as "--prepend-tz +XX" where +XX is the local system
timezone offset.
Used in PREPEND_DT_FORMAT.
-d, --prepend-dt-format <PREPEND_DT_FORMAT>
Prepend a DateTime using the strftime format string.
If PREPEND_TZ is set then that value is used for any timezone offsets,
i.e. strftime "%z" "%:z" "%Z" values, otherwise the timezone offset value
is the local system timezone offset.
[Default: %Y%m%dT%H%M%S%.3f%z]
-n, --prepend-filename
Prepend file basename to every line.
-p, --prepend-filepath
Prepend file full path to every line.
-w, --prepend-file-align
Align column widths of prepended data.
--prepend-separator <PREPEND_SEPARATOR>
Separator string for prepended data.
[default: :]
--separator <LOG_MESSAGE_SEPARATOR>
An extra separator string between printed log messages.
Per log message not per line of text.
Accepts a basic set of backslash escape sequences,
e.g. "\0" for the null character.
--journal-output <JOURNAL_OUTPUT>
The format for .journal file log messages.
Matches journalctl --output options.
[default: short]
[possible values: short, short-precise, short-iso, short-iso-precise,
short-full, short-monotonic, short-unix, verbose,
export, cat]
-c, --color <COLOR_CHOICE>
Choose to print to terminal using colors.
[default: auto]
[possible values: always, auto, never]
--blocksz <BLOCKSZ>
Read blocks of this size in bytes.
May pass value as any radix (hexadecimal, decimal, octal, binary).
Using the default value is recommended.
Most useful for developers.
[default: 65535]
-s, --summary
Print a summary of files processed to stderr.
Most useful for developers.
-h, --help
Print help
-V, --version
Print version
DateTime Filters may be strftime specifier patterns:
"%Y%m%dT%H%M%S*"
"%Y-%m-%d %H:%M:%S*"
"%Y-%m-%dT%H:%M:%S*"
"%Y/%m/%d %H:%M:%S*"
"%Y%m%d"
"%Y-%m-%d"
"%Y/%m/%d"
"+%s"
Each * is an optional trailing 3-digit fractional sub-seconds,
or 6-digit fractional sub-seconds, and/or timezone.
Pattern "+%s" is Unix epoch timestamp in seconds with a preceding "+".
For example, value "+946684800" is January 1, 2000 at 00:00, GMT.
DateTime Filters may be custom relative offset patterns:
"+DwDdDhDmDs" or "-DwDdDhDmDs"
"@+DwDdDhDmDs" or "@-DwDdDhDmDs"
Custom relative offset pattern "+DwDdDhDmDs" and "-DwDdDhDmDs" is the offset
from now (program start time) where "D" is a decimal number.
Each lowercase identifier is an offset duration:
"w" is weeks, "d" is days, "h" is hours, "m" is minutes, "s" is seconds.
For example, value "-1w22h" is one week and twenty-two hours in the past.
Value "+30s" is thirty seconds in the future.
Custom relative offset pattern "@+DwDdDhDmDs" and "@-DwDdDhDmDs" is relative
offset from the other datetime.
Arguments "-a 20220102 -b @+1d" are equivalent to "-a 20220102 -b 20220103".
Arguments "-a @-6h -b 20220101T120000" are equivalent to
"-a 20220101T060000 -b 20220101T120000".
Without a timezone, the Datetime Filter is presumed to be the local system
timezone.
Command-line passed timezones may be numeric timezone offsets,
e.g. "+09:00", "+0900", or "+09", or named timezone offsets, e.g. "JST".
Ambiguous named timezones will be rejected, e.g. "SST".
--prepend-tz and --tz-offset function independently:
--tz-offset is used to interpret processed log message datetime stamps that
do not have a timezone offset.
--prepend-tz affects what is pre-printed before each printed log message line.
--separator accepts backslash escape sequences:
"\0", "\a", "\b", "\e", "\f", "\n", "\r", "\\", "\t", "\v"
Resolved values of "--dt-after" and "--dt-before" can be reviewed in
the "--summary" output.
DateTime strftime specifiers are described at
https://docs.rs/chrono/latest/chrono/format/strftime/
DateTimes supported are only of the Gregorian calendar.
DateTimes supported language is English.
Is s4 failing to parse a log file? Report an Issue at
https://github.com/jtmoon79/super-speedy-syslog-searcher/issues/new/choose
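The "+%s" epoch pattern above pairs naturally with GNU `date`, which can compute the epoch value for any moment. A small sketch (the commented `s4` invocation at the end is illustrative):

```shell
# Compute the Unix-epoch value for 2000-01-01T00:00:00 UTC with GNU date;
# prefixed with "+" it becomes a DateTime Filter in the "+%s" pattern.
epoch="$(date -u -d '2000-01-01T00:00:00Z' '+%s')"
echo "+${epoch}"
# prints: +946684800

# which could then be passed as a filter, e.g.:
#   s4 /var/log -a "+${epoch}"
```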
About
Why `s4`?
Super Speedy Syslog Searcher (`s4`) is meant to aid engineers in reviewing
varying log files in a datetime-sorted manner.
The primary use-case is to aid investigating problems wherein the time of
a problem occurrence is known and there are many available logs
but otherwise there is little source evidence.
Currently, log file formats vary widely. Most logs use an ad-hoc format. Even separate log files on the same system, for the same service, may use different message formats! Sorting these logged messages by datetime may be prohibitively difficult. The result is that an engineer may have to "hunt and peck" among many log files, looking for problem clues around some datetime; so tedious!
Enter Super Speedy Syslog Searcher 🦸 ‼
`s4` will print log messages from multiple log files in datetime-sorted order.
A "window" of datetimes may be passed to constrain the period of printed
messages. This will assist an engineer who, for example, needs to view all
log messages that occurred two days ago between 12:00 and 12:05 among log
files taken from multiple systems.
The ulterior motive for Super Speedy Syslog Searcher was that the primary developer wanted an excuse to learn Rust 🦀 and wanted to create an open-source tool for a recurring need of some Software Test Engineers.
See the real-world example rationale in the section below,
logging chaos: the problem `s4` solves.
Features
- Parses:
  - Ad-hoc log messages using formal datetime formats:
    - Internet Message Format (RFC 2822), e.g. `Wed, 1 Jan 2020 22:00:00 PST message…`
    - The BSD syslog Protocol (RFC 3164), e.g. `<8>Jan 1 22:00:00 message…`
    - Date and Time on the Internet: Timestamps (RFC 3339), e.g. `2020-01-01T22:00:00-08:00 message…`
    - The Syslog Protocol (RFC 5424), e.g. `2020-01-01T22:00:00-08:00 message…`
    - ISO 8601, e.g. `2020-01-01T22:00:00-08:00 message…`, `20200101T220000-0800 message…`, etc. [1]
  - Red Hat Audit Log files
  - binary user accounting record files (`acct`, `pacct`, `lastlog`, `utmp`, `utmpx`) from multiple Operating Systems and CPU architectures
  - binary Windows Event Log files
  - binary systemd journal files, with printing options matching `journalctl`
  - many varying text log messages with ad-hoc datetime formats
  - multi-line log messages
- Inspects `.tar` archive files for parseable log files [2]
- Inspects `.gz` and `.xz` compressed files for parseable log files [3]
- Tested against "in the wild" log files from varying sources (see project path `./logs/`)
- Prepends datetime and file paths, for easy programmatic parsing or visual traversal of varying log messages
- Comparable speed to GNU `grep` and `sort`
- Processes invalid UTF-8
- Accepts arbitrarily large files (see _Hacks_)
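As a sketch of the "easy programmatic parsing" point: given `--prepend-filename` output with the default ":" separator, the prepended field splits off with standard tools. The sample line below is fabricated for illustration, not captured from a real `s4` run:

```shell
# Simulate one line of `s4 --prepend-filename` output and recover the
# prepended file basename; ":" is the default prepend separator.
printf '%s\n' 'kern.log:Oct 10 23:07:16 debian11-b kernel: Linux version 5.10.0-11-amd64' \
  | awk -F':' '{ print $1 }'
# prints: kern.log
```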
Limitations
- Only processes UTF-8 or ASCII encoded syslog files (Issue #16)
- Cannot process multi-file `.gz` files (only the first stream found is processed) (Issue #8)
- Cannot process multi-file `.xz` files (only the first stream found is processed) (Issue #11)
- Cannot process `.zip` archives (Issue #39)

[1] ISO 8601 forms recognized (using ISO descriptive format):
- `YYYY-MM-DDThh:mm:ss`
- `YYYY-MM-DDThhmmss`
- `YYYYMMDDThhmmss`

(the date-time separator may be the character `T` or a blank space `' '`)

ISO 8601 forms not recognized:
- absent seconds
- ordinal dates, i.e. "day of the year", format `YYYY-DDD`, e.g. `2022-321`
- week dates, i.e. "week-numbering year", format `YYYY-Www-D`, e.g. `2022-W25-1`
- times without minutes and seconds (i.e. only `hh`)

[2] Cannot process archive files or compressed files within other
archive files or compressed files (Issue #14),
e.g. `logs.tgz`, e.g. file `syslog.xz` within archive `logs.tar`

[3] Can only process compressed syslog files (Issue #9, Issue #12, Issue #13, Issue #86)
Hacks
- Entire `.xz` files are read into memory before printing (Issue #12)
- Entire `.evtx` files are read into memory before printing (Issue #86)
- Entire user accounting record files are read into memory before printing
More
Comparisons
An overview of features of varying log mergers, including GNU tools.
- GNU `grep` piped to GNU `sort`
- Super Speedy Syslog Searcher; `s4`
- logmerger; `logmerger`
- Toolong; `tl`
- logdissect; `logdissect.py`
| Symbol | Meaning |
|---|---|
| β | Yes |
| ⬀ | Most |
| β | Some |
| β | No |
| β | with an accompanying GNU program |
| ! | with user input |
| ‼ | with complex user input |
General Features
| Program | Source | CLI | TUI | Interactive | live tail | merge varying log formats | datetime search range |
|---|---|---|---|---|---|---|---|
| `grep`+`sort` | C | β | β | β | β `tail` | β | ‼ |
| `s4` | Rust | β | β | β | β | β | β |
| `logmerger` | Python | β | β | β | β | ‼ | β |
| `tl` | Python | β | β | β | β | β | β |
| `logdissect.py` | Python | β | β | β | β | β | β |
Formal Log DateTime Supported
| Program | RFC 2822 | RFC 3164 | RFC 3339 | RFC 5424 | ISO 8601 |
|---|---|---|---|---|---|
| `grep`+`sort` | β | ‼ | ! | ! | ! |
| `s4` | β | β | β | β | ⬀ |
| `logmerger` | β | β | ! | ! | β |
| `tl` | β | β | β | β | β |
Other Log or File Formats Supported
| Program | Ad-hoc text formats | Red Hat Audit Log | journal | `acct`/`lastlog`/`utmp` | `.evtx` | `.pcap`/`.pcapng` | `.jsonl` |
|---|---|---|---|---|---|---|---|
| `grep`+`sort` | ‼ | ! | β | β | β | β | β |
| `s4` | β | β | β | β | β | β | β |
| `logmerger` | ‼ | ‼ | β | β | β | β | β |
| `tl` | β | β | β | β | β | β | β |
Archive Formats Supported
| Program | `.gz` | `.bz`/`.bz2` | `.xz` | `.tar` | `.zip` |
|---|---|---|---|---|---|
| `grep`+`sort` | β `zgrep` | β `bzip2` | β `xz` | β | β |
| `s4` | β | β | β | β | β |
| `logmerger` | β | β | β | β | β |
| `tl` | β | β | β | β | β |
| `logdissect.py` | β | β | β | β | β |
Speed Comparison
A comparison of merging three large log files (times in seconds):
- 2000-line log file, 1116357 bytes (≈1.1 MB), with high-plane Unicode
- 2500-line log file, 1078842 bytes (≈1.0 MB), with high-plane Unicode
- 5000-line log file, 2158138 bytes (≈2.1 MB), with high-plane Unicode
| Program | real | user | sys |
|---|---|---|---|
| `grep`+`sort` | 0.04 | 0.04 | 0.00 |
| `s4` | 0.04 | 0.03 | 0.04 |
| `logmerger` | 0.78 | 0.74 | 0.00 |
| `tl` | 0.94 | 0.55 | 0.06 |
This informal runtime comparison used GNU `time` running on Ubuntu 22 on WSL2.
- `grep` 3.7, `sort` 8.32
- `s4` 0.6.71
- `logmerger` 0.8.0 on Python 3.10.12
- `tl` 1.4.0 on Python 3.10.12
See the directory compare-log-mergers and the results in `compare-log-mergers.txt`.
Building locally
Building on Linux requires:
- `rust` minimal or more
- `gcc` (which should install `cc`, `libc`, and `libc-headers`)

From the git cloned project directory, run `cargo build`.
Parsing `.journal` files
Requires `libsystemd` to be installed to use `libsystemd.so` at runtime.
Requesting Support For DateTime Formats; your particular log file
If you have found a log file that Super Speedy Syslog Searcher does not parse, then you may create a new Issue of type Feature request (datetime format).
Here is an example user-submitted Issue.
"syslog" and other project definitions
syslog
In this project, the term "syslog" is used generously to refer to any log message that has a datetime stamp on the first line of log text.
Technically, "syslog" is defined by several RFCs prescribing fields, formats, lengths, and other technical constraints. In this project, the term "syslog" is used interchangeably with "log".
The term "sysline" refers to one log message, which may comprise multiple lines of text.
See docs section Definitions of data for more project definitions.
log message
A "log message" is a single log entry for any type of logging scheme; an entry in a utmpx file, an entry in a systemd journal, an entry in a Windows Event Log, a formal syslog message, or an ad-hoc log message.
logging chaos: the problem `s4` solves
In practice, most log file formats are ad-hoc. Even among formally defined log formats, there are many variations. As a result, merging varying log messages by datetime is prohibitively tedious.
The following real-world example log files are available in project directory `./logs`.
open-source software examples
nginx webserver
For example, the open-source nginx web server
logs access attempts in an ad-hoc format in the file `access.log`

```text
192.168.0.115 - - [08/Oct/2022:22:26:35 +0000] "GET /DOES-NOT-EXIST HTTP/1.1" 404 0 "-" "curl/7.76.1" "-"
```

which is an entirely dissimilar log format to the neighboring nginx log file, `error.log`

```text
2022/10/08 22:26:35 [error] 6068#6068: *3 open() "/usr/share/nginx/html/DOES-NOT-EXIST" failed (2: No such file or directory), client: 192.168.0.115, server: _, request: "GET /DOES-NOT-EXIST HTTP/1.0", host: "192.168.0.100"
```
nginx is following the bad example set by the apache web server.
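To see why such neighboring formats resist naive merging, consider a plain textual `sort` over one entry of each format (timestamps adapted from the snippets above for illustration): the `access.log` entry is the later of the two, yet it sorts first because its line begins with "1".

```shell
# The error.log entry (22:26:30) is earlier than the access.log entry
# (22:26:35), but byte-wise sorting ignores the embedded datetimes.
printf '%s\n' \
  '2022/10/08 22:26:30 [error] 6068#6068: *3 open() failed' \
  '192.168.0.115 - - [08/Oct/2022:22:26:35 +0000] "GET /DOES-NOT-EXIST HTTP/1.1" 404' \
  | sort
# the "192.168..." line prints first
```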
Debian 11
Here is a log snippet from a Debian 11 host, file `/var/log/alternatives.log`:

```text
update-alternatives 2022-10-10 23:59:47: run with --quiet --remove rcp /usr/bin/ssh
```

And a snippet from the same Debian 11 host, file `/var/log/dpkg.log`:

```text
2022-10-10 15:15:02 upgrade gpgv:amd64 2.2.27-2 2.2.27-2+deb11u1
```

And a snippet from the same Debian 11 host, file `/var/log/kern.log`:

```text
Oct 10 23:07:16 debian11-b kernel: [    0.10034] Linux version 5.10.0-11-amd64
```

And a snippet from the same Debian 11 host,
file `/var/log/unattended-upgrades/unattended-upgrades-shutdown.log`:

```text
2022-10-10 23:07:16,775 WARNING - Unable to monitor PrepareForShutdown() signal, polling instead.
```
binary files
And then there are binary files, such as the `wtmp` file on Linux and other
Unix operating systems.
Using the tool `utmpdump`, a `utmp` record structure is converted to text like:

```text
[7] [12103] [ts/0] [user] [pts/0] [172.1.2.1] [172.1.2.2] [2023-03-05T23:12:36,270185+00:00]
```
And from a systemd `.journal` file, read using `journalctl`:

```text
Mar 03 10:26:10 host systemd[1]: Started OpenBSD Secure Shell server.
░░ Subject: A start job for unit ssh.service has finished successfully
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit ssh.service has finished successfully.
░░
░░ The job identifier is 120.
Mar 03 10:31:23 host sshd[4559]: Accepted login for user1 from 172.1.2.1 port 51730 ssh2
```
Try merging those two log messages by datetime using GNU `grep`, `sort`, `sed`, or `awk`!
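Even normalizing just the `journalctl`-style timestamp into a sortable key takes a non-trivial one-liner. A sketch with awk (the year is an assumption, since the syslog-style stamp omits it, and timezones are ignored):

```shell
# Rewrite "Mar 03 10:26:10 ..." with a leading ISO-like sort key;
# the month number is found by position in a packed month-name table.
printf '%s\n' 'Mar 03 10:26:10 host systemd[1]: Started OpenBSD Secure Shell server.' \
  | awk 'BEGIN { m = "JanFebMarAprMayJunJulAugSepOctNovDec" }
         { printf "2023-%02d-%sT%s %s\n", (index(m, $1) + 2) / 3, $2, $3, $0 }'
# prints a line beginning: 2023-03-03T10:26:10
```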
Additionally, if the `wtmp` file is from a different architecture
or operating system, then the binary record structure is likely not parseable
by the resident `utmpdump` tool. What then!?
commercial software examples
Commercial software and computer hardware vendors nearly always use ad-hoc log message formatting that varies even more unpredictably among log files on the same system.
Synology DiskStation
Here is a log snippet from a Synology DiskStation package DownloadStation:

```text
2019/06/23 21:13:34	(system) trigger DownloadStation 3.8.13-3519 Begin start-stop-status start
```

And a snippet from a Synology DiskStation OS log file `sfdisk.log` on the same host:

```text
2019-04-06T01:07:40-07:00 dsnet sfdisk: Device /dev/sdq change partition.
```

And a snippet from a Synology DiskStation OS log file `synobackup.log` on the same host:

```text
info	2018/02/24 02:30:04	SYSTEM:	[Local][Backup Task Backup1] Backup task started.
```

(yes, those are tab characters)
Mac OS 12
Here are log file snippets from a Mac OS 12.6 host.
log file `/var/log/system`

```text
Sep 22 00:01:21 Mac-mini syslogd[108]: ASL Sender Statistics
Sep 22 00:06:22 Mac-mini halt[78568]: SHUTDOWN_TIME: 1695341182 412865
Oct 11 15:04:44 localhost bootlog[0]: BOOT_TIME 1697036684 0
Oct 11 15:04:55 localhost syslogd[110]: Configuration Notice:
	ASL Module "com.apple.cdscheduler" claims selected messages.
	Those messages may not appear in standard system log files or in the ASL database.
```

log file `/var/log/wifi`

```text
Thu Sep 21 23:05:35.850 Usb Host Notification NOT activated
Fri Sep 22 15:05:01.994 Usb Host Notification activated!
```

log file `/var/log/fsck_hs.log`

```text
/dev/rdisk2s2: fsck_hfs started at Thu Sep 21 21:31:05 2023
/dev/rdisk2s2: /dev/rdisk2s2: ** /dev/rdisk2s2 (NO WRITE)
/dev/rdisk2s2:    Executing fsck_hfs (version hfs-583.100.10).
QUICKCHECK ONLY; FILESYSTEM CLEAN
/dev/rdisk2s2: fsck_hfs completed at Thu Sep 21 21:31:05 2023
```

log file `/var/log/anka.log`

```text
Fri Sep 22 00:06:05 UTC 2023: Checking /Library/Developer/CoreSimulator/Volumes/watchOS_20S75...
Fri Sep 22 00:06:05 UTC 2023: No updates found
```

log file `/var/log/displaypolicyd.log`

```text
2023-09-15 04:26:56.329071-0700: Version: V6.5.7 starting [0]
2023-09-15 04:26:56.330256-0700: Started at Fri Sep 15 04:26:56 2023
2023-09-15 04:27:17.163031-0700: exit at user request Fri Sep 15 04:27:17 2023
```

log file `/var/log/com.apple.xpc.launchd/launchd.log.1`

```text
2023-10-26 16:56:23.171019 <Notice>: Doing boot task: restore-datapartition
2023-10-26 16:56:23.271137 <Notice>: Doing boot task: enable-swap
2023-10-26 16:56:23.287770 <Notice>: swap enabled
```

log file `/var/log/asl/logs/aslmanager.20231026T170200+00`

```text
Oct 26 17:02:00: aslmanager starting
Oct 26 17:02:00: Processing data store /var/log/asl
Oct 26 17:02:00: Data Store Size = 325176
```
Did you also notice how the log file names differ in unexpected ways?
Microsoft Windows 10
Here is a snippet from a Windows 10 host, log file
`${env:SystemRoot}\debug\mrt.log`

```text
Microsoft Windows Malicious Software Removal Tool v5.83, (build 5.83.13532.1)
Started On Thu Sep 10 10:08:35 2020
```

And a snippet from the same Windows 10 host, log file
`${env:SystemRoot}\comsetup.log`

```text
COM+[12:24:34]: ********************************************************************************
COM+[12:24:34]: Setup started - [DATE:05,27,2020 TIME: 12:24 pm]
COM+[12:24:34]: ********************************************************************************
```

And a snippet from the same Windows 10 host, log file
`${env:SystemRoot}\DirectX.log`

```text
11/01/19 20:03:40: infinst: Installed file C:\WINDOWS\system32\xactengine2_1.dll
```

And a snippet from the same Windows 10 host, log file
`${env:SystemRoot}/Microsoft.NET/Framework/v4.0.30319/ngen.log`

```text
09/15/2022 14:13:22.951 [515]: 1>Warning: System.IO.FileNotFoundException: Could not load file or assembly
```

And a snippet from the same Windows 10 host, log file
`${env:SystemRoot}/Performance/WinSAT/winsat.log`

```text
68902359 (21103) - exe\logging.cpp:0841: --- START 2022\5\17 14:26:09 PM ---
68902359 (21103) - exe\main.cpp:4363: WinSAT registry node is created or present
```

(yes, it reads hour `14`, and `PM` …)
Summary
This chaotic logging approach is typical of commercial and open-source software, AND IT'S A MESS! Attempting to merge log messages by their natural sort mechanism, a datetime stamp, is difficult to impossible.
Hence the need for Super Speedy Syslog Searcher! 🦸
Stargazers
Further Reading