app ucloud-cdn-log-parser

Parse UCloud CDN log files to CSV

4 releases

0.1.3 Mar 20, 2024
0.1.2 Mar 19, 2024
0.1.1 Mar 19, 2024
0.1.0 Mar 16, 2024

MIT license

ucloud-cdn-log-parser

Parse UCloud CDN logs into CSV/TSV format, with or without a header row, so that you can analyze them with DuckDB, clickhouse local, or similar tools.

Install

Download a prebuilt binary from the release page (built by GitHub Actions), or install via cargo or build from source:

cargo install ucloud-cdn-log-parser --locked

Usage

# download logs

# parse & convert to csv / parquet

zcat *.gz | \
  ucloud-cdn-log-parser > log.csv

# or convert to parquet with duckdb (pv shows pipe progress)
zcat *.gz | \
  ucloud-cdn-log-parser | \
  pv | \
  duckdb -c "
    copy 
      (select * from read_csv('/dev/stdin')) 
      to 'log.parquet.zst' 
      (format parquet, compression 'zstd');"

# now you can query the log with duckdb, clickhouse local, etc.
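
Before loading the data into a database, a quick sanity check of the CSV with standard tools can be handy. The following is only a rough sketch, not part of this tool: `top_by_bytes` is a hypothetical helper name, and it assumes that `log.csv` was written with a header row and that no field contains a quoted comma (it splits naively on `,`). If URLs in your logs contain commas, use the DuckDB queries instead.

```shell
# Hypothetical helper: sum a numeric column grouped by a key column.
# Columns are looked up by header name, so a header row is required.
# Assumption: fields contain no quoted commas (naive split on ",").
top_by_bytes() {
  # $1: csv file, $2: key column name, $3: numeric column name
  awk -F',' -v key="$2" -v val="$3" '
    NR == 1 {
      # locate the two columns in the header row
      for (i = 1; i <= NF; i++) {
        if ($i == key) k = i
        if ($i == val) v = i
      }
      if (!k || !v) { print "column not found" > "/dev/stderr"; exit 1 }
      next
    }
    { total[$k] += $v }
    END { for (id in total) printf "%s,%.0f\n", id, total[id] }
  ' "$1" | sort -t',' -k2,2nr   # largest total first
}

# usage:
#   top_by_bytes log.csv client_ip sent_bytes_incl_header | head -n 10
```

The output is one `key,total` line per distinct key, sorted by total in descending order.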

For example, to analyze the log with DuckDB:

-- get top 100 client_ip by sent_bytes_incl_header in last 6 hours
select
    client_ip,
    format_bytes(sum(sent_bytes_incl_header)::bigint) sent_bytes,
    count(*) as n
from 'log.parquet.zst'
where date_time > (now() - interval 6 hours)::timestamp
group by client_ip
order by sum(sent_bytes_incl_header) desc
limit 100;

-- get top 100 request_method_url_protocol by sent_bytes_incl_header in last 6 hours
select
    request_method_url_protocol,
    format_bytes(sum(sent_bytes_incl_header)::bigint) sent_bytes,
    count(*) as n
from 'log.parquet.zst'
where date_time > (now() - interval 6 hours)::timestamp
group by request_method_url_protocol
order by sum(sent_bytes_incl_header) desc, n desc
limit 100;

Reference
