#pandas #streaming #data-frame

bin+lib kvc

Very simple key-value-count tools to go from / to pandas data frames or streaming formats

11 releases (1 stable)

1.0.0 May 12, 2021
0.5.6 May 12, 2021
0.5.5 Apr 21, 2021
0.4.0 Mar 31, 2021
0.1.0 Mar 27, 2021

#651 in Text processing


60 downloads per month
Used in kda-tools

MIT license

650KB
380 lines

This crate / package is a Rust module that handles streaming input and output. Its purpose is to tally or accumulate values for a streaming set of keys. It is designed to be stupidly simple and to consume / produce whitespace-separated values.

Key Value Counts

I use this library to parse simple journal-like logs where each line is of the form:

2021-03-01 warnings:3 error ... (other items with optional counts)

Suppose I want to do some processing on this data. The format is very readable / writeable, but it is not standard.

We can use kvc-stream to convert it into something more like a stream of key-value pairs, or kvc-df to convert it to a pandas data frame.

Spec

The kvc journal format is very simple.

  • Each line is a "frame"
  • A frame has an optional "date header"
  • A frame is composed of a string of whitespace-separated keys with optional counts per key
  • A '#' ends the frame, and is useful for comments
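The rules above can be sketched in Rust roughly as follows. This is only an illustration of the format, not the crate's actual API: the Frame type and the parse_frame and looks_like_date functions are hypothetical names. It also assumes that a date header is a leading token shaped like YYYY-MM-DD, and that a token whose suffix after the last ':' is not a number is treated as a plain key with a count of 1.

```rust
use std::collections::BTreeMap;

/// One parsed journal line (hypothetical type, not part of the kvc crate).
#[derive(Debug, PartialEq)]
struct Frame {
    date: Option<String>,
    counts: BTreeMap<String, u64>,
}

/// Assumption: a date header looks like YYYY-MM-DD.
fn looks_like_date(tok: &str) -> bool {
    let bytes = tok.as_bytes();
    bytes.len() == 10
        && bytes.iter().enumerate().all(|(i, c)| match i {
            4 | 7 => *c == b'-',
            _ => c.is_ascii_digit(),
        })
}

/// Parse one line into a frame, summing the counts of repeated keys.
fn parse_frame(line: &str) -> Frame {
    // A '#' ends the frame; everything after it is a comment.
    let body = line.split('#').next().unwrap_or("");
    let mut tokens = body.split_whitespace().peekable();

    // The optional date header, if present, is the first token.
    let mut date = None;
    if let Some(&first) = tokens.peek() {
        if looks_like_date(first) {
            date = Some(first.to_string());
            tokens.next();
        }
    }

    let mut counts = BTreeMap::new();
    for tok in tokens {
        // "key:3" carries an explicit count; a bare "key" counts once.
        let (key, n) = match tok.rsplit_once(':') {
            Some((k, c)) if !k.is_empty() => match c.parse::<u64>() {
                Ok(n) => (k, n),
                Err(_) => (tok, 1),
            },
            _ => (tok, 1),
        };
        *counts.entry(key.to_string()).or_insert(0) += n;
    }
    Frame { date, counts }
}
```

Calling parse_frame on each line of input yields one Frame per line. The sketch uses a BTreeMap so that keys come out in sorted order, which is a choice made here for predictability and not something the spec requires.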

These are valid frames, one per line:

a
event event
2021-04-01 april_fools_pranks:4
2021-03-01 key another_key a-third-key <weird-symbols_ar_ok!> this_has_occured_three_times:3 this_twice this_twice
2021-04-02 # Nothing happened that day

Suppose that's stored in data.txt. (try it!)

Running <data.txt kvc-stream produces:

1 a 1
2 event 2
3 Date 2021-04-01
3 april_fools_pranks 4
4 Date 2021-03-01
4 <weird-symbols_ar_ok!> 1
4 a-third-key 1
4 this_has_occured_three_times 3
4 this_twice 2
4 key 1
4 another_key 1
5 Date 2021-04-02

Running cat data.txt | kvc-df (or < data.txt kvc-df ) produces:

Idx  april_fools_pranks  this_twice  a    <weird-symbols_ar_ok!>  event  Date        a-third-key  key  this_has_occured_three_times  another_key
1    N/A                 N/A         1    N/A                     N/A    N/A         N/A          N/A  N/A                           N/A
2    N/A                 N/A         N/A  N/A                     2      N/A         N/A          N/A  N/A                           N/A
3    4                   N/A         N/A  N/A                     N/A    2021-04-01  N/A          N/A  N/A                           N/A
4    N/A                 2           N/A  1                       N/A    2021-03-01  1            1    3                             1
5    N/A                 N/A         N/A  N/A                     N/A    2021-04-02  N/A          N/A  N/A                           N/A

OK, so I actually aligned the text with cat data.txt | kvc-df | column -t
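The step from stream lines to a table can be pictured with a small Rust sketch. It is not how kvc-df is implemented; the pivot function is a hypothetical helper that reads "index key value" lines like the kvc-stream output shown earlier and writes a tab-separated table, filling missing cells with N/A. As an assumption made for determinism, columns are sorted by name, so their order will differ from the real output above.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Pivot "index key value" lines into a tab-separated table.
/// Hypothetical helper for illustration; not part of the kvc crate.
fn pivot(stream: &str) -> String {
    let mut rows: BTreeMap<u64, BTreeMap<String, String>> = BTreeMap::new();
    let mut columns: BTreeSet<String> = BTreeSet::new();

    for line in stream.lines() {
        let mut parts = line.split_whitespace();
        // Skip lines that do not carry all three fields.
        let (idx, key, value) = match (parts.next(), parts.next(), parts.next()) {
            (Some(i), Some(k), Some(v)) => (i, k, v),
            _ => continue,
        };
        // Skip lines whose index is not a number.
        let idx: u64 = match idx.parse() {
            Ok(n) => n,
            Err(_) => continue,
        };
        columns.insert(key.to_string());
        rows.entry(idx)
            .or_default()
            .insert(key.to_string(), value.to_string());
    }

    // Header row: "Idx" followed by every key seen in any frame.
    let mut out = String::from("Idx");
    for col in &columns {
        out.push('\t');
        out.push_str(col);
    }
    out.push('\n');

    // One row per frame index; absent keys become "N/A".
    for (idx, cells) in &rows {
        out.push_str(&idx.to_string());
        for col in &columns {
            out.push('\t');
            out.push_str(cells.get(col).map(String::as_str).unwrap_or("N/A"));
        }
        out.push('\n');
    }
    out
}
```

Because the output is tab-separated with a header row, it can be piped through column -t for alignment in the same way as above.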

I use this to keep a journal of events and easily scrape it for analysis in other programs or databases.

Dependencies

~2.5MB
~55K SLoC