#low-latency #data-flow #composable #distributed #recording #node #dora

app dora-record

The goal of dora is to be a low-latency, composable, and distributed data flow

17 releases

0.3.5 Jul 3, 2024
0.3.5-rc0 Jun 26, 2024
0.3.4 May 20, 2024
0.3.3 Apr 11, 2024
0.3.0 Nov 3, 2023

#2 in #data-flow


312 downloads per month

Apache-2.0

21KB
159 lines

dora-record

dora data recording using Apache Arrow Parquet.

This node is still experimental.

Getting Started

cargo install dora-record --locked

Adding to an existing graph:

- id: dora-record
  custom:
    source: dora-record
    inputs:
      image: webcam/image
      text: webcam/text
      # Any input you add here will be recorded.

Output Files

Format: Parquet file

Path: out/<DATAFLOW_ID>/<INPUT>.parquet
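
As a minimal sketch of the path layout above, the following Python helper builds the expected file location for each recorded input. The helper name `recording_path` and the dataflow id are hypothetical; only the out/<DATAFLOW_ID>/<INPUT>.parquet pattern comes from this README.

```python
from pathlib import PurePosixPath


def recording_path(dataflow_id, input_id, root="out"):
    """Return the expected Parquet path for one recorded input.

    Follows the documented pattern out/<DATAFLOW_ID>/<INPUT>.parquet.
    `recording_path` is a hypothetical helper, not part of dora-record.
    """
    return PurePosixPath(root) / dataflow_id / f"{input_id}.parquet"


# Input names as in the graph snippet above; the dataflow id is a placeholder.
paths = [recording_path("my-dataflow-id", name) for name in ("image", "text")]
for path in paths:
    print(path)
```

Each input declared on the dora-record node should therefore end up in its own file under the directory of the dataflow that produced it.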

Columns:

  • trace_id: String, representing the id of the current trace
  • span_id: String, representing the unique span id
  • timestamp_uhlc: u64, representing the timestamp in Unique Hybrid Logical Clock time
  • timestamp_utc: DataType::Timestamp(Milliseconds), representing the timestamp in Coordinated Universal Time
  • <INPUT>: column containing the input in its defined format

Example:

{
  "trace_id": "2fd23ddf1b5d2aa38ddb86ceedb55928",
  "span_id": "15aef03e0f052bbf",
  "timestamp_uhlc": "7368873278370007008",
  "timestamp_utc": 1715699508406,
  "random": [1886295351360621740]
}
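
The two timestamps in the example row can be cross-checked against each other. The sketch below assumes that timestamp_uhlc uses an NTP-style 64-bit layout, with seconds since the Unix epoch in the upper 32 bits and a binary fraction of a second in the lower 32 bits. This README does not state that layout, so treat it as an assumption. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the example row above.
row = {
    "timestamp_uhlc": "7368873278370007008",
    "timestamp_utc": 1715699508406,
}


def utc_from_millis(millis):
    """Convert a millisecond Unix timestamp to a timezone-aware datetime."""
    seconds, remainder = divmod(millis, 1000)
    base = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return base + timedelta(milliseconds=remainder)


def millis_from_uhlc(raw):
    """Decode an assumed NTP64-style value into Unix milliseconds.

    Assumption: high 32 bits are seconds, low 32 bits are a fraction of a
    second. The lowest bits may also carry a logical counter, so anything
    below a millisecond should be treated as approximate.
    """
    seconds = raw >> 32
    fraction = raw & 0xFFFFFFFF
    return seconds * 1000 + ((fraction * 1000) >> 32)


utc = utc_from_millis(row["timestamp_utc"])
uhlc_millis = millis_from_uhlc(int(row["timestamp_uhlc"]))

print(utc.isoformat())                      # 2024-05-14T15:11:48.406000+00:00
print(uhlc_millis - row["timestamp_utc"])   # difference in milliseconds
```

Under that assumption, both columns of the example row decode to the same instant to within a millisecond.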

Merging multiple files

Recorded files can be merged on trace_id, which is shared across inputs when the OpenTelemetry features are used.

  • trace_id can also be queried from UIs such as the Jaeger UI, InfluxDB, and so on.
  • trace_id keeps track of the logical flow of data, whereas timestamp-based merging might not reflect the actual logical flow of data.
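
The merge described above can be sketched as an inner join on trace_id. The example below works on plain lists of row dictionaries so that it needs no Parquet library. The function name `merge_on_trace_id` and the sample rows are hypothetical, not dora-record output.

```python
from collections import defaultdict


def merge_on_trace_id(tables):
    """Inner-join the rows of several recordings on their shared trace_id.

    `tables` maps an input name to the rows (dicts) read from its
    <INPUT>.parquet file. The result maps each trace_id that appears in
    every input to the matching rows, grouped per input.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for input_name, rows in tables.items():
        for row in rows:
            grouped[row["trace_id"]][input_name].append(row)
    # Keep only traces that were seen in every input.
    return {
        trace_id: dict(per_input)
        for trace_id, per_input in grouped.items()
        if len(per_input) == len(tables)
    }


# Made-up rows standing in for the contents of image.parquet and text.parquet.
image_rows = [
    {"trace_id": "aaa", "span_id": "s1", "image": [1, 2, 3]},
    {"trace_id": "bbb", "span_id": "s2", "image": [4, 5, 6]},
]
text_rows = [
    {"trace_id": "aaa", "span_id": "s3", "text": "hello"},
]

merged = merge_on_trace_id({"image": image_rows, "text": text_rows})
for trace_id, per_input in merged.items():
    print(trace_id, sorted(per_input))
```

In practice the rows would come from a Parquet reader, for example `pyarrow.parquet.read_table(path).to_pylist()`, and the same join can be expressed with pandas as `DataFrame.merge(..., on="trace_id")`. Grouping rows into lists keeps every row when one trace produces several messages on the same input.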

Dependencies

~40–72MB
~1.5M SLoC