9 releases (4 breaking)

0.5.0 Oct 10, 2024
0.4.0 Sep 19, 2024
0.3.2 Jun 24, 2024
0.3.1 May 24, 2024
0.1.0 Jan 31, 2024

#165 in Database interfaces

Download history 544/week @ 2024-07-29 457/week @ 2024-08-05 765/week @ 2024-08-12 751/week @ 2024-08-19 528/week @ 2024-08-26 655/week @ 2024-09-02 500/week @ 2024-09-09 733/week @ 2024-09-16 1200/week @ 2024-09-23 359/week @ 2024-09-30 962/week @ 2024-10-07 1133/week @ 2024-10-14 721/week @ 2024-10-21 1640/week @ 2024-10-28 1266/week @ 2024-11-04 488/week @ 2024-11-11

4,135 downloads per month

Apache-2.0

66KB
1K SLoC

JavaScript UDF for Apache Arrow

Crate Docs

Usage

Add the following lines to your Cargo.toml:

[dependencies]
arrow-udf-js = "0.3"

Create a Runtime and define your JS functions in string form. Note that the function must be exported and its name must match the one you pass to add_function.

use arrow_udf_js::{Runtime, CallMode};

let mut runtime = Runtime::new().unwrap();
runtime
    .add_function(
        "gcd",
        arrow_schema::DataType::Int32,
        CallMode::ReturnNullOnNullInput,
        r#"
        export function gcd(a, b) {
            while (b != 0) {
                let t = b;
                b = a % b;
                a = t;
            }
            return a;
        }
        "#,
    )
    .unwrap();

You can then call the JS function on a RecordBatch:

let input: RecordBatch = ...;
let output: RecordBatch = runtime.call("gcd", &input).unwrap();

If you print the input and output batch, it will be like this:

 input     output
+----+----+-----+
| a  | b  | gcd |
+----+----+-----+
| 15 | 25 | 5   |
|    | 1  |     |
+----+----+-----+

For set-returning functions (or so-called table functions), define the function as a generator:

use arrow_udf_js::{Runtime, CallMode};

let mut runtime = Runtime::new().unwrap();
runtime
    .add_function(
        "range",
        arrow_schema::DataType::Int32,
        CallMode::ReturnNullOnNullInput,
        r#"
        export function* range(n) {
            for (let i = 0; i < n; i++) {
                yield i;
            }
        }
        "#,
    )
    .unwrap();

You can then call the table function via call_table_function:

let chunk_size = 1024;
let input: RecordBatch = ...;
let outputs = runtime.call_table_function("range", &input, chunk_size).unwrap();
for result in outputs {
    let output: RecordBatch = result?;
    // do something with the output
}

If you print the output batch, it will be like this:

+-----+-------+
| row | range |
+-----+-------+
| 0   | 0     |
| 2   | 0     |
| 2   | 1     |
| 2   | 2     |
+-----+-------+

The JS code will be run in an embedded QuickJS interpreter.

See the example for more details.

Type Mapping

The following table shows the type mapping between Arrow and JavaScript:

Arrow Type JS Type
Null null
Boolean boolean
Int8 number
Int16 number
Int32 number
Int64 number
UInt8 number
UInt16 number
UInt32 number
UInt64 number
Float32 number
Float64 number
String string
LargeString string
Date32 Date
Timestamp Date
Decimal128 BigDecimal
Decimal256 BigDecimal
Binary Uint8Array
LargeBinary Uint8Array
List(Int8) Int8Array
List(Int16) Int16Array
List(Int32) Int32Array
List(Int64) BigInt64Array
List(UInt8) Uint8Array
List(UInt16) Uint16Array
List(UInt32) Uint32Array
List(UInt64) BigUint64Array
List(Float32) Float32Array
List(Float64) Float64Array
List(others) Array
Struct object

This crate also supports the following Arrow extension types:

Extension Type Physical Type ARROW:extension:name JS Type
JSON String, Binary, LargeBinary arrowudf.json any (parsed by JSON.parse(string))
Decimal String arrowudf.decimal BigDecimal

Dependencies

~11–16MB
~269K SLoC