#python #arrow #udf #apache-arrow

arrow-udf-python

Python runtime for Arrow UDFs

1 unstable release

new 0.1.0 Apr 25, 2024

Apache-2.0

36KB
685 lines

Python UDF for Apache Arrow

Notice: Python 3.12 is required to run this library. If python3 does not point to Python 3.12, set the environment variable PYO3_PYTHON=python3.12.

Add the following lines to your Cargo.toml:

[dependencies]
arrow-udf-python = "0.1"

Create a Runtime and define your Python function as a string. The name of the function in the Python code must match the name you pass to add_function.

use arrow_udf_python::{CallMode, Runtime};

let mut runtime = Runtime::new().unwrap();
let python_code = r#"
def gcd(a: int, b: int) -> int:
    while b:
        a, b = b, a % b
    return a
"#;
let return_type = arrow_schema::DataType::Int32;
let mode = CallMode::ReturnNullOnNullInput;
runtime.add_function("gcd", return_type, mode, python_code).unwrap();

You can then call the Python function on a RecordBatch:

use arrow_array::RecordBatch;

let input: RecordBatch = ...;
let output: RecordBatch = runtime.call("gcd", &input).unwrap();
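
To make the expected result concrete, here is a plain-Rust model of what the call above computes row by row. It is only a sketch, not the crate's implementation. The names `gcd` and `gcd_column` are hypothetical helpers, the columns are modelled as slices of `Option<i32>` rather than Arrow arrays, and it assumes that `CallMode::ReturnNullOnNullInput` follows the usual SQL meaning (any null argument yields a null result without running the function).

```rust
/// Plain-Rust counterpart of the Python `gcd` (Euclid's algorithm).
/// Agrees with the Python version for non-negative inputs; for negative
/// operands Rust's `%` and Python's `%` differ in the sign of the result.
fn gcd(mut a: i32, mut b: i32) -> i32 {
    while b != 0 {
        // The right-hand side is evaluated first, so `a % b` uses the old values.
        (a, b) = (b, a % b);
    }
    a
}

/// Hypothetical model of a row-wise call under the assumed
/// "return null on null input" semantics: if either argument of a row
/// is null (`None`), the output row is null and `gcd` is not invoked.
/// Both columns are expected to have the same length, as in a RecordBatch.
fn gcd_column(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(gcd(*x, *y)),
            _ => None,
        })
        .collect()
}
```

A model like this can serve as a reference when checking the output column of the UDF for a small, hand-built input.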

The Python code runs in an embedded CPython 3.12 interpreter, powered by PyO3.

See the example for more details.

Dependencies

~12–18MB
~244K SLoC