Version 0.1.0 (new, unstable), released Apr 25, 2024
Python UDF for Apache Arrow
Notice: Python 3.12 is required to run this library.
If python3 is not 3.12, please set the environment variable PYO3_PYTHON=python3.12.
Add the following lines to your Cargo.toml:
[dependencies]
arrow-udf-python = "0.1"
Create a Runtime and define your Python functions in string form.
Note that the function name must match the one you pass to add_function.
use arrow_udf_python::{CallMode, Runtime};
let mut runtime = Runtime::new().unwrap();
let python_code = r#"
def gcd(a: int, b: int) -> int:
    while b:
        a, b = b, a % b
    return a
"#;
let return_type = arrow_schema::DataType::Int32;
let mode = CallMode::ReturnNullOnNullInput;
runtime.add_function("gcd", return_type, mode, python_code).unwrap();
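For comparison, the Python gcd above is the Euclidean algorithm. A minimal plain-Rust sketch of the same loop can serve as a reference when checking the UDF's results. It is not part of arrow-udf-python and uses only the standard library; the name `gcd` is reused here purely for illustration.

```rust
/// Reference Euclidean algorithm, mirroring the Python `gcd` line by line.
/// Assumes non-negative inputs: Python's `%` and Rust's `%` differ in the
/// sign of the result when an operand is negative.
fn gcd(mut a: i32, mut b: i32) -> i32 {
    while b != 0 {
        // Same step as the Python tuple assignment `a, b = b, a % b`.
        (a, b) = (b, a % b);
    }
    a
}
```

For example, `gcd(48, 18)` walks through (18, 12), (12, 6), (6, 0) and returns 6.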
You can then call the Python function on a RecordBatch:
let input: RecordBatch = ...;
let output: RecordBatch = runtime.call("gcd", &input).unwrap();
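The call above is registered with CallMode::ReturnNullOnNullInput. As a rough mental model, assuming that mode behaves like SQL's RETURNS NULL ON NULL INPUT, a row yields null whenever any of its arguments is null. The sketch below models that behavior in plain Rust, with `Option<i32>` standing in for nullable Arrow values. The helper `gcd_column` is hypothetical and is not how the library is implemented.

```rust
/// Plain-Rust stand-in for the Python `gcd` UDF (non-negative inputs assumed).
fn gcd(mut a: i32, mut b: i32) -> i32 {
    while b != 0 {
        (a, b) = (b, a % b);
    }
    a
}

/// Hypothetical model of a row-wise call with null propagation:
/// `None` plays the role of an Arrow null, and any null argument
/// produces a null result without invoking `gcd` for that row.
fn gcd_column(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(gcd(*x, *y)),
            _ => None,
        })
        .collect()
}
```

Under this model, the columns `[15, null, 7]` and `[25, 3, null]` would produce `[5, null, null]`.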
The Python code runs in an embedded CPython 3.12 interpreter, powered by PyO3.
See the example for more details.