Cargo Features

[dependencies]
minarrow = { version = "0.4.1", default-features = false, features = [
    "parallel_proc", "c_ffi_tests", "extended_categorical", "extended_numeric_types",
    "cube", "scalar_type", "value_type", "chunked", "large_string", "views", "matrix",
    "zstd", "snappy", "cast_arrow", "cast_polars", "datetime", "simd", "datetime_ops",
    "str_arithmetic", "fast_hash", "broadcast", "size", "select", "regex",
] }
parallel_proc = rayon

Adds parallel iterators via Rayon

c_ffi_tests = cc

Adds round-trip FFI tests. Leave this off if you don't need it in your build pipeline, as it is mostly C code.

extended_categorical = extended_numeric_types

Adds Categorical8, Categorical16, and Categorical64.

Highly recommend keeping these off unless required (e.g., in constrained or embedded environments), as they add combinatorial weight to the binary and to enum match arms.

extended_numeric_types extended_categorical?

Adds UInt8, UInt16, Int8, Int16 types.

Highly recommend keeping these off unless required (e.g., in constrained or embedded environments), as they add combinatorial weight to the binary and to enum match arms.

For most analytical use cases, they get upcast anyway.
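A minimal plain-Rust sketch (no minarrow types; the helper names are hypothetical) of why narrow integers are usually widened for analytics: summing i8 values in an i8 accumulator would overflow, while widening each value to i64 first does not.

```rust
/// Sums small integers by widening each one to i64 first, so the
/// accumulator cannot overflow the way an i8 accumulator would.
fn sum_upcast(values: &[i8]) -> i64 {
    values.iter().map(|&v| i64::from(v)).sum()
}

/// Widens a UInt16-style buffer to u64 in one pass (lossless).
fn widen_u16(values: &[u16]) -> Vec<u64> {
    values.iter().map(|&v| u64::from(v)).collect()
}
```

For example, `sum_upcast(&[100, 100, 100])` gives 300, which does not fit in an i8.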

cube

Adds a cube object for stacking tables along an extra axis. Useful for time series and group analytics.

Affects array::broadcast_array_to_cube, cube::broadcast_cube_to_array, cube::broadcast_fieldarray_to_cube, cube::broadcast_cube_to_fieldarray, cube::broadcast_table_to_cube, cube::broadcast_cube_to_table, broadcast::cube, minarrow::structs.cube, aliases::CubeV, cube::broadcast_cube_to_scalar, cube::broadcast_arrayview_to_cube, cube::broadcast_cube_to_arrayview, cube::broadcast_numericarrayview_to_cube, cube::broadcast_cube_to_numericarrayview, cube::broadcast_textarrayview_to_cube, cube::broadcast_cube_to_textarrayview, cube::broadcast_tableview_to_cube, cube::broadcast_superarray_to_cube, cube::broadcast_cube_to_superarray, cube::broadcast_supertable_to_cube
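To illustrate the idea of stacking tables on an extra axis, here is a minimal sketch using hypothetical, simplified `Table` and `Cube` types (not minarrow's actual definitions). It assumes every table in the cube shares the same column names and row count, and shows an aggregation across the stacking axis, e.g., averaging each cell over time steps.

```rust
/// Hypothetical stand-in for a table: named f64 columns of equal length.
#[derive(Clone, Debug)]
struct Table {
    columns: Vec<(String, Vec<f64>)>,
}

/// Hypothetical cube: same-schema tables stacked along a third axis
/// (e.g., one table per time step or per group).
struct Cube {
    tables: Vec<Table>,
}

impl Cube {
    /// Adds a table, rejecting it if its column names or row counts differ
    /// from the first table's.
    fn push(&mut self, table: Table) -> Result<(), String> {
        if let Some(first) = self.tables.first() {
            let same_shape = first.columns.len() == table.columns.len()
                && first
                    .columns
                    .iter()
                    .zip(&table.columns)
                    .all(|(a, b)| a.0 == b.0 && a.1.len() == b.1.len());
            if !same_shape {
                return Err("schema mismatch".to_string());
            }
        }
        self.tables.push(table);
        Ok(())
    }

    /// Averages each (column, row) cell across the stacking axis.
    fn mean_over_axis(&self) -> Option<Table> {
        let first = self.tables.first()?;
        let n = self.tables.len() as f64;
        let columns: Vec<(String, Vec<f64>)> = first
            .columns
            .iter()
            .enumerate()
            .map(|(c, (name, values))| {
                let means: Vec<f64> = (0..values.len())
                    .map(|r| self.tables.iter().map(|t| t.columns[c].1[r]).sum::<f64>() / n)
                    .collect();
                (name.clone(), means)
            })
            .collect();
        Some(Table { columns })
    }
}
```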

scalar_type default

Adds a unified scalar type that is useful for Array aggregations and other cases where you end up with a single value. However, it is only one of several downcasting approaches available in Rust. When working predominantly with numbers, one might prefer my_function::<i32>() semantics, which fix the type immediately (e.g., in conjunction with T: Numeric, T: Integer, or T: Float generic functions), rather than getting back a Scalar object that then needs .i32()-style access or a manual match. It is a pain that Rust can't simply hand back the value when it's wrapped in such cases, but this is an inherent type-safety limitation.

Affects scalar::Scalar, array::broadcast_array_to_scalar, scalar::broadcast_scalar_to_table, scalar::broadcast_scalar_to_array, scalar::broadcast_scalar_to_tuple2, scalar::broadcast_scalar_to_tuple3, scalar::broadcast_scalar_to_tuple4, scalar::broadcast_scalar_to_tuple5, scalar::broadcast_scalar_to_tuple6, scalar::broadcast_scalar_to_fieldarray, scalar::broadcast_fieldarray_to_scalar, table::broadcast_table_to_scalar, arithmetic::scalar_arithmetic, minarrow::enums.scalar, cube::broadcast_cube_to_scalar, matrix::broadcast_matrix_scalar_add, matrix::broadcast_scalar_matrix_add, broadcast::broadcast_value, scalar::broadcast_scalar_to_tableview, scalar::broadcast_scalar_to_superarray
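The trade-off between the two styles can be sketched in plain Rust. The `Scalar` enum below is a hypothetical, cut-down stand-in (not minarrow's actual definition), and the std bound `Copy + Default + Add` stands in for the crate's `T: Numeric` style bounds.

```rust
use std::ops::Add;

/// Hypothetical, cut-down scalar enum (not minarrow's Scalar).
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int32(i32),
    Float64(f64),
}

impl Scalar {
    /// `.i32()`-style accessor: the caller still has to handle the
    /// wrong-variant case, here via Option.
    fn i32(&self) -> Option<i32> {
        match self {
            Scalar::Int32(v) => Some(*v),
            _ => None,
        }
    }
}

/// Enum-returning aggregation: the concrete type is only known at runtime.
fn sum_scalar(values: &[i32]) -> Scalar {
    Scalar::Int32(values.iter().sum())
}

/// Generic aggregation: `T` is fixed at the call site
/// (e.g., `sum_generic::<i32>`), so the result needs no unwrapping.
fn sum_generic<T: Copy + Default + Add<Output = T>>(values: &[T]) -> T {
    values.iter().fold(T::default(), |acc, &v| acc + v)
}
```

With the generic form, `sum_generic::<i32>(&[1, 2, 3])` is already an `i32`, whereas the enum form requires `sum_scalar(&[1, 2, 3]).i32()` and handling the `None` case.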

value_type

Adds a unified value enum that can be used for engine-level orchestration, or in any situation where a catch-all type encompassing the others is required to satisfy the compiler. It includes round-trip From and TryFrom implementations for each inner type, so that function signatures do not need to couple to it directly. Recommend leaving it off if you don't need it.

Affects minarrow::enums.value, matrix::broadcast_matrix_array_add, matrix::broadcast_array_matrix_add, broadcast::broadcast_value
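The round-trip From/TryFrom pattern described above can be sketched with hypothetical stand-in types (`IntArray`, `TextArray`, `Value` are illustrative only, not minarrow's definitions). Wrapping is infallible; unwrapping is fallible because the enum may hold a different variant, so functions can take the concrete type and callers convert at the boundary.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-ins for concrete inner types.
#[derive(Debug, Clone, PartialEq)]
struct IntArray(Vec<i64>);
#[derive(Debug, Clone, PartialEq)]
struct TextArray(Vec<String>);

/// Hypothetical catch-all value enum for engine-level orchestration.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(IntArray),
    Text(TextArray),
}

// Wrapping: any inner type converts into Value.
impl From<IntArray> for Value {
    fn from(a: IntArray) -> Self {
        Value::Int(a)
    }
}
impl From<TextArray> for Value {
    fn from(a: TextArray) -> Self {
        Value::Text(a)
    }
}

// Unwrapping: fallible; the original Value is returned on a variant mismatch.
impl TryFrom<Value> for IntArray {
    type Error = Value;
    fn try_from(v: Value) -> Result<Self, Self::Error> {
        match v {
            Value::Int(a) => Ok(a),
            other => Err(other),
        }
    }
}

/// Decoupled from Value: takes the concrete type directly.
fn total(a: &IntArray) -> i64 {
    a.0.iter().sum()
}
```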

chunked default

Adds ChunkedArray and ChunkedTable objects that support iterating over multiple inner objects of the same type, e.g., for memory-mapped streaming.

Affects aliases::ChunkedTable, array::broadcast_array_to_supertable, super_array::broadcast_superarray_to_table, super_array::route_super_array_broadcast, super_table::broadcast_super_table_with_operator, super_table::broadcast_supertable_to_array, super_table::broadcast_fieldarray_to_supertable, super_table::broadcast_supertable_to_fieldarray, super_table::broadcast_superarray_to_supertable, super_table::broadcast_supertable_to_superarray, table::broadcast_super_table_add, table::broadcast_table_to_superarray, minarrow::structs.chunked, utils::create_aligned_chunks_from_array, array::broadcast_array_to_supertableview, cube::broadcast_superarray_to_cube, cube::broadcast_cube_to_superarray, cube::broadcast_supertable_to_cube, cube::broadcast_cube_to_supertable, field_array::broadcast_fieldarray_to_superarrayview
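A minimal sketch of the chunked idea, using a hypothetical generic `Chunked<T>` (not minarrow's ChunkedArray): several same-typed chunks are exposed as one logical sequence without concatenating them, and a logical index is mapped back to a (chunk, offset) pair.

```rust
/// Hypothetical chunked container: same-typed chunks (e.g., batches or
/// memory-mapped segments) presented as one logical sequence.
struct Chunked<T> {
    chunks: Vec<Vec<T>>,
}

impl<T> Chunked<T> {
    /// Total logical length across all chunks.
    fn len(&self) -> usize {
        self.chunks.iter().map(|c| c.len()).sum()
    }

    /// Iterates every element in order without copying chunks together.
    fn iter<'a>(&'a self) -> impl Iterator<Item = &'a T> + 'a {
        self.chunks.iter().flat_map(|c| c.iter())
    }

    /// Maps a logical index to (chunk index, offset within that chunk).
    fn locate(&self, mut index: usize) -> Option<(usize, usize)> {
        for (i, chunk) in self.chunks.iter().enumerate() {
            if index < chunk.len() {
                return Some((i, index));
            }
            index -= chunk.len();
        }
        None
    }
}
```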

large_string default cast_polars?

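Adds string arrays with Int64-based (64-bit) offsets, the equivalent of Arrow's LargeUtf8, which allows string data beyond the roughly 2 GiB limit of 32-bit offsets.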
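A minimal sketch of the offsets-plus-data layout, assuming a hypothetical `LargeStringArray` (not minarrow's type, and without a null bitmask): value `i` occupies bytes `offsets[i]..offsets[i + 1]` of one contiguous buffer, and the offsets are i64 rather than i32.

```rust
/// Hypothetical string array with 64-bit offsets (Arrow LargeUtf8-style):
/// `offsets[i]..offsets[i + 1]` is the byte range of value `i` in `data`.
struct LargeStringArray {
    offsets: Vec<i64>,
    data: Vec<u8>,
}

impl LargeStringArray {
    fn from_strs(values: &[&str]) -> Self {
        let mut offsets = Vec::with_capacity(values.len() + 1);
        let mut data = Vec::new();
        offsets.push(0i64);
        for s in values {
            data.extend_from_slice(s.as_bytes());
            // i64 offsets let the data buffer grow past what i32 offsets can address.
            offsets.push(data.len() as i64);
        }
        LargeStringArray { offsets, data }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn get(&self, i: usize) -> &str {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        // Bytes were copied from valid &str values, so this cannot fail.
        std::str::from_utf8(&self.data[start..end]).expect("valid UTF-8")
    }
}
```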

views default

Provides windowed collection views for Numeric, String, and Temporal types. Often, everything can be done with only the ArrayView abstraction, or with the ArrayViewT (&Array, Offset, Length) tuple from aliases. These views are for the cases where those fall short, e.g., when you have numeric- or text-specific functions and want to streamline type management. In those cases, they provide the equivalent of Into<NumericArrayView> for several types, and accept both the original and the windowed view variants. One can therefore unify numeric entry points through them, enabling a flexible API at the cost of more surface complexity.

Affects array_view::broadcast_arrayview_to_table, array_view::broadcast_arrayview_to_tableview, array_view::broadcast_arrayview_to_supertableview, super_table_view::broadcast_supertableview_to_arrayview, table::broadcast_table_to_arrayview, table_view::broadcast_tableview_to_tableview, table_view::broadcast_tableview_to_arrayview, minarrow::kernels.routing, minarrow::views.collections, minarrow::views.array_view, minarrow::views.table_view, minarrow::traits.view, aliases::CubeV, array::broadcast_array_to_supertableview, cube::broadcast_arrayview_to_cube, cube::broadcast_cube_to_arrayview, cube::broadcast_numericarrayview_to_cube, cube::broadcast_cube_to_numericarrayview, cube::broadcast_textarrayview_to_cube, cube::broadcast_cube_to_textarrayview
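The Into-based unification described above can be sketched as follows. `NumericView` and `mean` are hypothetical (not minarrow's NumericArrayView API); the sketch only shows how one entry point can accept both a whole buffer and an (&buffer, offset, length) window.

```rust
/// Hypothetical numeric view: a borrowed buffer plus an (offset, length)
/// window, mirroring the (&Array, Offset, Length) tuple idea.
struct NumericView<'a> {
    data: &'a [f64],
    offset: usize,
    len: usize,
}

impl<'a> NumericView<'a> {
    /// The windowed values, borrowed for the lifetime of the buffer.
    fn values(&self) -> &'a [f64] {
        let data: &'a [f64] = self.data;
        &data[self.offset..self.offset + self.len]
    }
}

/// A whole buffer becomes a view over its full length.
impl<'a> From<&'a Vec<f64>> for NumericView<'a> {
    fn from(v: &'a Vec<f64>) -> Self {
        NumericView { data: v, offset: 0, len: v.len() }
    }
}

/// A windowed tuple becomes a view over just that window.
impl<'a> From<(&'a Vec<f64>, usize, usize)> for NumericView<'a> {
    fn from((v, offset, len): (&'a Vec<f64>, usize, usize)) -> Self {
        assert!(offset + len <= v.len(), "window out of bounds");
        NumericView { data: v, offset, len }
    }
}

/// One numeric entry point that accepts either form.
fn mean<'a, V: Into<NumericView<'a>>>(input: V) -> f64 {
    let view = input.into();
    let vals = view.values();
    vals.iter().sum::<f64>() / vals.len() as f64
}
```

Both `mean(&buf)` and `mean((&buf, 1usize, 2usize))` then go through the same function body.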

matrix

Adds a 2D matrix that uses a flat buffer in a format compatible with BLAS and LAPACK Fortran and C kernels. It includes TryFrom conversion methods, so it's easy to move from Table column selections into the matrix without worrying too much about buffers and strides. Hence, if you are working only with matrices, you may want this from the get-go. If you are working predominantly with tabular data but running PCAs and SVDs, for example, you can keep your data in Table format, and any functions that accept Matrix should also work for your Table. The cost is a small, once-off performance penalty for cloning the columns into a contiguous buffer, which becomes noticeable at large data sizes.

Affects matrix::broadcast_matrix_add, broadcast::matrix, minarrow::structs.matrix, matrix::broadcast_matrix_scalar_add, matrix::broadcast_scalar_matrix_add, matrix::broadcast_matrix_array_add, matrix::broadcast_array_matrix_add
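A minimal sketch of the column-to-matrix conversion, assuming a hypothetical column-major `Matrix` (not minarrow's struct). Concatenating equal-length columns yields the column-major layout Fortran BLAS/LAPACK expects, with element (row, col) at `col * nrows + row` and leading dimension `nrows`; the copy in `try_from` is the once-off cost mentioned above.

```rust
use std::convert::TryFrom;

/// Hypothetical column-major matrix (leading dimension = nrows).
#[derive(Debug, PartialEq)]
struct Matrix {
    nrows: usize,
    ncols: usize,
    data: Vec<f64>,
}

/// Clones equal-length columns into one contiguous column-major buffer.
impl<'a> TryFrom<&'a [Vec<f64>]> for Matrix {
    type Error = String;
    fn try_from(columns: &'a [Vec<f64>]) -> Result<Self, Self::Error> {
        let nrows = columns.first().map_or(0, |c| c.len());
        if columns.iter().any(|c| c.len() != nrows) {
            return Err("all columns must have the same length".to_string());
        }
        let mut data = Vec::with_capacity(nrows * columns.len());
        for col in columns {
            data.extend_from_slice(col);
        }
        Ok(Matrix { nrows, ncols: columns.len(), data })
    }
}

impl Matrix {
    /// Reads element (row, col) using column-major indexing.
    fn get(&self, row: usize, col: usize) -> f64 {
        self.data[col * self.nrows + row]
    }
}
```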

zstd

Adds the zstd compression option for Parquet and IPC formats. Zstd offers a higher compression ratio than snappy but is slightly slower.

Enables zstd

snappy

Adds the snappy compression option for Parquet and IPC formats.
Snappy is lightweight and minimal, but compresses less than zstd.

Enables snappy

cast_arrow = arrow, arrow-schema

Adds to_apache_arrow() for casting into that library.

cast_polars = large_string, polars, polars-arrow

Adds to_polars() for casting into that library.

datetime default datetime_ops?

Adds Datetime array types.

Affects aliases::DatetimeAVT, aliases::DtArr, minarrow::collections.temporal_array, minarrow::variants.datetime, minarrow::collections.temporal_array_view, datetime::DatetimeArray, scalar::broadcast_scalar_to_temporal_arrayview, scalar::broadcast_temporal_arrayview_to_scalar, cube::broadcast_temporalarrayview_to_cube, cube::broadcast_cube_to_temporalarrayview, super_table::broadcast_temporalarrayview_to_supertable, super_table::broadcast_supertable_to_temporal_arrayview

simd default

Adds SIMD for the Bitmask and Arithmetic kernels.
A much more extensive set of kernels is available in the downstream simd-kernels crate.

Affects arithmetic::simd, bitmask::simd

datetime_ops = datetime

Adds full datetime functionality via the time crate, including:

  • Human-readable datetime conversions
  • Timezone-aware operations
  • Date/time arithmetic (adding/subtracting durations and dates)
  • Comparison operations
  • Component extraction (year, month, day, hour, etc.)

This comes at the expense of an external dependency.

Without this feature, datetime values are raw integer offsets. The ArrowType stored in Field and/or FieldArray specifies the logical type (Date32, Date64, Timestamp, etc.) for Arrow FFI compatibility.

Enables phf ^0.11 and time

Affects tz::TimezoneInfo, tz::TZ_DATABASE, tz::ABBR_TO_OFFSET, tz::lookup_timezone
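To show what "raw integer offsets" means without datetime_ops, here is a plain-std sketch (not minarrow code; the function names are hypothetical). Per the Arrow format, Date32 stores days since 1970-01-01 and Date64 stores milliseconds since the epoch; the conversion uses Howard Hinnant's `civil_from_days` algorithm for the proleptic Gregorian calendar, whereas the datetime_ops feature relies on the time crate instead.

```rust
/// Converts a Date32 value (days since 1970-01-01) into (year, month, day)
/// using Howard Hinnant's civil_from_days algorithm.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097; // 400-year eras
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // March-based day of year, [0, 365]
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the following civil year.
    let year = yoe + era * 400 + (if month <= 2 { 1 } else { 0 });
    (year, month, day)
}

/// Converts a Date64 value (milliseconds since the epoch) to whole days,
/// flooring so that pre-1970 instants land on the correct day.
fn date64_to_days(ms: i64) -> i64 {
    ms.div_euclid(86_400_000)
}
```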

str_arithmetic = memchr, ryu

Adds string arithmetic kernels.
Includes (small) external dependencies and supports concatenating strings with floats in the arithmetic kernels, e.g., "Hello" + 1.0 = "Hello1".
Also overloads std::ops::Add, Mul, Sub, Div, Pow
with best-case String equivalents (e.g., '+' concatenates)
for type unification, rather than panicking.

Affects string::apply_str_float, string::apply_str_str, string::apply_dict32_str, string::apply_str_dict32, string::apply_dict32_num, string::format_finite
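A minimal sketch of the "'+' concatenates" behaviour, using a hypothetical `StrCol` newtype (not minarrow's string array). Note one assumption: this sketch formats floats with std's `Display`, which prints `1.0_f64` as `1`; the str_arithmetic feature itself pulls in ryu for float formatting.

```rust
use std::ops::Add;

/// Hypothetical string-column newtype (not minarrow's type).
#[derive(Debug, PartialEq)]
struct StrCol(Vec<String>);

/// '+' between a string column and a float column concatenates element-wise,
/// producing a best-case String result instead of panicking on the type mix.
impl<'a> Add<&'a [f64]> for StrCol {
    type Output = StrCol;
    fn add(self, rhs: &'a [f64]) -> StrCol {
        assert_eq!(self.0.len(), rhs.len(), "length mismatch");
        StrCol(
            self.0
                .into_iter()
                .zip(rhs.iter())
                // std Display formatting (assumption; not ryu as in the feature).
                .map(|(s, f)| format!("{}{}", s, f))
                .collect(),
        )
    }
}
```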

fast_hash

Replaces the hasher in all HashMaps and HashSets used for count-distinct operations and categorical dictionary interning with the faster ahash.

Enables ahash
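Categorical dictionary interning, one of the operations this feature speeds up, can be sketched generically over the hasher. The `cfg` alias below is an assumption about how such a swap might be wired, not minarrow's source; `ahash::RandomState` is only referenced when `fast_hash` is enabled, so the sketch compiles with std alone.

```rust
use std::collections::HashMap;
use std::hash::BuildHasher;

// Hypothetical feature-gated hasher choice (assumption, not minarrow's code).
#[cfg(feature = "fast_hash")]
type DefaultState = ahash::RandomState;
#[cfg(not(feature = "fast_hash"))]
type DefaultState = std::collections::hash_map::RandomState;

/// Interns strings into a categorical dictionary: returns the unique values
/// in first-seen order plus one u32 code per input value.
fn intern<S: BuildHasher + Default>(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut index: HashMap<String, u32, S> = HashMap::default();
    let mut dict = Vec::new();
    let mut codes = Vec::with_capacity(values.len());
    for &v in values {
        let code = *index.entry(v.to_string()).or_insert_with(|| {
            dict.push(v.to_string());
            (dict.len() - 1) as u32
        });
        codes.push(code);
    }
    (dict, codes)
}
```

Because only the `S` type parameter changes, swapping the hasher does not alter the dictionary or the codes, only the hashing speed.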

broadcast

Adds typed arithmetic broadcasting for add, sub, mult, div, and rem.

Affects arithmetic::types, minarrow::kernels.broadcast
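The core broadcasting rule can be sketched for a single operation on plain slices. `broadcast_add` is a hypothetical helper, not minarrow's kernel: equal lengths combine element-wise, a length-1 side is repeated to match the other, and any other mismatch is rejected.

```rust
/// Element-wise add with broadcasting: a length-1 operand is repeated to
/// match the other side. Returns None for incompatible lengths.
fn broadcast_add(lhs: &[f64], rhs: &[f64]) -> Option<Vec<f64>> {
    match (lhs.len(), rhs.len()) {
        (a, b) if a == b => Some(lhs.iter().zip(rhs).map(|(x, y)| x + y).collect()),
        (1, _) => Some(rhs.iter().map(|y| lhs[0] + y).collect()),
        (_, 1) => Some(lhs.iter().map(|x| x + rhs[0]).collect()),
        _ => None,
    }
}
```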

size

Adds a byte-size trait for best-effort memory size calculation.

Affects minarrow::traits.byte_size
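A minimal sketch of what "best-effort" size estimation can look like, using a hypothetical `ByteSize` trait (not minarrow's actual trait or method names). It counts each value's inline size plus its heap payload by length, ignoring allocator overhead and unused capacity.

```rust
use std::mem::size_of;

/// Hypothetical best-effort byte-size trait.
trait ByteSize {
    fn est_bytes(&self) -> usize;
}

impl ByteSize for Vec<u64> {
    fn est_bytes(&self) -> usize {
        // Inline Vec header + payload by length (spare capacity ignored).
        size_of::<Vec<u64>>() + self.len() * size_of::<u64>()
    }
}

impl ByteSize for String {
    fn est_bytes(&self) -> usize {
        size_of::<String>() + self.len()
    }
}

impl ByteSize for Vec<String> {
    fn est_bytes(&self) -> usize {
        // Each element contributes its own header plus its UTF-8 bytes.
        size_of::<Vec<String>>() + self.iter().map(|s| s.est_bytes()).sum::<usize>()
    }
}
```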

select

Adds pandas-style selection for Table and TableV with .c() (column) and .r() (row) methods.

Affects minarrow::traits.selection, array_view::ArrayV.active_data_selection, table_view::TableV.active_col_selection, table_view::TableV.active_row_selection
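The selection idea can be illustrated with a hypothetical toy table (`Table`, `TableSel`, `view`, and `materialise` are illustrative only; minarrow's real `.c()`/`.r()` signatures may differ). Here selections are kept as column indices and a row range, so no data is copied until the result is materialised.

```rust
/// Hypothetical toy table (not minarrow's Table/TableV).
struct Table {
    names: Vec<String>,
    columns: Vec<Vec<i64>>,
}

/// A lazy selection: column indices plus a row range over a borrowed table.
struct TableSel<'a> {
    table: &'a Table,
    cols: Vec<usize>,
    rows: std::ops::Range<usize>,
}

impl Table {
    /// Starts with every column and every row selected.
    fn view<'a>(&'a self) -> TableSel<'a> {
        let n = self.columns.first().map_or(0, |c| c.len());
        TableSel { table: self, cols: (0..self.columns.len()).collect(), rows: 0..n }
    }
}

impl<'a> TableSel<'a> {
    /// Selects columns by name, in the given order; unknown names are skipped.
    fn c(mut self, names: &[&str]) -> Self {
        let table = self.table;
        self.cols = names
            .iter()
            .filter_map(|n| table.names.iter().position(|x| x == n))
            .collect();
        self
    }

    /// Selects a row range, clamped to the table length.
    fn r(mut self, rows: std::ops::Range<usize>) -> Self {
        let n = self.table.columns.first().map_or(0, |c| c.len());
        let end = rows.end.min(n);
        self.rows = rows.start.min(end)..end;
        self
    }

    /// Copies out the selected cells, column by column.
    fn materialise(&self) -> Vec<Vec<i64>> {
        self.cols
            .iter()
            .map(|&c| self.table.columns[c][self.rows.clone()].to_vec())
            .collect()
    }
}
```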

default = chunked, datetime, large_string, scalar_type, simd, views

These default features are set whenever minarrow is added without default-features = false somewhere in the dependency tree.

Features from optional dependencies

In crates that don't use the dep: syntax, optional dependencies automatically become Cargo features. These features may have been created by mistake, and this functionality may be removed in the future.

None of the external dependencies below need to be enabled directly. See [features] for the relevant feature that enables each one.

arrow cast_arrow?

Enables arrow ^55.2.0

Arrow and Polars are used only for the optional to/from_apache_arrow() and to/from_polars() conversions, enabled via the cast_arrow and cast_polars features.

arrow-schema cast_arrow?

Enables arrow-schema ^55.2.0

polars cast_polars?

Enables polars ^0.50.0

polars-arrow cast_polars?

Enables polars-arrow ^0.50.0

rayon parallel_proc?
ryu str_arithmetic?
memchr str_arithmetic?
regex implicit feature

Affects string::regex_str_str, string::regex_dict_str, string::regex_str_dict, string::regex_dict_dict

cc build c_ffi_tests?