
lance-hdfs-provider

HDFS store provider for Lance built on top of the OpenDAL hdfs service. It lets Lance and LanceDB read and write datasets directly to Hadoop HDFS.

Installation

Add the crate to your Cargo.toml:

[dependencies]
lance-hdfs-provider = "0.2.0"
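OpenDAL's hdfs service binds to libhdfs over JNI, so the Hadoop client libraries and a JVM must be available at runtime. A typical environment sketch — the paths below are assumptions, adjust them to your JDK and Hadoop installation:

```shell
# Illustrative paths — point these at your actual JDK and Hadoop installs.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
export HADOOP_HOME=/opt/hadoop
# libhdfs loads Hadoop's Java classes from CLASSPATH at runtime.
export CLASSPATH=$("$HADOOP_HOME/bin/hadoop" classpath --glob)
# libjvm must be findable by the dynamic linker.
export LD_LIBRARY_PATH="$JAVA_HOME/lib/server:$LD_LIBRARY_PATH"
```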

Quickstart: Lance dataset

Register the provider, then read or write using HDFS URIs:

use std::sync::Arc;
use lance::{io::ObjectStoreRegistry, session::Session,
    dataset::{DEFAULT_INDEX_CACHE_SIZE, DEFAULT_METADATA_CACHE_SIZE}
};
use lance::dataset::builder::DatasetBuilder;
use lance_hdfs_provider::HdfsStoreProvider;

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut registry = ObjectStoreRegistry::default();
    registry.insert("hdfs", Arc::new(HdfsStoreProvider));

    let session = Arc::new(Session::new(
        DEFAULT_INDEX_CACHE_SIZE,
        DEFAULT_METADATA_CACHE_SIZE,
        Arc::new(registry),
    ));

    let uri = "hdfs://127.0.0.1:9000/sample-dataset";

    // Load an existing dataset
    let _dataset = DatasetBuilder::from_uri(uri)
        .with_session(session.clone())
        .load()
        .await?;

    // Or write a new dataset (see examples)
    Ok(())
}
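Writing a new dataset goes through lance's `Dataset::write`, which takes an Arrow `RecordBatchReader`. A minimal sketch, with caveats: `write_sample` is a hypothetical helper, and the `session` field on `WriteParams` is an assumption inferred from the session-aware API above — check how your lance version attaches the session (and thus the HDFS provider registry) to writes:

```rust
use std::sync::Arc;
use arrow_array::{Int32Array, RecordBatch, RecordBatchIterator};
use arrow_schema::{DataType, Field, Schema};
use lance::dataset::{Dataset, WriteParams};
use lance::session::Session;

// Hypothetical helper: writes one small batch to an HDFS-backed dataset.
async fn write_sample(session: Arc<Session>) -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int32Array::from(vec![1, 2, 3]))],
    )?;
    let reader = RecordBatchIterator::new(vec![Ok(batch)], schema);

    // Assumption: WriteParams carries the session in session-aware
    // lance versions; field name may differ in yours.
    let params = WriteParams {
        session: Some(session),
        ..Default::default()
    };
    Dataset::write(reader, "hdfs://127.0.0.1:9000/sample-dataset", Some(params)).await?;
    Ok(())
}
```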

Quickstart: LanceDB

Use the same registry when creating the LanceDB session:

use std::sync::Arc;
use lance::{io::ObjectStoreRegistry, session::Session,
    dataset::{DEFAULT_INDEX_CACHE_SIZE, DEFAULT_METADATA_CACHE_SIZE}
};
use lance_hdfs_provider::HdfsStoreProvider;

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut registry = ObjectStoreRegistry::default();
    registry.insert("hdfs", Arc::new(HdfsStoreProvider));

    let session = Arc::new(Session::new(
        DEFAULT_INDEX_CACHE_SIZE,
        DEFAULT_METADATA_CACHE_SIZE,
        Arc::new(registry),
    ));

    let db = lancedb::connect("hdfs://127.0.0.1:9000/test-db")
        .session(session.clone())
        .execute()
        .await?;

    let _table = db.open_table("table1").execute().await?;
    Ok(())
}

Notes

  • Ensure your HDFS URI includes the NameNode address: either a host and port (e.g. hdfs://127.0.0.1:9000/path) or an HA nameservice name.
  • Authentication and other options can be passed via Lance storage options; any key supported by OpenDAL's hdfs service can be provided.
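For illustration, a storage-options map might look like this. The key names (`name_node`, `user`, `kerberos_ticket_cache_path`) come from OpenDAL's hdfs service configuration, but the values here are placeholders, and how the map is handed to Lance depends on your lance version — verify both against your build:

```rust
use std::collections::HashMap;

// Example storage options forwarded to OpenDAL's hdfs service.
// Values are placeholders; verify key support in your OpenDAL version.
fn hdfs_storage_options() -> HashMap<String, String> {
    HashMap::from([
        ("name_node".to_string(), "hdfs://127.0.0.1:9000".to_string()),
        ("user".to_string(), "hadoop".to_string()),
        // For Kerberos-secured clusters:
        ("kerberos_ticket_cache_path".to_string(), "/tmp/krb5cc_1000".to_string()),
    ])
}
```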

License

Licensed under the MIT license.
