#object-store #amazon-s3 #google-cloud #local #object-storage #azure #cloud-storage

yanked object_store-fork

A generic object store interface for uniformly interacting with AWS S3, Google Cloud Storage, Azure Blob Storage and local files

0.5.0 Sep 8, 2022


MIT/Apache

365KB
7.5K SLoC

Rust Object Store



object_store

This crate provides a uniform API for interacting with object storage services and local files via the ObjectStore trait.

Create an ObjectStore implementation:
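The point of a uniform trait is that calling code never names a concrete backend. The sketch below illustrates that idea with a hypothetical, synchronous `MiniStore` trait and an in-memory `MemoryStore`. Both names are invented for illustration and are not this crate's API: the real `ObjectStore` trait is async and addresses objects by `Path`.

```rust
use std::collections::BTreeMap;

/// Hypothetical, synchronous stand-in for a uniform storage trait.
/// Not the object_store API (the real `ObjectStore` trait is async).
trait MiniStore {
    fn put(&mut self, location: &str, bytes: Vec<u8>);
    fn get(&self, location: &str) -> Option<&[u8]>;
    fn list(&self) -> Vec<String>;
}

/// Hypothetical in-memory backend; BTreeMap keeps keys sorted.
#[derive(Default)]
struct MemoryStore {
    objects: BTreeMap<String, Vec<u8>>,
}

impl MiniStore for MemoryStore {
    fn put(&mut self, location: &str, bytes: Vec<u8>) {
        self.objects.insert(location.to_string(), bytes);
    }
    fn get(&self, location: &str) -> Option<&[u8]> {
        self.objects.get(location).map(|v| v.as_slice())
    }
    fn list(&self) -> Vec<String> {
        self.objects.keys().cloned().collect()
    }
}

/// Depends only on the trait, so any backend can be passed in.
fn total_bytes(store: &dyn MiniStore) -> usize {
    store
        .list()
        .iter()
        .filter_map(|k| store.get(k))
        .map(|b| b.len())
        .sum()
}

fn main() {
    let mut store = MemoryStore::default();
    store.put("data/file01.parquet", vec![0, 1, 2]);
    store.put("data/file02.parquet", vec![3, 4]);
    // prints ["data/file01.parquet", "data/file02.parquet"] -> 5 bytes
    println!("{:?} -> {} bytes", store.list(), total_bytes(&store));
}
```

Swapping `MemoryStore` for another type that implements the trait requires no change to `total_bytes`.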

Adapters

ObjectStore instances can be composed with various adapters that add functionality:
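Adapters of this kind generally follow the decorator pattern: a wrapper implements the same trait as the store it wraps, forwards each call, and adds behavior along the way. A minimal sketch of that general idea, assuming a hypothetical read-only `Get` trait and a hypothetical `CountingStore` adapter (neither is part of this crate):

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical read-only storage trait (not the object_store API).
trait Get {
    fn get(&self, location: &str) -> Option<Vec<u8>>;
}

/// Hypothetical base backend holding objects in a HashMap.
struct MapStore(HashMap<String, Vec<u8>>);

impl Get for MapStore {
    fn get(&self, location: &str) -> Option<Vec<u8>> {
        self.0.get(location).cloned()
    }
}

/// Hypothetical adapter: forwards every call to `inner`
/// and counts how many requests passed through it.
struct CountingStore<S: Get> {
    inner: S,
    requests: Cell<usize>,
}

impl<S: Get> CountingStore<S> {
    fn new(inner: S) -> Self {
        Self { inner, requests: Cell::new(0) }
    }
    fn requests(&self) -> usize {
        self.requests.get()
    }
}

// The adapter implements the same trait as the store it wraps.
impl<S: Get> Get for CountingStore<S> {
    fn get(&self, location: &str) -> Option<Vec<u8>> {
        self.requests.set(self.requests.get() + 1);
        self.inner.get(location)
    }
}

fn main() {
    let mut objects = HashMap::new();
    objects.insert("data/file01.parquet".to_string(), vec![0u8; 4]);
    let store = CountingStore::new(MapStore(objects));
    store.get("data/file01.parquet");
    store.get("data/missing.parquet");
    println!("requests: {}", store.requests()); // prints "requests: 2"
}
```

Because `CountingStore` itself implements `Get`, adapters can be stacked, for example `CountingStore::new(CountingStore::new(base))`.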

Listing objects

Use the ObjectStore::list method to iterate over objects in remote storage or files in the local filesystem:


use std::sync::Arc;
use object_store::{path::Path, ObjectStore};
use futures::stream::StreamExt;

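// create an ObjectStore (`get_object_store()` is a placeholder for whichever
// implementation you construct; this snippet runs inside an async context)
let object_store: Arc<dyn ObjectStore> = Arc::new(get_object_store());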

// Recursively list all files below the 'data' path.
// 1. On AWS S3 this would be the 'data/' prefix
// 2. On a local filesystem, this would be the 'data' directory
let prefix: Path = "data".try_into().unwrap();

// Get an `async` stream of `ObjectMeta` results:
let list_stream = object_store
    .list(Some(&prefix))
    .await
    .expect("Error listing files");

// Print a line about each object based on its metadata
// using for_each from the `StreamExt` trait.
list_stream
    .for_each(|meta| async move {
        let meta = meta.expect("Error listing");
        println!("Name: {}, size: {}", meta.location, meta.size);
    })
    .await;

This will print something like the following:

Name: data/file01.parquet, size: 112832
Name: data/file02.parquet, size: 143119
Name: data/child/file03.parquet, size: 100
...
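The comments in the example above describe prefix matching on whole path segments: the prefix `data` corresponds to `data/` on S3 and to the `data` directory locally, so a key such as `database/notes.txt` is excluded while nested entries such as `data/child/file03.parquet` are included. The sketch below models that rule over a flat list of keys with '/' as the delimiter. `is_below` and `list_below` are hypothetical helpers, not functions from this crate, and the crate's own handling of edge cases may differ.

```rust
/// Hypothetical helper: true if `location` lies strictly below `prefix`,
/// comparing whole '/'-delimited segments (so "database" is not under "data").
fn is_below(location: &str, prefix: &str) -> bool {
    let mut loc = location.split('/');
    // Every prefix segment must equal the corresponding location segment...
    let prefix_matches = prefix.split('/').all(|p| loc.next() == Some(p));
    // ...and the location must have at least one more segment.
    prefix_matches && loc.next().is_some()
}

/// Recursive listing over a flat key space: `None` lists everything,
/// `Some(prefix)` keeps every key at any depth below the prefix.
fn list_below<'a>(keys: &[&'a str], prefix: Option<&str>) -> Vec<&'a str> {
    keys.iter()
        .copied()
        .filter(|k| prefix.map_or(true, |p| is_below(k, p)))
        .collect()
}

fn main() {
    let keys = [
        "data/file01.parquet",
        "data/child/file03.parquet",
        "database/notes.txt",
    ];
    // prints ["data/file01.parquet", "data/child/file03.parquet"]
    println!("{:?}", list_below(&keys, Some("data")));
}
```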

Fetching objects

Use the ObjectStore::get method to fetch the bytes of an object in remote storage, or of a file in the local filesystem, as a stream.


use std::sync::Arc;
use object_store::{path::Path, ObjectStore};
use futures::stream::StreamExt;

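// create an ObjectStore (`get_object_store()` is a placeholder for whichever
// implementation you construct; this snippet runs inside an async context)
let object_store: Arc<dyn ObjectStore> = Arc::new(get_object_store());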

// Retrieve a specific file
let path: Path = "data/file01.parquet".try_into().unwrap();

// fetch the bytes from object store
let stream = object_store
    .get(&path)
    .await
    .unwrap()
    .into_stream();

// Count the zero bytes (0x00) in each chunk using `map` from the `StreamExt` trait
let num_zeros = stream
    .map(|bytes| {
        let bytes = bytes.unwrap();
        bytes.iter().filter(|b| **b == 0).count()
    })
    })
    .collect::<Vec<usize>>()
    .await
    .into_iter()
    .sum::<usize>();

println!("Num zeros in {} is {}", path, num_zeros);

This will print something like the following:

Num zeros in data/file01.parquet is 657
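Each chunk of the stream is counted independently and the per-chunk counts are then summed, so chunk boundaries do not affect the result. The same logic can be sketched without any async machinery; the chunk data below is made up and stands in for the stream returned by `get`, and `count_zeros` is a hypothetical helper name.

```rust
/// Count zero bytes across a sequence of chunks: count per chunk, then sum.
/// Chunk boundaries do not change the total.
fn count_zeros<I>(chunks: I) -> usize
where
    I: IntoIterator,
    I::Item: AsRef<[u8]>,
{
    chunks
        .into_iter()
        .map(|chunk| chunk.as_ref().iter().filter(|b| **b == 0).count())
        .sum()
}

fn main() {
    // Made-up chunks standing in for the byte stream returned by `get`.
    let chunks: Vec<Vec<u8>> = vec![vec![0, 1, 0], vec![2, 0], vec![]];
    println!("{}", count_zeros(&chunks)); // prints 3
}
```

Summing directly, rather than collecting into a `Vec<usize>` first, avoids an intermediate allocation; on an async stream, `StreamExt::fold` from the `futures` crate plays the same role.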

Dependencies

~7–23MB
~359K SLoC