Lib.rs

›

#snowflake #data-science #connector #sql-query #cases #snowflake-client

light-snowflake-connector

Lightweight wrapper around Snowflake's REST API

2 releases

0.1.1	Apr 2, 2024
0.1.0	Jan 1, 2024

#686 in Database interfaces

59 downloads per month

MIT/Apache

50KB
819 lines

Light Snowflake Connector

Minimal wrapper around Snowflake's public REST API.

Easy to use
Intended for small, tactical queries and point lookups,
- Particularly for dashboards and other interactive use cases
- Keep in mind that Snowflake latency is usually >100ms so design carefully
Designed not to break for large result sets, but not optimized for them either

Usage

Add following line to Cargo.toml:

# Cargo.toml
light-snowflake-connector = "0.1.0"

use light_snowflake_connector::{Cell, SnowflakeClient, SnowflakeError};
use light_snowflake_connector::jwt_simple::algorithms::RS256KeyPair;

#[tokio::main]
async fn main() -> Result<(), SnowflakeError> {
    let key_pair = RS256KeyPair::generate(2048)?;
    let config = SnowflakeClient {
        key_pair,
        account: "ACCOUNT".into(),
        user: "USER".into(),
        database: "DB".into(),
        warehouse: "WH".into(),
        role: Some("ROLE".into()),
    };

    let result = config
        .prepare("SELECT * FROM TEST_TABLE WHERE id = ? AND name = ?")
        .add_binding(10)
        .add_binding("Henry")
        .query()
        .await?;

    // Get the first partition of the result, and assert that there is only one partition
    let partition = result.only_partition()?;
    
    // Get the results as a Vec<Vec<Cell>>, which is a tagged enum similar to serde_json::Value
    let cells = partition.cells();
    match &cells[0][0] {
        Cell::Int(x) => println!("Got an integer: {}", x),
        Cell::Varchar(x) => println!("Got a string: {}", x),
        _ => panic!("Got something else"),
    }

    // Get the results as a Vec<Vec<serde_json::Value>>, which is a list of lists of JSON values
    let json_table = partition.json_table();

    // Get the results as a Vec<serde_json::Value>, which is a list of JSON objects
    let json_objects = partition.json_objects();

    Ok(())
}

Features & Limitations

Authentication:

Key Pair Authentication
OAuth, SSO: This is possible but not implemented yet
Username/Password: Not available in Snowflake's REST API 2.0

Querying:

Prepared Statements with qmark "?" Bindings
- No other bindings are supported
Rust async support (but synchronous from Snowflake's point of view)
Snowflake "async" support (for super long running queries)
GET and PUT: not supported by Snowflake's REST API 2.0
Arrow support: we're trying to keep the dependency tree small
Streaming support, and multiple batches

Types:

String, str
i128
f64
bool
Date, Time, Timestamp_Ntz (NaiveDateTime), Timestamp_Ltz (DateTime; not well testes, not sure about the use cases)
Timestamp_Tz (DateTime)
Decimal (dec and rust_decimal have different semantics and precision)

Implicit Type Conversions

Snowflake's NUMBER type is 128 bit (38 decimal digits) but supports a scale as well. There's no native Rust type that can achieve both of these so we opted for the more convenient (and probably common) use cases:

Integers are converted to i128, which is lossless
Floats are converted to f64, which is lossy
Which to do is determined on a cell by cell basis, so you can have a column with mixed types

This particular workaround could be improved by using a Decimal type, but there are some issues with the available libraries:

rust_decimal is a pure Rust implementation, but it doesn't support 128 bit numbers
dec supports 128 bit numbers, but somehow 4 digits less decimal precision. Also, it's a wrapper around a C library, so it could cause issues downstream for WASM users (e.g. FaaS)
arrow (and FWIW, arrow2) supports 128 bit numbers, but it's a huge dependency and we'd have to pivot to columnar data structures and a different API.

Multiple Batches

This library supports multiple batches, which is useful for streaming large result sets. But the results are transferred as JSON, so if high throughput is a concern, you should consider one of the Arrow based libraries instead, like snowflake-api.

Relationship to other Snowflake Connectors

This is a fork of the snowflake-connector library, and differs in a few ways:

It returns Cells rather than deserializing the data into custom structs
It does not support Decimal types (because it is hard to do correctly)

It differs from snowflake-api in that:

It uses only the documented v2 API, which (presently) does not support GET or PUT
It doesn't use any deprecated or undocumented APIs at all
It doesn't require Arrow (but it also doesn't get the performance benefits of Arrow)

It differs from most other languages' Snowflake connectors in that:

It doesn't use Arrow
It doesn't use the deprecated v1 API
It doesn't use undocumented APIs
It doesn't support GET or PUT
It doesn't support Async Queries
It's not affiliated with or supported by Snowflake.
It's not an official product of any company, doesn't have any guarantees, warranties, or support.

Dependencies

~12–25MB
~362K SLoC