#information-retrieval #protocols #computation #database #query #pir

simplepir

A fast and efficient implementation of SimplePIR in Rust

2 stable releases

1.0.1 Aug 19, 2024

#542 in Cryptography

MIT license

720KB
42K SLoC

simplepir-rs

A fast and efficient implementation of SimplePIR in Rust.

Background

Private information retrieval (PIR) is a protocol for accessing information from a database hosted on a server without the server knowing anything about the information that was accessed, including the index of the information within the database. SimplePIR is a fast and efficient implementation of PIR that provides square-root communication costs and linear computation costs. To learn more about SimplePIR, check out this paper by Alexandra Henzinger, Matthew M. Hong, Henry Corrigan-Gibbs, Sarah Meiklejohn, and Vinod Vaikuntanathan.

Getting Started

We'll start by specifying some basic parameters for the SimplePIR scheme. For good security and performance, a secret-key dimension (the length of the encryption key) of 2,048 is recommended. We'll also specify the plaintext modulus, which tells us the range of numbers that can be accurately accessed and decrypted. In this example, we'll use 2^17.

use simplepir::{Matrix, Database};
let secret_dimension = 2048;
let mod_power = 17;
let plaintext_mod = 2_u64.pow(mod_power);

We'll then create a simple database to store our data. Databases can be created at random or from an existing Matrix. This crate provides simple Matrix and Vector types for convenience.

let matrix = Matrix::from_data(
    vec![
        vec![1, 2, 3, 4],
        vec![5, 6, 7, 8],
        vec![9, 10, 11, 12],
        vec![13, 14, 15, 16],
    ]
);
let db = Database::from_matrix(matrix, mod_power);

To increase performance while also decreasing memory consumption, the database can be compressed by packing three data records (numbers) into a single record.

let compressed_db = db.compress();

Now for the fun parts!

There are four main functions of the SimplePIR protocol:

The first function runs during the "offline" phase.

setup()

Takes the database as input and outputs a hint for the client and for the server. This is called by the server separately and prior to the other functions. It's very computationally heavy, but massively speeds up the "online" portion of the protocol.

let (server_hint, client_hint) = setup(&compressed_db, secret_dimension);

The next three functions run during the "online" phase.

query()

Takes an index into the database and outputs an encrypted query. This is called by the client.

let index = 0;
let (client_state, query_cipher) = query(index, 4, secret_dimension, server_hint, plaintext_mod);

The client_state variable just stores some basic information about the client's query.

answer()

Takes the matrix-vector product between the encrypted query and the entire database and outputs an encrypted answer vector. This is called by server and is the most computationally intense part of the online phase.

let answer_cipher = answer(&compressed_db, &query_cipher);

recover()

Takes the encrypted answer vector, decrypts it, and returns the desired record.

let record = recover(&client_state, &client_hint, &answer_cipher, &query_cipher, plaintext_mod);

Now if we did everything right, this assert shouldn't fail!

assert_eq!(database.get(index).unwrap(), record);

But is it fast?

Yup.

SimplePIR is a very efficient PIR protocol. The answer() function, the most performance critical part of the online phase has a linear time-complexity and runs at memory bandwidth speeds on measured hardware. The query() and recover() functions also run very fast.

Benchmarks

The following benchmarks were recorded on a Lenovo Thinkpad X1 Carbon 6th Gen with an Intel Core i7-8650U @ 4.2 GHz with 16GB of RAM running Manjaro Linux. Obviously, these will vary considerably depending on the hardware you use. The database size used was 3600×3600 (about 104 MB) and the secret-key dimension was 2,048.

Function Time Throughput
setup() 22.8 s 7 MB/s
query() 56.0 ms N/A
answer() 4.8 ms 21.6 GB/s
recover() 1.4 μs 2.5 GB/s

Recorded memory bandwidth was around 10-12 GB/s. The answer() function was actually able to exceed memory bandwidth by around 2x thanks to the efficient packing implementation discussed earlier.

Roadmap

As lovely as this library is, there's definitely room for improvement. I'm not sure if I'll have time to add in new features. If you feel inclined to implement a new feature, feel free to make a pull request!

  • Support for u16, u32, and u128
  • Implementing the packing optimization in setup()
  • GPU support

Dependencies

~2–2.8MB
~51K SLoC