
bin+lib ciff

The inverted index exchange format as defined as part of the Open-Source IR Replicability Challenge (OSIRRC) initiative

6 releases

0.3.1 Aug 9, 2022
0.3.0 Mar 15, 2022
0.2.1 Mar 7, 2022
0.1.1 Apr 21, 2020

Apache-2.0

Common Index File Format (CIFF)

What is CIFF?

The Common Index File Format (CIFF) is an inverted index exchange format defined as part of the Open-Source IR Replicability Challenge (OSIRRC) initiative. The primary idea is to allow indexes to be dumped from Lucene via Anserini so that they can be ingested by other search engines. This repository contains the code needed to read CIFF data into a format that PISA can use for building (and then searching) indexes.

Versions

We currently provide a Rust binary for converting CIFF data to a PISA canonical index, and for converting a PISA canonical index back to CIFF. This means PISA can generate indexes that can then be consumed by other systems that support CIFF (and vice versa).

Install from AUR

The package is available in the Arch User Repository (AUR). If you are on an Arch-based system, you can install it by running the following:

# Replace yay with the helper of your choice.
yay -S ciff-pisa

Install from crates.io

Note that the installation methods described below are not system-wide. For example, on Linux the tools usually end up in the $HOME/.cargo/bin directory. To use the tools from the command line, either call them by their absolute path or add $HOME/.cargo/bin to your PATH variable.
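As a minimal sketch of the PATH update described above, the following POSIX shell snippet adds the Cargo binary directory to PATH if it is missing and then reports whether the two converters can be found. It assumes the default install location ($HOME/.cargo/bin); the variable name CARGO_BIN is only illustrative, and the path will differ if CARGO_HOME or CARGO_INSTALL_ROOT is set on your system.

```shell
# Assumption: Cargo installs binaries into its default directory.
CARGO_BIN="$HOME/.cargo/bin"

# Prepend the directory only if it is not already on PATH.
case ":$PATH:" in
  *":$CARGO_BIN:"*) ;;
  *) export PATH="$CARGO_BIN:$PATH" ;;
esac

# Report whether the two converters can be found.
for tool in ciff2pisa pisa2ciff; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: $(command -v "$tool")"
  else
    echo "$tool: not found on PATH"
  fi
done
```

To make the change permanent, the same case statement can be placed in your shell's startup file.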

The library and the tools are also available on crates.io, so you can install the binaries into your local Cargo directory by running:

cargo install ciff

Install from source

Build locally

Just run cargo build --release to build the binaries.

To convert a CIFF blob to a PISA canonical: ./target/release/ciff2pisa

To convert a PISA canonical to a CIFF blob: ./target/release/pisa2ciff
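The two commands above are shown without arguments. One way to discover the options each binary accepts is to ask it for its help text. The sketch below assumes the binaries respond to the conventional --help flag, which is not confirmed here, and it checks that each binary has been built before calling it.

```shell
# Assumption: both tools print their usage when called with --help.
built=0
for tool in ciff2pisa pisa2ciff; do
  bin="./target/release/$tool"
  if [ -x "$bin" ]; then
    "$bin" --help
    built=$((built + 1))
  else
    echo "$bin not found; run 'cargo build --release' first"
  fi
done
echo "$built of 2 converters available"
```

Run it from the root of the source checkout, since the paths are relative to the target directory created by the build.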

Install

You can also install the binaries to your local cargo repository:

cargo install --path .

or if you are installing the same version again:

cargo install --path . --force

Use as Cargo dependency

If you are interested in using the library components in your own Rust library, you can simply define it as a dependency in your Cargo.toml file:

[dependencies]
ciff = "0.3"

Library API documentation

The API documentation is available on docs.rs.
