18 releases

0.0.1-alpha.20 Feb 24, 2024
0.0.1-alpha.18 Mar 12, 2023
0.0.1-alpha.13 Feb 26, 2023

#1024 in Text processing

Custom license

60KB
1K SLoC

Quickner Core

This directory holds the core of the Quickner project. The Rust code lives in the src directory, which contains the following:

  • config.rs - The configuration file parser and validator
  • models.rs - The data models used in the project
  • utils.rs - The utility functions used in the project

Building

To build the project, you need to have Rust installed. You can install it by following the official Rust installation instructions. Once Rust is installed, build the project by running the following command:

cargo build --release

License

This project is licensed under the Mozilla Public License 2.0. See the LICENSE file for details.


lib.rs:

quickner is a library for named-entity recognition (NER) annotation that provides a CLI and a Python API. It comes with a default configuration file that can be modified to fit your needs.

Batch Annotation

You can use quickner to annotate a batch of texts.

Provide a configuration file and a folder containing your data:

  • a CSV file containing the texts you want to annotate.
  • a CSV file containing the entities you want to annotate.
  • a CSV file containing the entities you want to exclude from the annotation.

Configuration

The configuration file is a TOML file that contains the following fields:

[logging]
level = "info" # level of logging (debug, info, warning, error, fatal)

[texts]

[texts.input]
filter = false     # if true, only texts in the filter list will be used
path = "texts.csv" # path to the texts file

[texts.filters]
accept_special_characters = ".,-" # list of special characters to accept in the text (if special_characters is true)
alphanumeric = false              # if true, only strictly alphanumeric texts will be used
case_sensitive = false            # if true, case sensitive search will be used
max_length = 1024                 # maximum length of the text
min_length = 0                    # minimum length of the text
numbers = false                   # if true, texts with numbers will not be used
punctuation = false               # if true, texts with punctuation will not be used
special_characters = false        # if true, texts with special characters will not be used

[annotations]
format = "spacy" # format of the output file (jsonl, spacy, brat, conll)

[annotations.output]
path = "annotations.jsonl" # path to the output file

[entities]

[entities.input]
filter = true         # if true, only entities in the filter list will be used
path = "entities.csv" # path to the entities file
save = true           # if true, the entities found will be saved in the output file

[entities.filters]
accept_special_characters = ".-" # list of special characters to accept in the entity (if special_characters is true)
alphanumeric = false             # if true, only strictly alphanumeric entities will be used
case_sensitive = false           # if true, case sensitive search will be used
max_length = 20                  # maximum length of the entity
min_length = 0                   # minimum length of the entity
numbers = false                  # if true, entities with numbers will not be used
punctuation = false              # if true, entities with punctuation will not be used
special_characters = true        # if true, entities with special characters will not be used

[entities.excludes]
# path = "excludes.csv" # path to entities to exclude from the search
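
To make the filter semantics above more concrete, here is a minimal sketch of how a few of the keys (min_length, max_length, numbers, punctuation) could be applied to a candidate text or entity. It is based only on the comments in the configuration file. The `Filters` struct and its `accepts` method are hypothetical names used for illustration and are not part of quickner's API. Measuring length in characters rather than bytes is also an assumption.

```rust
/// Hypothetical mirror of a few `[texts.filters]` / `[entities.filters]` keys.
/// Illustration only; this is not quickner's own type.
#[derive(Debug, Clone)]
struct Filters {
    min_length: usize,
    max_length: usize,
    numbers: bool,     // true: reject values that contain digits
    punctuation: bool, // true: reject values that contain ASCII punctuation
}

impl Filters {
    /// Returns true when `value` passes every enabled check.
    fn accepts(&self, value: &str) -> bool {
        // Assumption: length is counted in characters, not bytes.
        let len = value.chars().count();
        if len < self.min_length || len > self.max_length {
            return false;
        }
        if self.numbers && value.chars().any(|c| c.is_ascii_digit()) {
            return false;
        }
        if self.punctuation && value.chars().any(|c| c.is_ascii_punctuation()) {
            return false;
        }
        true
    }
}

fn main() {
    // Limits taken from the [entities.filters] example, with `numbers` enabled.
    let filters = Filters {
        min_length: 0,
        max_length: 20,
        numbers: true,
        punctuation: false,
    };
    for candidate in ["Rust", "Python3", "a very long entity name that exceeds the limit"] {
        println!("{candidate:?} accepted: {}", filters.accepts(candidate));
    }
}
```

With these settings, a value is kept only if its length falls within the limits and it contains no digits. The remaining keys (alphanumeric, case_sensitive, special_characters, accept_special_characters) would follow the same pattern.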

Example

use quickner::models::Quickner;

let quick = Quickner::new("./config.toml");
let annotations = quick.process(true);

Single Annotation

You can also use quickner to annotate a single text, which is useful when you want to use the annotation directly in your code.

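use std::collections::HashMap;

use quickner::Document;

// `mut` assumes that `annotate` updates the document in place.
let mut annotation = Document::from_string("Rust is maintained by Mozilla");
// Map each entity name to its label; the map must be mutable to insert into it.
let mut entities = HashMap::new();
entities.insert("Rust", "Programming Language");
entities.insert("Mozilla", "Organization");
annotation.annotate(entities);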
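
For intuition about what dictionary-based annotation produces, here is a minimal standard-library sketch that looks up each entity name in a text and returns labelled spans. It does not reproduce how `Document::annotate` is implemented. The function `find_entities` and the `(start, end, label)` tuple layout are hypothetical, and the search here is a plain case-sensitive substring match that reports byte offsets.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns `(start, end, label)` byte spans for every
/// occurrence of each entity name in `text`, sorted by start offset.
fn find_entities(text: &str, entities: &HashMap<&str, &str>) -> Vec<(usize, usize, String)> {
    let mut spans = Vec::new();
    for (name, label) in entities {
        // An empty pattern would match at every position, so skip it.
        if name.is_empty() {
            continue;
        }
        // `match_indices` yields non-overlapping, case-sensitive matches.
        for (start, matched) in text.match_indices(*name) {
            spans.push((start, start + matched.len(), (*label).to_string()));
        }
    }
    // HashMap iteration order is unspecified, so sort for a stable result.
    spans.sort();
    spans
}

fn main() {
    let mut entities = HashMap::new();
    entities.insert("Rust", "Programming Language");
    entities.insert("Mozilla", "Organization");

    for (start, end, label) in find_entities("Rust is maintained by Mozilla", &entities) {
        println!("{start}..{end} {label}");
    }
}
```

Sorting the spans makes the output independent of the map's iteration order, which is convenient when comparing annotations across runs.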

Dependencies

~8–20MB
~236K SLoC