#metrics #ner #ml #nlp #seq-eval

rusev

Fast implementation of SeqEval, a sequence evaluation framework

1 unstable release

0.1.0 Dec 27, 2024

#440 in Math

129 downloads per month

Custom license

160KB
4K SLoC

This library is a re-implementation of the SeqEval library. It is built with a focus on performance and soundness.

SCHEMES

The following schemes are currently supported:

  • IOB1: I marks a token inside a chunk, O marks a token outside any chunk, and B marks the first token of a chunk that immediately follows another chunk of the same named entity.
  • IOB2: Same as IOB1, except that a B tag is given to the first token of every chunk.
  • IOE1: An E tag marks the last token of a chunk that immediately precedes another chunk of the same named entity.
  • IOE2: Same as IOE1, except that an E tag is given to the last token of every chunk.
  • BILOU/IOBES: 'L' (BILOU) or 'E' (IOBES) marks the last token of a chunk, and 'U' (BILOU, unit) or 'S' (IOBES, single) marks a chunk made of a single token.

The BILOU and IOBES schemes are only supported in strict mode.
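
To make the tagging rules concrete, here is a minimal sketch of chunk extraction for IOB2. It is not rusev's API: `Chunk`, `split_token` and `iob2_chunks` are hypothetical names, and the `PREFIX-CLASS` token layout (for example `B-PER`) is assumed from the usual SeqEval convention. The sketch is lenient, so an `I` tag with no matching open chunk is simply dropped, whereas a strict parser would reject it.

```rust
/// A chunk: the class name plus inclusive start/end token indices.
/// Hypothetical type for illustration, not part of rusev.
#[derive(Debug, PartialEq)]
pub struct Chunk {
    pub class: String,
    pub start: usize,
    pub end: usize,
}

/// Splits a token such as "B-PER" into its prefix ('B') and class ("PER").
/// A bare "O" has no class and yields ('O', "").
/// Assumption: prefix and class are separated by the first '-'.
fn split_token(token: &str) -> (char, &str) {
    match token.split_once('-') {
        Some((prefix, class)) => (prefix.chars().next().unwrap_or('O'), class),
        None => (token.chars().next().unwrap_or('O'), ""),
    }
}

/// Extracts chunks from an IOB2-tagged sequence (lenient sketch).
pub fn iob2_chunks(tokens: &[&str]) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut current: Option<Chunk> = None;

    for (i, token) in tokens.iter().enumerate() {
        let (prefix, class) = split_token(token);
        match prefix {
            // In IOB2, every chunk starts with B, so B always opens a new chunk.
            'B' => {
                if let Some(done) = current.take() {
                    chunks.push(done);
                }
                current = Some(Chunk { class: class.to_string(), start: i, end: i });
            }
            // I extends the open chunk only if the class matches.
            'I' => {
                let continues = matches!(&current, Some(c) if c.class == class);
                if continues {
                    if let Some(c) = current.as_mut() {
                        c.end = i;
                    }
                } else if let Some(done) = current.take() {
                    // Mismatched I: close the open chunk and drop this token.
                    chunks.push(done);
                }
            }
            // O (or anything else) closes the open chunk.
            _ => {
                if let Some(done) = current.take() {
                    chunks.push(done);
                }
            }
        }
    }
    // Flush a chunk that runs to the end of the sequence.
    if let Some(done) = current.take() {
        chunks.push(done);
    }
    chunks
}
```

For the sequence `["B-PER", "I-PER", "O", "B-LOC", "B-LOC"]`, this yields one PER chunk covering tokens 0 to 1 and two adjacent single-token LOC chunks. Under IOB1 the first `B-LOC` would be written `I-LOC`, since B is reserved there for a chunk that directly follows another chunk of the same class.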

More information about schemes

Terminology

This library partially reuses the terminology of the SeqEval library. The concepts may not map one to one.

  • A class is an entity we are interested in, such as 'LOC' for location, 'PER' for person, 'GEO' for geography, etc. It can be anything.
  • A token is a string containing a class, such as GEO, LOC or PER, and a prefix. The prefix indicates where we are in the current chunk. For a given scheme, the possible prefixes are the letters of the scheme, such as I-O-B or I-O-E. A prefix can only be a single ASCII character.
  • A chunk is a list of at least one token associated with a named entity.
  • A Scheme gives us enough information to parse a list of tokens into a chunk.
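
The relationship between tokens, prefixes and schemes can be sketched as follows. This is an illustration under stated assumptions, not rusev's actual types: `Scheme`, `Token` and `parse_token` are hypothetical, and the `PREFIX-CLASS` layout with a `-` separator is assumed from the SeqEval convention. The sketch enforces the two rules stated above: a prefix is a single ASCII character, and it must be one of the letters of the scheme.

```rust
/// Families of tagging schemes, identified by their prefix letters.
/// Hypothetical type for illustration, not part of rusev.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Scheme {
    Iob,
    Ioe,
    Iobes,
    Bilou,
}

impl Scheme {
    /// The prefixes a token may carry under this scheme.
    pub fn prefixes(self) -> &'static [char] {
        match self {
            Scheme::Iob => &['I', 'O', 'B'],
            Scheme::Ioe => &['I', 'O', 'E'],
            Scheme::Iobes => &['I', 'O', 'B', 'E', 'S'],
            Scheme::Bilou => &['B', 'I', 'L', 'O', 'U'],
        }
    }
}

/// A parsed token: a one-character prefix and, except for 'O', a class.
#[derive(Debug, PartialEq)]
pub struct Token<'a> {
    pub prefix: char,
    pub class: Option<&'a str>,
}

/// Parses a raw token such as "B-PER" or "O" under the given scheme.
/// Returns None if the prefix is not a single ASCII character, is not
/// allowed by the scheme, or if the class is missing or misplaced.
/// Assumption: prefix and class are separated by the first '-'.
pub fn parse_token<'a>(raw: &'a str, scheme: Scheme) -> Option<Token<'a>> {
    let (prefix_str, class) = match raw.split_once('-') {
        Some((p, c)) => (p, Some(c)),
        None => (raw, None),
    };

    // The prefix must be exactly one ASCII character.
    let mut chars = prefix_str.chars();
    let prefix = chars.next()?;
    if chars.next().is_some() || !prefix.is_ascii() {
        return None;
    }
    // The prefix must be one of the letters of the scheme.
    if !scheme.prefixes().contains(&prefix) {
        return None;
    }
    // Outside tokens carry no class; every other prefix needs a non-empty one.
    match (prefix, class) {
        ('O', None) => Some(Token { prefix, class: None }),
        ('O', Some(_)) => None,
        (_, Some(c)) if !c.is_empty() => Some(Token { prefix, class }),
        _ => None,
    }
}
```

With these definitions, `parse_token("B-PER", Scheme::Iob)` returns a token with prefix `'B'` and class `"PER"`, while `parse_token("E-LOC", Scheme::Iob)` returns `None` because `E` is not a letter of the I-O-B scheme.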

Dependencies

~10MB
~184K SLoC