1 unstable release: 0.1.0 (Dec 27, 2024)
This library is a re-implementation of the SeqEval library. It is built with a focus on performance and soundness.
SCHEMES
The following schemes are currently supported:
- IOB1: Here, `I` is a token inside a chunk, `O` is a token outside a chunk, and `B` is the beginning of a chunk that immediately follows another chunk of the same named entity.
- IOB2: Same as IOB1, except that a `B` tag is given to every token at the beginning of a chunk.
- IOE1: An `E` tag marks the last token of a chunk that immediately precedes another chunk of the same named entity.
- IOE2: Same as IOE1, except that an `E` tag is given to every token at the end of a chunk.
- BILOU/IOBES: `L` (Last) and `E` (Ending) denote the final token of a chunk, while `U` (Unit) and `S` (Single) denote a chunk made of a single token.
The BILOU and IOBES schemes are only supported in strict mode.
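
To make the difference between IOB1 and IOB2 concrete, here is a minimal sketch that rewrites an IOB1 tag sequence as IOB2. It does not use this library's API: the function names `split_tag` and `iob1_to_iob2` are hypothetical, and the `PREFIX-CLASS` tag format (for example `B-PER`, with a bare `O` for outside tokens) is an assumption borrowed from the usual SeqEval convention.

```rust
/// Splits a tag such as "B-PER" into its prefix ('B') and class ("PER").
/// Assumption: tags look like "PREFIX-CLASS", and the outside tag is a bare "O".
/// Returns `None` for malformed tags.
fn split_tag(tag: &str) -> Option<(char, Option<&str>)> {
    if tag == "O" {
        return Some(('O', None));
    }
    let (prefix, class) = tag.split_once('-')?;
    let mut chars = prefix.chars();
    match (chars.next(), chars.next()) {
        // The prefix must be exactly one ASCII character.
        (Some(c), None) if c.is_ascii() && !class.is_empty() => Some((c, Some(class))),
        _ => None,
    }
}

/// Hypothetical helper: converts IOB1 tags to IOB2 tags.
/// In IOB1 a chunk may start with `I`; in IOB2 every chunk starts with `B`.
fn iob1_to_iob2(tags: &[&str]) -> Option<Vec<String>> {
    let mut out = Vec::with_capacity(tags.len());
    // Class of the previous token, or `None` if it was outside any chunk.
    let mut prev_class: Option<&str> = None;
    for &tag in tags {
        let (prefix, class) = split_tag(tag)?;
        match (prefix, class) {
            ('O', None) => out.push("O".to_string()),
            // An explicit `B` is already a chunk start in both schemes.
            ('B', Some(c)) => out.push(format!("B-{c}")),
            ('I', Some(c)) => {
                if prev_class == Some(c) {
                    // Continues the chunk opened by the previous token.
                    out.push(format!("I-{c}"));
                } else {
                    // First token of a chunk: IOB2 requires a `B` here.
                    out.push(format!("B-{c}"));
                }
            }
            // Any other prefix is not part of IOB1.
            _ => return None,
        }
        prev_class = class;
    }
    Some(out)
}
```

For the input `["I-PER", "I-PER", "O", "I-LOC", "B-LOC", "I-LOC"]`, only the two `I` tags that open a chunk change, giving `["B-PER", "I-PER", "O", "B-LOC", "B-LOC", "I-LOC"]`.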
More information about schemes
Terminology
This library partially reuses the terminology of the SeqEval library. The concepts might not map one-to-one.
- A class is an entity we are interested in, such as 'LOC' for location, 'PER' for person, 'GEO' for geography, etc. It can be anything.
- A token is a string containing a class, such as `GEO`, `LOC`, or `PER`, and a prefix. The prefix indicates where we are in the current chunk. For a given scheme, the possible prefixes are the letters of the scheme, such as I-O-B or I-O-E. A prefix can only be a single ASCII character.
- A chunk is a list of at least one token associated with a named entity.
- A Scheme gives us enough information to parse a list of tokens into a chunk.
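
The terms above can be sketched in code. The example below is not this library's implementation: the `Token` and `Chunk` types and the `parse_token` and `chunks_iob2` functions are hypothetical names, and the `PREFIX-CLASS` token format with a bare `O` is an assumption. It parses a list of tokens into chunks under the IOB2 scheme and rejects an `I` token that does not continue an open chunk of the same class.

```rust
/// A parsed token: a single ASCII prefix plus an optional class.
#[derive(Debug, Clone, PartialEq)]
struct Token<'a> {
    prefix: char,
    class: Option<&'a str>,
}

/// A chunk: a class and the half-open range of token indices it covers.
#[derive(Debug, Clone, PartialEq)]
struct Chunk<'a> {
    class: &'a str,
    start: usize,
    end: usize, // exclusive
}

/// Assumption: tokens look like "B-PER"; the outside token is a bare "O".
fn parse_token(raw: &str) -> Option<Token<'_>> {
    if raw == "O" {
        return Some(Token { prefix: 'O', class: None });
    }
    let (prefix, class) = raw.split_once('-')?;
    let mut chars = prefix.chars();
    let first = chars.next()?;
    // The prefix must be exactly one ASCII character, and the class non-empty.
    if chars.next().is_some() || !first.is_ascii() || class.is_empty() {
        return None;
    }
    Some(Token { prefix: first, class: Some(class) })
}

/// Parses IOB2 tokens into chunks; returns `None` on an invalid sequence.
fn chunks_iob2<'a>(raw: &[&'a str]) -> Option<Vec<Chunk<'a>>> {
    let mut chunks = Vec::new();
    // The chunk currently being built, if any.
    let mut open: Option<Chunk<'a>> = None;
    for (i, &r) in raw.iter().enumerate() {
        let token = parse_token(r)?;
        match (token.prefix, token.class) {
            // `I` extends the open chunk when the classes match.
            ('I', Some(c)) if open.as_ref().map_or(false, |ch| ch.class == c) => {
                if let Some(ch) = open.as_mut() {
                    ch.end = i + 1;
                }
            }
            // `B` closes any open chunk and starts a new one.
            ('B', Some(c)) => {
                chunks.extend(open.take());
                open = Some(Chunk { class: c, start: i, end: i + 1 });
            }
            // `O` closes any open chunk.
            ('O', None) => chunks.extend(open.take()),
            // A stray `I` or an unknown prefix is rejected in this sketch.
            _ => return None,
        }
    }
    // Flush the chunk that runs to the end of the input.
    chunks.extend(open.take());
    Some(chunks)
}
```

With the input `["B-PER", "I-PER", "O", "B-LOC", "B-LOC", "I-LOC"]`, this sketch yields three chunks: `PER` over tokens 0..2, `LOC` over 3..4, and `LOC` over 4..6. Storing index ranges rather than copies of the tokens keeps a chunk cheap to compare, which is one possible design choice among several.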