#bag #hamming #words #bow #lsh #array

no-std hamming-bow

Produces binary term-frequency bit arrays for a Hamming-space bag of words.

1 unstable release

0.1.0 Jul 21, 2021

#6 in #lsh

MIT license

7KB
57 lines

hamming-bow

How it works

The crate uses hamming-dict to generate codewords in Hamming space that are spaced as far apart from one another as possible.
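To make "spaced as far apart as possible" concrete, here is a minimal sketch using 8-bit codewords. It is not the hamming-dict algorithm or API: the function names `hamming`, `min_pairwise_distance`, and `greedy_dictionary` are hypothetical, and the greedy farthest-point selection is only one simple way to get well-separated codewords.

```rust
/// Hamming distance between two 8-bit codewords: the number of differing bits.
fn hamming(a: u8, b: u8) -> u32 {
    (a ^ b).count_ones()
}

/// Smallest pairwise Hamming distance in a dictionary, i.e. how well spaced it is.
/// Returns `None` when the dictionary has fewer than two codewords.
fn min_pairwise_distance(dict: &[u8]) -> Option<u32> {
    let mut best: Option<u32> = None;
    for (i, &a) in dict.iter().enumerate() {
        for &b in &dict[i + 1..] {
            let d = hamming(a, b);
            best = Some(best.map_or(d, |m| m.min(d)));
        }
    }
    best
}

/// Hypothetical greedy construction (an assumption, not what hamming-dict does):
/// start from 0 and repeatedly add the candidate whose distance to its closest
/// already-chosen codeword is largest.
fn greedy_dictionary(words: usize) -> Vec<u8> {
    if words == 0 {
        return Vec::new();
    }
    let mut dict = vec![0u8];
    while dict.len() < words {
        let next = (0u8..=255)
            .max_by_key(|&c| dict.iter().map(|&w| hamming(c, w)).min().unwrap())
            .unwrap();
        dict.push(next);
    }
    dict
}
```

In this sketch, asking for two codewords gives 0 and 255, which differ in all 8 bits. Asking for more codewords necessarily shrinks the minimum pairwise distance, which is the trade-off a dictionary generator has to manage.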

For each input key, the nearest codeword in the dictionary is found and the corresponding bit is set in the bag. If the number of set bits in the bag grows sufficiently large, the number of word occurrences required to set a bit is raised to keep the hash balanced.
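The procedure above can be sketched as follows. This illustrates the idea only and is not the crate's API: the names `nearest` and `bag`, the use of `u64` keys and a 64-word bag, and the `max_set_bits` parameter standing in for "sufficiently large" are all assumptions.

```rust
/// Index of the codeword nearest to `key` in Hamming distance.
/// Ties resolve to the lowest index; returns `None` for an empty dictionary.
fn nearest(dict: &[u64], key: u64) -> Option<usize> {
    dict.iter()
        .enumerate()
        .min_by_key(|&(_, &w)| (w ^ key).count_ones())
        .map(|(i, _)| i)
}

/// Builds a bag over at most 64 words, stored as the bits of a `u64`.
/// Bit `i` is set when word `i` occurs at least `threshold` times. The
/// threshold starts at 1 and is raised until no more than `max_set_bits`
/// bits are set (a hypothetical balancing rule).
fn bag(dict: &[u64], keys: &[u64], max_set_bits: u32) -> u64 {
    assert!(dict.len() <= 64, "this sketch stores one bit per word in a u64");

    // Count how many keys map to each dictionary word.
    let mut counts = [0u32; 64];
    for &key in keys {
        if let Some(i) = nearest(dict, key) {
            counts[i] += 1;
        }
    }

    // Raise the occurrence threshold until the bag is sparse enough.
    // This terminates because a high enough threshold leaves no bits set.
    let mut threshold = 1;
    loop {
        let bits = counts
            .iter()
            .enumerate()
            .filter(|&(_, &c)| c >= threshold)
            .fold(0u64, |acc, (i, _)| acc | (1u64 << i));
        if bits.count_ones() <= max_set_bits {
            return bits;
        }
        threshold += 1;
    }
}
```

With the dictionary `[0, 0xFF, 0xFFFF_0000]` and the keys `[0x1, 0x3, 0xFE, 0xFFFF_0001]`, the first word is hit twice and the other two once each. A generous limit therefore sets all three bits, while `max_set_bits = 2` raises the threshold to 2 and leaves only bit 0 set.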

Dependencies

~340–580KB
~11K SLoC