Candle LSTM
A re-implementation of Candle's LSTM inference for faster CPU execution, including bidirectional LSTM support.
This implementation is ONLY FOR CPU INFERENCE. DO NOT USE IT ON METAL OR CUDA.
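For reference, here is a minimal sketch of the per-step math an LSTM inference implementation like this computes, using PyTorch's packed gate layout (i, f, g, o). This is illustrative NumPy only, not this crate's API; the shapes follow PyTorch's weight_ih/weight_hh convention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, w_ih, w_hh, b_ih, b_hh):
    """One LSTM cell step; not this crate's API, just the underlying math.

    x: (I,) input, h/c: (H,) previous hidden/cell state,
    w_ih: (4H, I), w_hh: (4H, H), b_ih/b_hh: (4H,),
    with the four gates packed in PyTorch order: i, f, g, o.
    """
    gates = w_ih @ x + b_ih + w_hh @ h + b_hh
    i, f, g, o = np.split(gates, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c_new = f * c + i * g          # new cell state
    h_new = o * np.tanh(c_new)     # new hidden state
    return h_new, c_new
```

A bidirectional LSTM runs one such pass forward over the sequence and a second, independently parameterized pass backward, concatenating the two hidden states at each time step.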
Metal and CUDA
I tested inference on my MacBook Pro with an M2 chip: it is ~5 ms slower than Candle on Metal.
I tested on an RTX 4090 with CUDA 12.5: it is ~6x slower than Candle on CUDA.
Test Data
Install PyTorch and run simple.py to generate the test data; a sketch of such a script follows the list below.
- lstm_test.pt: PyTorch LSTM with batch_first = False.
- lstm_test_batch_first.pt: PyTorch LSTM with batch_first = True.
- bi_lstm_test.pt: PyTorch bidirectional LSTM with batch_first = False.
- bi_lstm_test_batch_first.pt: PyTorch bidirectional LSTM with batch_first = True.
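The authoritative script is simple.py in the repository; the following is only a minimal sketch of how one such file could be produced (the sizes, seed, and saved keys here are assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical dimensions; the real simple.py may use different ones.
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=False)

# (seq_len, batch, input_size) because batch_first=False.
x = torch.randn(5, 3, 10)
output, (h_n, c_n) = lstm(x)

# Save weights, input, and expected output so the Rust side can
# load them and compare its inference results against PyTorch's.
torch.save(
    {"state_dict": lstm.state_dict(), "input": x, "output": output},
    "lstm_test.pt",
)
```

The batch_first and bidirectional variants differ only in the nn.LSTM constructor arguments and the input layout.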