#pytorch #lstm #candle

candle-lstm

Optimizes HuggingFace Candle LSTM inference in some cases (CPU only)

5 releases

0.2.3 Sep 25, 2024
0.2.2 Sep 12, 2024
0.2.1 Sep 5, 2024
0.2.0 Aug 25, 2024
0.1.0 Aug 24, 2024

#566 in Machine learning

Download history (per week): 281 @ 2024-08-22 · 18 @ 2024-08-29 · 157 @ 2024-09-05 · 116 @ 2024-09-12 · 121 @ 2024-09-19 · 60 @ 2024-09-26 · 9 @ 2024-10-03 · 6 @ 2024-10-10

200 downloads per month

Custom license

25KB
532 lines

Candle LSTM

A re-implementation of Candle's LSTM inference, including bidirectional LSTM, to speed it up.

This implementation is ONLY FOR CPU INFERENCE. DO NOT USE IT ON METAL OR CUDA.
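
This crate's own API is not excerpted on this page, so as a point of reference the sketch below runs stock candle_nn LSTM inference pinned to the CPU device, i.e. the path this crate re-implements. The layer sizes and zero-initialized weights are illustrative assumptions, not values from this crate.

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::{LSTMConfig, RNN, VarBuilder};

fn main() -> Result<()> {
    // Pin everything to CPU, per the warning above.
    let device = Device::Cpu;

    // Zero-initialized weights purely for illustration; real code would
    // load trained weights into the VarBuilder instead.
    let vb = VarBuilder::zeros(DType::F32, &device);
    let model = candle_nn::lstm(16, 32, LSTMConfig::default(), vb)?;

    // candle_nn's RNN::seq expects input shaped (batch, seq_len, input_size).
    let input = Tensor::zeros((1, 10, 16), DType::F32, &device)?;
    let states = model.seq(&input)?;

    // Stack the per-step hidden states into one (batch, seq_len, hidden) tensor.
    let output = model.states_to_tensor(&states)?;
    println!("output shape: {:?}", output.shape());
    Ok(())
}
```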

Metal and CUDA

I tested inference on my MacBook Pro with an M2 chip; it is ~5 ms slower than Candle on Metal.

On an RTX 4090 with CUDA 12.5, it is ~6x slower than Candle on CUDA.
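
Given those numbers, a caller that supports multiple backends might gate this crate behind a device check. A minimal sketch, with a hypothetical helper name:

```rust
use candle_core::Device;

/// Hypothetical helper: this crate's LSTM only pays off on CPU;
/// on Metal or CUDA, stock Candle is faster per the numbers above.
fn prefer_optimized_lstm(device: &Device) -> bool {
    matches!(device, Device::Cpu)
}
```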

Test Data

Install PyTorch and run simple.py to generate the test data; a sketch for loading the resulting files from Rust follows the list below.

  1. lstm_test.pt: PyTorch LSTM with batch_first = False.
  2. lstm_test_batch_first.pt: PyTorch LSTM with batch_first = True.
  3. bi_lstm_test.pt: PyTorch bidirectional LSTM with batch_first = False.
  4. bi_lstm_test_batch_first.pt: PyTorch bidirectional LSTM with batch_first = True.
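
As a quick sanity check from the Rust side, candle_core's pickle module can enumerate the tensors that torch.save wrote into these files. A minimal sketch; the tensor names inside each file depend on simple.py and are not assumed here:

```rust
use candle_core::{pickle, Result};

fn main() -> Result<()> {
    // Print every (name, tensor) pair stored in the PyTorch file.
    for (name, tensor) in pickle::read_all("lstm_test.pt")? {
        println!("{name}: {:?}", tensor.shape());
    }
    Ok(())
}
```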

Dependencies

~11MB
~214K SLoC