2 releases

Version | Date |
---|---|
0.1.2 | Jul 4, 2024 |
0.1.0 | Dec 1, 2021 |
Disco CLI
🔥 Generate recommendations from CSV files
- Supports user-based and item-based recommendations
- Works with explicit and implicit feedback
- Uses high-performance matrix factorization
Also available for Rust and Ruby
Installation
Download the latest version from the GitHub releases page.
You can also install it with Homebrew:
brew install ankane/brew/disco
or Cargo:
cargo install disco-cli
Quickstart
Download the MovieLens 100k dataset and generate item-based recommendations
disco download movielens-100k
disco item-recs movielens-100k.csv output.csv --factors 20
grep "^Star Wars" output.csv
How to Use
Data
Create a CSV file with your data. If users rate items directly, this is known as explicit feedback. The CSV should have three columns: user_id, item_id, and rating.
user_id,item_id,rating
1,post1,5
1,post2,3.5
2,post1,4
If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Name the third column value instead of rating, and fill it with a quantity such as the number of purchases, the number of page views, or just 1.
user_id,item_id,value
1,post1,1
1,post2,1
2,post1,1
Each user_id/item_id combination should only appear once.
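If a raw event log repeats the same user_id/item_id pair, it needs to be collapsed before it is passed to Disco. The following is a minimal sketch for implicit feedback that sums the value column per pair. The helper name dedupe_feedback and the sample rows are hypothetical, and the script assumes a simple CSV with no quoted commas. For explicit feedback, summing ratings would not make sense; keeping the most recent rating or averaging is a more plausible choice.

```shell
# Hypothetical helper (not part of Disco): sum the third column for
# repeated user_id,item_id pairs. Assumes no quoted commas in the CSV.
dedupe_feedback() {
  awk -F, -v OFS=, '
    NR == 1 { print; next }                    # pass the header row through
    {
      key = $1 OFS $2
      if (!(key in total)) order[++n] = key    # remember first-seen order
      total[key] += $3
    }
    END { for (i = 1; i <= n; i++) print order[i], total[order[i]] }
  ' "$@"
}

# Made-up sample in which the pair (1, post1) appears twice
printf '%s\n' \
  'user_id,item_id,value' \
  '1,post1,1' \
  '1,post2,1' \
  '1,post1,2' \
  '2,post1,1' | dedupe_feedback
```

The helper reads from a file argument or from standard input, so it can also be run as dedupe_feedback raw.csv > data.csv, where raw.csv is a placeholder file name.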
User-based Recommendations
Generate user-based recommendations - “users like you also liked”
disco user-recs data.csv output.csv
This creates a CSV with user_id, recommended_item_id, and score columns.
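To look up the recommendations for a single user in that output, standard shell tools are enough. The sketch below assumes only the three documented columns. The helper name top_recs_for_user and the sample rows are hypothetical, and the script does not rely on the output being sorted already. It splits on every comma, so it would need a real CSV parser if IDs contain quoted commas.

```shell
# Hypothetical helper (not part of Disco): print the top N rows for one
# user, highest score first. Usage: top_recs_for_user USER_ID N FILE
top_recs_for_user() {
  awk -F, -v user="$1" 'NR > 1 && $1 == user' "$3" |
    LC_ALL=C sort -t, -k3,3nr |
    head -n "$2"
}

# Made-up rows in the documented column layout, for illustration only
sample=$(mktemp)
printf '%s\n' \
  'user_id,recommended_item_id,score' \
  '1,post3,0.41' \
  '1,post7,0.93' \
  '2,post3,0.88' \
  '1,post5,0.67' > "$sample"

top_recs_for_user 1 2 "$sample"
```

With the sample rows above, this prints the post7 row followed by the post5 row for user 1.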
Item-based Recommendations
Generate item-based recommendations - “users who liked this item also liked”
disco item-recs data.csv output.csv
This creates a CSV with item_id, recommended_item_id, and score columns.
Similar Users
Generate similar users
disco similar-users data.csv output.csv
This creates a CSV with user_id, similar_user_id, and score columns.
Algorithms
Disco uses high-performance matrix factorization.
- For explicit feedback, it uses the stochastic gradient method with twin learners
- For implicit feedback, it uses the conjugate gradient method
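For background, matrix factorization in general approximates each rating as the dot product of a user factor vector and an item factor vector. The formulation below is the generic textbook objective for explicit feedback, not taken from Disco's source, and Disco's exact loss and regularization may differ. The symbols are assumptions for illustration: $r_{ui}$ is the observed rating, $\mathbf{p}_u, \mathbf{q}_i \in \mathbb{R}^k$ are the factor vectors with $k$ the number of factors, $\mathcal{K}$ is the set of observed pairs, and $\lambda$ is a regularization weight.

```latex
\hat{r}_{ui} = \mathbf{p}_u^{\top} \mathbf{q}_i,
\qquad
\min_{P,\,Q} \sum_{(u,i) \in \mathcal{K}}
  \left[ \left( r_{ui} - \mathbf{p}_u^{\top} \mathbf{q}_i \right)^2
  + \lambda \left( \lVert \mathbf{p}_u \rVert^2 + \lVert \mathbf{q}_i \rVert^2 \right) \right]
```

Stochastic gradient methods minimize an objective of this kind by updating the factor vectors one observed rating at a time.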
Specify the number of factors and iterations
disco ... --factors 8 --iterations 20
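One way to compare settings is to generate one output file per factor count and inspect the results side by side. The loop below is a sketch that uses only the item-recs command and the --factors and --iterations flags shown in this document. The function name sweep_factors, the DISCO override variable, and the output file names are hypothetical.

```shell
# DISCO is a hypothetical override so the loop can be dry-run with
# DISCO=echo; by default it calls the disco binary on your PATH.
DISCO="${DISCO:-disco}"

# Hypothetical helper: run item-recs once per factor count.
# Usage: sweep_factors INPUT.csv FACTORS...
sweep_factors() {
  input="$1"
  shift
  for f in "$@"; do
    "$DISCO" item-recs "$input" "output-factors-$f.csv" --factors "$f" --iterations 20
  done
}

# Example (requires disco to be installed):
#   sweep_factors data.csv 8 20 50
```

Setting DISCO=echo before calling the function prints the commands instead of running them, which is a quick way to check the arguments.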
Options
Specify the number of recommendations for each user or item
disco ... --count 10
Datasets
Download a dataset
disco download movielens-100k
Supported datasets are:
- movielens-100k
- movielens-1m
- movielens-25m
- movielens-latest-small
- movielens-latest
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/disco-cli.git
cd disco-cli
cargo run