### 4 releases (2 breaking)

| Version | Date |
|---|---|
| 0.5.0 | Aug 10, 2024 |
| 0.4.1 | Mar 13, 2023 |
| 0.4.0 | Jan 23, 2023 |
| 0.3.1 | Dec 16, 2022 |

**Custom license**

*Probabilistic* Principal Component Analysis (PPCA) model

This project implements a PPCA model in Rust for Python, using `PyO3` and `maturin`.

## Installing

This package is available on PyPI!

```shell
pip install ppca-rs
```

And you can also use it natively in Rust:

```shell
cargo add ppca
```

## Why use PPCA?

Glad you asked!

- The PPCA is a simple extension of the PCA (principal component analysis), but is overall more robust to train.
- The PPCA is a *proper statistical model*. It doesn't spit out only the mean: you get standard deviations, covariances, and all the goodies that come from the realm of probability and statistics.
- The PPCA model can handle *missing values*. If there is data missing from your dataset, it can extrapolate it with reasonable values and even give you a confidence interval.
- The training converges quickly and always tends to a global maximum. No hyperparameters to dabble with and no local maxima.
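To make the "proper statistical model" point concrete, here is a minimal NumPy sketch of the standard PPCA generative model, x = W z + μ + ε. This is not code from this package; the names `W`, `mu`, and `sigma` are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# PPCA generative model: x = W z + mu + eps, with
#   z ~ N(0, I)            (low-dimensional latent state)
#   eps ~ N(0, sigma^2 I)  (isotropic observation noise)
state_size, output_size, n_samples = 2, 5, 1_000

W = rng.normal(size=(output_size, state_size))   # transformation matrix
mu = rng.normal(size=output_size)                # mean of the observations
sigma = 0.1                                      # isotropic noise scale

z = rng.normal(size=(n_samples, state_size))
eps = sigma * rng.normal(size=(n_samples, output_size))
samples = z @ W.T + mu + eps

# Because the model is probabilistic, the implied covariance of x is
# W W^T + sigma^2 I, which the sample covariance should approximate:
implied_cov = W @ W.T + sigma**2 * np.eye(output_size)
sample_cov = np.cov(samples, rowvar=False)
```

Everything the model tells you (means, covariances, confidence intervals) falls out of this probabilistic formulation.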

## Why use `ppca-rs`?

That's an easy one!

- It's written in Rust, with only a bit of Python glue on top. You can expect performance in the same league as C code.
- It uses `rayon` to parallelize computations evenly across as many CPUs as you have.
- It also uses fancy Linear Algebra Trickery Technology to reduce computational complexity in key bottlenecks.
- Battle-tested at Vio.com with some ridiculously huge datasets.

## Quick example

```python
import numpy as np
from ppca_rs import Dataset, PPCATrainer, PPCA

samples: np.ndarray

# Create your dataset from a rank-2 np.ndarray, where each line is a sample.
# Use non-finite values (`inf`s and `nan`s) to signal masked values.
dataset = Dataset(samples)

# Train the model (convenient edition!):
model: PPCA = PPCATrainer(dataset).train(state_size=10, n_iters=10)

# And now, here is a free sample of what you can do:

# Extrapolate the missing values with the most probable values:
extrapolated: Dataset = model.extrapolate(dataset)
# Smooth (remove noise from) samples and fill in missing values:
filtered: Dataset = model.filter_extrapolate(dataset)
# ... and go back to numpy:
extrapolated_np = extrapolated.numpy()
```

## Juicy extras!

- Tired of the linear? We have support for PPCA mixture models. Make the most of your data with clustering and dimensionality reduction in a single tool!
- Support for adaptation of DataFrames using either `pandas` or `polars`. Never juggle those `df`s in your code again.
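As a rough sketch of the DataFrame path (this does not show the package's own adapters; it just converts a `pandas` DataFrame into the rank-2 NumPy array that `Dataset` expects, with `NaN`s left in place as masked values):

```python
import numpy as np
import pandas as pd

# A toy DataFrame with missing entries; pandas represents them as NaN,
# which matches the masked-value convention `Dataset` uses.
df = pd.DataFrame({
    "height": [1.70, np.nan, 1.82],
    "weight": [68.0, 72.5, np.nan],
})

samples = df.to_numpy(dtype=np.float64)  # rank-2 array, one row per sample
# Then, as in the quick example: dataset = Dataset(samples)
```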

## Building from source

### Prerequisites

You will need Rust, which can be installed locally (i.e., without `sudo`), and you will also need `maturin`, which can be installed by

```shell
pip install maturin
```

`pipenv` is also a good idea if you are going to mess around with it locally. At the very least, you need a `venv` set up; otherwise, `maturin` will complain at you.
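The virtual-environment setup can be sketched as follows (the directory name `.venv` is just a convention):

```shell
# Create a virtual environment so maturin has somewhere to install into.
python3 -m venv .venv
# Activate it (bash/zsh; other shells have their own activate scripts).
. .venv/bin/activate
# Then install maturin inside it, as above: pip install maturin
```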

### Installing it locally

Check the `Makefile` for the available commands (or just type `make`). To install it locally, do

```shell
make install  # optional: i=python.version (e.g., i=3.9)
```

### Messing around and testing

To mess around, *inside a virtual environment* (a `Pipfile` is provided for the `pipenv` lovers), do

```shell
maturin develop  # use the flag --release to unlock superspeed!
```

This will install the package locally *as is* from source.

## How do I use this stuff?

See the examples in the `examples` folder. Also, all functions are type hinted and commented. If you are using `pylance` or `mypy`, it should be easy to navigate.

## Is it faster than the pure Python implementation you made?

You bet!
