### 12 releases

| Version | Date |
|---|---|
| 0.0.12 | Feb 9, 2023 |
| 0.0.11 | Jul 6, 2022 |
| 0.0.10 | Oct 18, 2021 |
| 0.0.9 | Jul 18, 2021 |
| 0.0.1 | Mar 20, 2020 |

**MIT** license

# ndarray-glm

Rust library for solving linear, logistic, and generalized linear models through
iteratively reweighted least squares, using the `ndarray-linalg` module.

## Status

This package is in alpha and the interface could undergo changes. Even the return value of certain functions may change from one release to the next. Correctness is not guaranteed.

The regression algorithm uses iteratively re-weighted least squares (IRLS) with a step-halving procedure applied when the next iteration of guesses does not increase the likelihood.
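To make the idea concrete, here is a minimal, dependency-free sketch of IRLS with step halving for a logistic model with one feature and an intercept. It is not this crate's implementation: the function names (`log_likelihood`, `irls_step`, `fit_logistic`), the cap of 10 halvings, the stopping rule, and the sample data are all assumptions made for illustration.

```rust
/// Log-likelihood of a logistic model with intercept b[0] and slope b[1].
fn log_likelihood(b: [f64; 2], x: &[f64], y: &[f64]) -> f64 {
    x.iter()
        .zip(y)
        .map(|(&xi, &yi)| {
            let eta = b[0] + b[1] * xi;
            yi * eta - eta.exp().ln_1p()
        })
        .sum()
}

/// One IRLS update: build the working weights and responses, then solve the
/// 2x2 weighted least-squares normal equations in closed form.
fn irls_step(b: [f64; 2], x: &[f64], y: &[f64]) -> [f64; 2] {
    let (mut s0, mut s1, mut s2, mut t0, mut t1) = (0.0_f64, 0.0_f64, 0.0_f64, 0.0_f64, 0.0_f64);
    for (&xi, &yi) in x.iter().zip(y) {
        let eta = b[0] + b[1] * xi;
        let mu = 1.0 / (1.0 + (-eta).exp()); // predicted mean
        let w = mu * (1.0 - mu); // working weight
        let z = eta + (yi - mu) / w; // working response
        s0 += w;
        s1 += w * xi;
        s2 += w * xi * xi;
        t0 += w * z;
        t1 += w * xi * z;
    }
    let det = s0 * s2 - s1 * s1;
    [(s2 * t0 - s1 * t1) / det, (s0 * t1 - s1 * t0) / det]
}

/// Hypothetical driver: iterate IRLS, halving the step toward the previous
/// guess whenever the proposed guess lowers the likelihood.
fn fit_logistic(x: &[f64], y: &[f64], max_iter: usize, tol: f64) -> [f64; 2] {
    let mut beta = [0.0_f64, 0.0];
    let mut like = log_likelihood(beta, x, y);
    for _ in 0..max_iter {
        let mut next = irls_step(beta, x, y);
        let mut next_like = log_likelihood(next, x, y);
        let mut halvings = 0;
        while (next_like < like || next_like.is_nan()) && halvings < 10 {
            next = [(beta[0] + next[0]) / 2.0, (beta[1] + next[1]) / 2.0];
            next_like = log_likelihood(next, x, y);
            halvings += 1;
        }
        if next_like < like || next_like.is_nan() {
            break; // no improving step found; keep the current guess
        }
        let gain = next_like - like;
        beta = next;
        like = next_like;
        if gain < tol {
            break; // likelihood has stopped improving
        }
    }
    beta
}

fn main() {
    // Made-up, non-separable data so that the estimate stays finite.
    let x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0];
    let y = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0];
    let beta = fit_logistic(&x, &y, 50, 1e-10);
    println!("intercept = {}, slope = {}", beta[0], beta[1]);
}
```

Halving toward the previous guess keeps every accepted iterate at least as likely as the last one, which is what guards the plain IRLS update against overshooting.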

Suggestions (via issues) and pull requests are welcome.

## Prerequisites

The recommended approach is to use a system BLAS implementation. For instance, to install OpenBLAS on Debian/Ubuntu:

    sudo apt update && sudo apt install -y libopenblas-dev

Then use this crate with the `openblas-system` feature.

To use an alternative backend or to build a static BLAS implementation, refer to the
`ndarray-linalg` documentation. Use this crate with the appropriate feature flag and
it will be forwarded to `ndarray-linalg`.

## Example

To use in your crate, add the following to the `Cargo.toml`:

    ndarray = { version = "0.15", features = ["blas"] }
    ndarray-glm = { version = "0.0.12", features = ["openblas-system"] }

An example for linear regression is shown below.

    use ndarray_glm::{array, Linear, ModelBuilder, utility::standardize};
    // define some test data
    let data_y = array![0.3, 1.3, 0.7];
    let data_x = array![[0.1, 0.2], [-0.4, 0.1], [0.2, 0.4]];
    // The design matrix can optionally be standardized, where the mean of each independent
    // variable is subtracted and each is then divided by the standard deviation of that variable.
    let data_x = standardize(data_x);
    let model = ModelBuilder::<Linear>::data(&data_y, &data_x).build()?;
    // L2 (ridge) regularization can be applied with l2_reg().
    let fit = model.fit_options().l2_reg(1e-5).fit()?;
    // Currently the result is a simple array of the MLE estimators, including the intercept term.
    println!("Fit result: {}", fit.result);
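The standardization described in the comments above can be sketched without the crate, using plain vectors. This is an illustrative stand-in, not the crate's `standardize` function: the name `standardize_columns`, the row-major `Vec<Vec<f64>>` layout, the use of the population standard deviation (dividing by n), and the handling of constant columns are all assumptions.

```rust
/// Standardize each column of a row-major matrix: subtract the column mean,
/// then divide by the column standard deviation.
/// Assumption: population standard deviation (divide by n).
fn standardize_columns(rows: &[Vec<f64>]) -> Vec<Vec<f64>> {
    let n = rows.len() as f64;
    let n_cols = rows.first().map_or(0, |r| r.len());
    let mut out = rows.to_vec();
    for j in 0..n_cols {
        let mean = rows.iter().map(|r| r[j]).sum::<f64>() / n;
        let var = rows.iter().map(|r| (r[j] - mean).powi(2)).sum::<f64>() / n;
        let sd = var.sqrt();
        for row in out.iter_mut() {
            // A constant column is only centered, to avoid dividing by zero.
            row[j] = if sd > 0.0 {
                (row[j] - mean) / sd
            } else {
                row[j] - mean
            };
        }
    }
    out
}

fn main() {
    // Same design matrix as in the example above.
    let data_x = vec![vec![0.1, 0.2], vec![-0.4, 0.1], vec![0.2, 0.4]];
    for row in standardize_columns(&data_x) {
        println!("{:?}", row);
    }
}
```

After this transformation each column has mean 0 and unit variance, so a single regularization strength penalizes all coefficients on a comparable scale.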

Custom non-canonical link functions can be defined by the user, although the
interface is currently not particularly ergonomic. See `tests/custom_link.rs`
for examples.

## Features

- Linear regression
- Logistic regression
- Generalized linear model IRLS
- Linear offsets
- Generic over floating point type
- Non-float domain types
- Regularization
  - L2 (ridge)
  - L1 (lasso)
  - Elastic Net
- Other exponential family distributions
  - Poisson
  - Binomial
  - Exponential
  - Gamma
  - Inverse Gaussian
- Data standardization/normalization
  - External utility function
  - Automatic internal transformation
- Weighted (and correlated?) regressions
- Non-canonical link functions
- Goodness-of-fit tests

## Troubleshooting

Lasso/L1 regularization can converge slowly in some cases, particularly when the data is poorly behaved, separable, etc.

The following tips are worth trying for convergence issues in general, but are more likely to be necessary in an L1 regularization problem.

- Standardize the feature data
- Use f32 instead of f64
- Increase the tolerance and/or the maximum number of iterations
- Include a small L2 regularization as well

If you encounter problems that persist even after these techniques are applied, please file an issue so the algorithm can be improved.

## References

- notes on generalized linear models
- Generalized Linear Models and Extensions by Hardin & Hilbe
