#machine-learning #ai

nightly perpetual

A self-generalizing gradient boosting machine which doesn't need hyperparameter optimization

45 releases (7 breaking)

0.7.10 Dec 2, 2024
0.7.8 Nov 28, 2024
0.2.0 Jul 17, 2024

#95 in Machine learning

38 downloads per month

Custom license

355KB
8K SLoC


Perpetual

PerpetualBooster is a gradient boosting machine (GBM) algorithm that, unlike other GBM algorithms, doesn't need hyperparameter optimization. Similar to AutoML libraries, it has a budget parameter. Increasing the budget increases the predictive power of the algorithm and gives better results on unseen data. Start with a small budget (e.g. 1.0) and increase it (e.g. to 2.0) once you are confident in your features. If further increases to the budget bring no improvement, you are already extracting most of the predictive power from your data.
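The advice to start with a small budget and raise it until results stop improving can be written as a simple loop. This is a minimal sketch, not part of the perpetual API: `sweep_budget`, `fit_and_score` and `min_gain` are hypothetical names, and `fit_and_score` stands for whatever function you write to fit PerpetualBooster with a given budget and return a loss on held-out data.

```python
from typing import Callable, Sequence, Tuple


# Hypothetical helper, not part of the perpetual package.
def sweep_budget(
    fit_and_score: Callable[[float], float],
    budgets: Sequence[float] = (1.0, 1.5, 2.0),
    min_gain: float = 1e-3,
) -> Tuple[float, float]:
    """Try increasing budgets and stop once the held-out loss stops improving.

    fit_and_score is assumed to train a model with the given budget and
    return a validation loss (lower is better). Returns the last budget that
    gave a meaningful improvement, together with its loss.
    """
    best_budget = budgets[0]
    best_loss = fit_and_score(best_budget)
    for budget in budgets[1:]:
        loss = fit_and_score(budget)
        if best_loss - loss < min_gain:
            # No meaningful gain: larger budgets are unlikely to help further.
            break
        best_budget, best_loss = budget, loss
    return best_budget, best_loss


# Illustrative stub standing in for a real fit-and-evaluate function.
losses = {1.0: 0.50, 1.5: 0.45, 2.0: 0.4495}
print(sweep_budget(losses.__getitem__))  # (1.5, 0.45)
```

Stopping on a minimum gain rather than on any loss increase avoids paying for larger budgets that only shave off noise-level improvements.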

Benchmark

Hyperparameter optimization with plain GBM algorithms usually takes around 100 iterations. PerpetualBooster reaches comparable accuracy in a single run, which gives up to a 100x speed-up across different budget levels and datasets.
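One simplified way to read the speed-up figure, assuming the hyperparameter search consists of N independent GBM fits with average duration t̄_GBM and ignoring any other search overhead:

```latex
\text{speed-up} = \frac{T_{\text{HPO}}}{T_{\text{Perpetual}}}
  \approx \frac{N \, \bar{t}_{\text{GBM}}}{t_{\text{Perpetual}}},
  \qquad N \approx 100
```

Under this reading the ratio reaches 100 only when a single Perpetual run costs about as much as one GBM fit; a 54x speed-up, for example, would correspond to one Perpetual run taking roughly 100/54 ≈ 1.9 times as long as an average LightGBM fit.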

The following table summarizes the results for the California Housing dataset (regression):

| Perpetual budget | LightGBM n_estimators | Perpetual MSE | LightGBM MSE | Speed-up (wall time) | Speed-up (CPU time) |
|---|---|---|---|---|---|
| 1.0 | 100 | 0.192 | 0.192 | 54x | 56x |
| 1.5 | 300 | 0.188 | 0.188 | 59x | 58x |
| 2.1 | 1000 | 0.185 | 0.186 | 42x | 41x |

The following table summarizes the results for the Cover Types dataset (classification):

| Perpetual budget | LightGBM n_estimators | Perpetual log loss | LightGBM log loss | Speed-up (wall time) | Speed-up (CPU time) |
|---|---|---|---|---|---|
| 0.9 | 100 | 0.091 | 0.084 | 72x | 78x |

The results can be reproduced using the scripts in the examples folder.

PerpetualBooster is a GBM but behaves like AutoML, so it is also benchmarked against AutoGluon (v1.2, best quality preset), the current leader in the AutoML Benchmark. The 10 OpenML datasets with the most rows were selected. The results for regression tasks are summarized in the following table:

| OpenML task | Perpetual training duration | Perpetual inference duration | Perpetual RMSE | AutoGluon training duration | AutoGluon inference duration | AutoGluon RMSE |
|---|---|---|---|---|---|---|
| Airlines_DepDelay_10M | 518 | 11.3 | 29.0 | 520 | 30.9 | 28.8 |
| bates_regr_100 | 3421 | 15.1 | 1.084 | OOM | OOM | OOM |
| BNG(libras_move) | 1956 | 4.2 | 2.51 | 1922 | 97.6 | 2.53 |
| BNG(satellite_image) | 334 | 1.6 | 0.731 | 337 | 10.0 | 0.721 |
| COMET_MC | 44 | 1.0 | 0.0615 | 47 | 5.0 | 0.0662 |
| friedman1 | 275 | 4.2 | 1.047 | 278 | 5.1 | 1.487 |
| poker | 38 | 0.6 | 0.256 | 41 | 1.2 | 0.722 |
| subset_higgs | 868 | 10.6 | 0.420 | 870 | 24.5 | 0.421 |
| BNG(autoHorse) | 107 | 1.1 | 19.0 | 107 | 3.2 | 20.5 |
| BNG(pbc) | 48 | 0.6 | 836.5 | 51 | 0.2 | 957.1 |
| Average | 465 | 3.9 | - | 464 | 19.7 | - |

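PerpetualBooster outperformed AutoGluon on 8 out of 10 datasets (including bates_regr_100, where AutoGluon ran out of memory), training equally fast and running inference about 5x faster. The averages exclude bates_regr_100. The results can be reproduced using the automlbenchmark fork here.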
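The summary figures can be checked directly against the table. The sketch below copies the table values (with `None` marking OOM entries) and recomputes the averages and the win count; it assumes, as the numbers themselves show, that the average row skips the task AutoGluon could not finish and that an OOM counts as a Perpetual win. `ROWS` and `summarize` are illustrative names only.

```python
# Rows copied from the regression table:
# (task, P train, P infer, P RMSE, AG train, AG infer, AG RMSE); None marks OOM.
ROWS = [
    ("Airlines_DepDelay_10M", 518, 11.3, 29.0, 520, 30.9, 28.8),
    ("bates_regr_100", 3421, 15.1, 1.084, None, None, None),
    ("BNG(libras_move)", 1956, 4.2, 2.51, 1922, 97.6, 2.53),
    ("BNG(satellite_image)", 334, 1.6, 0.731, 337, 10.0, 0.721),
    ("COMET_MC", 44, 1.0, 0.0615, 47, 5.0, 0.0662),
    ("friedman1", 275, 4.2, 1.047, 278, 5.1, 1.487),
    ("poker", 38, 0.6, 0.256, 41, 1.2, 0.722),
    ("subset_higgs", 868, 10.6, 0.420, 870, 24.5, 0.421),
    ("BNG(autoHorse)", 107, 1.1, 19.0, 107, 3.2, 20.5),
    ("BNG(pbc)", 48, 0.6, 836.5, 51, 0.2, 957.1),
]


def summarize(rows):
    # Average durations only over tasks where AutoGluon did not run out of memory.
    done = [r for r in rows if r[4] is not None]
    n = len(done)
    averages = [round(sum(r[i] for r in done) / n, 1) for i in (1, 2, 4, 5)]
    # A task is a Perpetual win if its RMSE is lower or AutoGluon hit OOM.
    wins = sum(1 for r in rows if r[6] is None or r[3] < r[6])
    return averages, wins


averages, wins = summarize(ROWS)
print(averages, wins)  # [465.3, 3.9, 463.7, 19.7] 8
```

Rounded to whole seconds of the table's unit, the training averages match the reported 465 and 464, and 19.7 / 3.9 ≈ 5.1 gives the quoted inference speed-up.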

Usage

You can use the algorithm as in the example below. See the examples folders for both Rust and Python.

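from sklearn.datasets import fetch_california_housing
from perpetual import PerpetualBooster

# Load a regression dataset: feature matrix X and target vector y
X, y = fetch_california_housing(return_X_y=True)

model = PerpetualBooster(objective="SquaredLoss")
model.fit(X, y, budget=1.0)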

Documentation

Documentation for the Python API can be found here and for the Rust API here.

Installation

The package can be installed directly from PyPI:

pip install perpetual

Using conda-forge:

conda install conda-forge::perpetual

To use it in a Rust project, add the package from crates.io:

cargo add perpetual

Contribution

Contributions are welcome. See CONTRIBUTING.md for the guidelines.

Paper

PerpetualBooster prevents overfitting with a generalization algorithm. A paper explaining how the algorithm works is in progress. See our blog post for a high-level introduction to the algorithm.

Dependencies

~3–4MB
~87K SLoC