


Scors

This package is a Rust re-implementation with Python bindings of some of the classification scores from scikit-learn (sklearn), restricted to binary classification only. Scores generally have 3 input parameters for labels, predictions, and weights, with slightly different names in sklearn:

| sklearn       | scors       |
|---------------|-------------|
| y_true        | labels      |
| y_score       | predictions |
| sample_weight | weights     |
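
For example, a weighted average precision in sklearn translates to scors roughly as follows (a minimal sketch; the exact import path and keyword-argument support in scors are assumptions based on the table above):

```python
import numpy as np
from sklearn.metrics import average_precision_score

import scors  # assumed import name for the Python bindings

labels = np.array([0, 1, 1, 0, 1])
predictions = np.array([0.1, 0.9, 0.35, 0.4, 0.8])
weights = np.array([1.0, 2.0, 1.0, 0.5, 1.0])

# sklearn naming
ap_sklearn = average_precision_score(y_true=labels, y_score=predictions, sample_weight=weights)

# scors naming (assumed call signature, based on the parameter table above)
ap_scors = scors.average_precision(labels, predictions, weights=weights)

print(ap_sklearn, ap_scors)  # the two values should agree
```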

Functions in scors have an additional parameter order that can be

  1. (default) None to indicate unsorted data,
  2. Order.ASCENDING to indicate that the input data is sorted in ascending order with respect to predictions, or
  3. Order.DESCENDING to indicate that the input data is sorted in descending order with respect to predictions.
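
For example, data can be sorted once up front and then scored without an internal re-sort (a sketch; the import location of Order and keyword-argument support are assumptions):

```python
import numpy as np

import scors  # assumed import name
from scors import Order  # assumed location of the Order enum

labels = np.array([0, 1, 0, 1, 1])
predictions = np.array([0.7, 0.2, 0.5, 0.9, 0.4])

# Sort once by prediction value; the sorted arrays can be reused for many calls.
idx = np.argsort(predictions)
labels_sorted = labels[idx]
predictions_sorted = predictions[idx]

# Tell scors that the inputs are already in ascending order of predictions,
# so it can skip its internal sorting step.
ap = scors.average_precision(labels_sorted, predictions_sorted, order=Order.ASCENDING)
```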

Other parameters that may be present (e.g. max_fprs in roc_auc) follow the naming and meaning defined in their respective sklearn counterparts.

Why?

I want to improve the runtime performance of scores for my use case. I have a single large background sample that I combine and score with each of many small foreground samples. For the rank-based metrics (e.g. average_precision_score), the data has to be sorted, which has complexity O(n log n). Exploiting the structure of my data lets me avoid this cost and boosts performance.
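
For example, the large background can be sorted once and each small, separately sorted foreground merged into it in linear time, so the combined sample never has to be re-sorted (a rough sketch with made-up data; only the function name and the order parameter come from the API described above):

```python
import numpy as np

import scors  # assumed import name
from scors import Order  # assumed location of the Order enum

rng = np.random.default_rng(0)

# Large background sample (e.g. all negatives): sort it once, up front.
bg_labels = np.zeros(1_000_000)
bg_predictions = rng.random(1_000_000)
bg_idx = np.argsort(bg_predictions)
bg_pred_sorted = bg_predictions[bg_idx]
bg_lab_sorted = bg_labels[bg_idx]

for _ in range(100):  # many small foreground samples
    fg_labels = np.ones(50)
    fg_predictions = rng.random(50)

    # Sorting the small foreground is cheap (m log m, with m much smaller than n).
    fg_idx = np.argsort(fg_predictions)
    fg_pred_sorted = fg_predictions[fg_idx]
    fg_lab_sorted = fg_labels[fg_idx]

    # Merge two already-sorted arrays instead of sorting the concatenation:
    # searchsorted is O(m log n) and insert is O(n + m), versus
    # O((n + m) log(n + m)) for re-sorting everything.
    pos = np.searchsorted(bg_pred_sorted, fg_pred_sorted)
    predictions = np.insert(bg_pred_sorted, pos, fg_pred_sorted)
    labels = np.insert(bg_lab_sorted, pos, fg_lab_sorted)

    ap = scors.average_precision(labels, predictions, order=Order.ASCENDING)
```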

But even without assumptions about structure in the data, I found ways to improve performance. This is a summary of all the optimizations I implemented (or plan to implement):

  1. Add an option to assume the data is sorted. This allows the caller to exploit structure in data that is already sorted or mostly sorted.
  2. Remove checks on labels. When stepping through the debugger, I noticed that the sklearn implementation uses safeguards like np.unique to check the validity of the data. This can be helpful to ensure that assumptions are always met, especially in a library with a huge audience and general scope like sklearn. But it also has a performance penalty. I decided to place the responsibility for data validation completely on the caller. The caller can add or leave out data validation as appropriate (see the sketch after this list).
  3. Minimize data allocation:
    1. All current scores are implemented as a single pass over the data (a double pass in the case of ROC AUC with max_fprs).
    2. For ordered input data, no allocations are made.
    3. If the optional weights parameter is not provided, no extra constant array filled with ones is created. Instead, the Rust implementation uses a constant-value iterator.
    4. TODO: For unordered input data, I currently create sorted copies of all of the input data. That is a total of 4 extra allocations (3 if weights are not provided) for the index array (for sorting), labels, predictions, and weights. Instead of creating copies of the input arrays, I am considering index views that simply index into the original arrays through the sorted index array. This may provide another performance benefit, but I still have to benchmark it.
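
As an example of the caller-side validation mentioned in point 2, the checks can be done once in a thin wrapper where they are actually needed (a sketch; only the final call assumes the scors API described above):

```python
import numpy as np

import scors  # assumed import name


def checked_average_precision(labels, predictions, weights=None):
    """Validate inputs once, then call the unchecked scors implementation."""
    labels = np.asarray(labels)
    predictions = np.asarray(predictions)

    # The kind of safeguard sklearn applies internally: binary labels only.
    if not np.isin(np.unique(labels), (0, 1)).all():
        raise ValueError("labels must contain only 0 and 1")
    if labels.shape != predictions.shape:
        raise ValueError("labels and predictions must have the same shape")
    if weights is not None and np.asarray(weights).shape != labels.shape:
        raise ValueError("weights must have the same shape as labels")

    return scors.average_precision(labels, predictions, weights=weights)
```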

Currently Implemented Scores

| sklearn                 | scors             |
|-------------------------|-------------------|
| average_precision_score | average_precision |
| roc_auc_score           | roc_auc           |

Is It Actually Faster?

TODO: Benchmarks
