rebalancedcv.RebalancedLeaveOneOut

class rebalancedcv.RebalancedLeaveOneOut()


Description

Rebalanced Leave-One-Out cross-validator.

Provides train/test indices to split data into train/test sets. Each sample is used once as a test set (singleton), while the remaining samples are used to form the training set, with subsampling to ensure an identical class balance across the training sets of all splits.

This class is designed to have the same functionality and implementation structure as scikit-learn’s LeaveOneOut().

At least two observations per class are needed for RebalancedLeaveOneOut.
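
This requirement can be checked up front. Below is a minimal sketch using only NumPy; the check and its message are illustrative, not the library's own validation:

import numpy as np

y = np.array([1, 1, 2])  # only one observation of class 2
classes, counts = np.unique(y, return_counts=True)
## Illustrative pre-check: RebalancedLeaveOneOut needs >= 2 samples per class
if (counts < 2).any():
    print("invalid for RebalancedLeaveOneOut: class", classes[counts < 2],
          "has fewer than two observations")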

Examples

### Observing the indices on a small example dataset
import numpy as np
np.random.seed(1)
from rebalancedcv import RebalancedLeaveOneOut
X = np.array([[1, 2, 1, 2], [3, 4, 3, 4]]).T
y = np.array([1, 2, 1, 2])
rloo = RebalancedLeaveOneOut()
for i, (train_index, test_index) in enumerate(rloo.split(X, y)):
    print(f"Fold {i}:")
    print(f"  Train: index={train_index}")
    print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[1 2]
  Test:  index=[0]
Fold 1:
  Train: index=[0 3]
  Test:  index=[1]
Fold 2:
  Train: index=[0 3]
  Test:  index=[2]
Fold 3:
  Train: index=[1 2]
  Test:  index=[3]
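
A quick follow-up check, continuing the example above, confirms the rebalancing: every training set contains exactly one sample of each class (the particular indices may vary with the subsampling, but the class balance cannot):

## Class counts (for classes 1 and 2) in each training set
for train_index, _ in rloo.split(X, y):
    print(np.bincount(y[train_index])[1:])
[1 1]
[1 1]
[1 1]
[1 1]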
### Implementing a LogisticRegressionCV evaluation on randomly generated data
### using RebalancedLeaveOneOut
import numpy as np 
from sklearn.linear_model import LogisticRegressionCV
from rebalancedcv import RebalancedLeaveOneOut
from sklearn.metrics import roc_auc_score

## given some random `X` matrix, and a `y` binary vector
X = np.random.rand(100, 10)
y = np.random.rand(100) > 0.5

## Rebalanced leave-one-out evaluation
rloo = RebalancedLeaveOneOut()
rloocv_predictions = [LogisticRegressionCV()
                          .fit(X[train_index], y[train_index])
                          .predict_proba(X[test_index])[:, 1][0]
                      for train_index, test_index in rloo.split(X, y)]

## Since all the data is random, a fair evaluation
## should yield an auROC close to 0.5
print('Rebalanced Leave-one-out auROC: {:.2f}'
              .format(roc_auc_score(y, rloocv_predictions)))
Rebalanced Leave-one-out auROC: 0.49
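
For contrast, the same evaluation with scikit-learn's plain LeaveOneOut is sketched below (this comparison is illustrative, not part of the package's own documentation). Because removing one sample shifts the training-set class balance away from the held-out label, standard leave-one-out tends to score below 0.5 on null data like this; that pessimistic bias is what the rebalancing corrects.

from sklearn.model_selection import LeaveOneOut

loo = LeaveOneOut()
loocv_predictions = [LogisticRegressionCV()
                         .fit(X[train_index], y[train_index])
                         .predict_proba(X[test_index])[:, 1][0]
                     for train_index, test_index in loo.split(X)]

## Expect a pessimistic (below 0.5) auROC on this random data
print('Leave-one-out auROC: {:.2f}'.format(roc_auc_score(y, loocv_predictions)))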


Methods

The methods of the RebalancedLeaveOneOut class are designed to provide functionality identical to scikit-learn’s LeaveOneOut.

get_n_splits(X, y, groups=None)
  • Returns the number of splitting iterations in the cross-validator (see the sketch after this list).

    • Parameters
      • X : array-like of shape (n_samples, n_features)
        • Training data, where n_samples is the number of samples and n_features is the number of features.
      • y : array-like of shape (n_samples,)
        • The target vector relative to X.
      • groups : object
        • Always ignored, exists for compatibility.

    • Returns
      • n_splits : int
        • Returns the number of splitting iterations in the cross-validator.
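
A brief usage sketch, continuing the small example above (assuming get_n_splits mirrors scikit-learn's LeaveOneOut, so the split count equals the number of samples):

## One split per sample; assumes sklearn-LeaveOneOut-like behavior
print(rloo.get_n_splits(X, y))
4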
split(X, y, groups=None, seed=None)
  • Generate indices to split data into training and test set.

    • Parameters
      • X : array-like of shape (n_samples, n_features)
        • Training data, where n_samples is the number of samples and n_features is the number of features.
      • y : array-like of shape (n_samples,)
        • The target variable for supervised learning problems.
      • groups : array-like of shape (n_samples,), default=None
        • Group labels for the samples used while splitting the dataset into train/test set.
      • seed : int, default=None
        • If provided, it is used to seed the random subsampling (see the sketch after this list).

    • Yields
      • train : ndarray
        • The training set indices for that split.
      • test : ndarray
        • The testing set indices for that split.
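
A short sketch of the seed parameter, assuming it behaves as documented (fixing the seed makes the subsampling, and hence the splits, reproducible):

## With a fixed seed, repeated calls yield identical splits
splits_a = list(rloo.split(X, y, seed=42))
splits_b = list(rloo.split(X, y, seed=42))
print(all(np.array_equal(tr1, tr2) and np.array_equal(te1, te2)
          for (tr1, te1), (tr2, te2) in zip(splits_a, splits_b)))
True
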
See also:
RebalancedKFold
       Stratified K-fold iterator with training set rebalancing
RebalancedLeavePOut
       Leave-P-out iterator with training set rebalancing

For more background on LeaveOneOut, refer to the scikit-learn User Guide.