rebalancedcv.RebalancedLeavePOut

class rebalancedcv.RebalancedLeavePOut(p)


Description

Rebalanced Leave-P-Out cross-validator.

Provides train/test indices to split data into train/test sets, with subsampling within the training set to ensure that all training folds have identical class balances. This cross-validator tests on all distinct subsets of size p, while n - 2p of the remaining samples form the training set in each iteration; an additional p samples are removed from the training set by subsampling so that the class balances match across folds.

This class is designed to have the same functionality and implementation structure as scikit-learn’s LeavePOut().

Note: As with scikit-learn’s LeavePOut, RebalancedLeavePOut(p) is NOT equivalent to RebalancedKFold(n_splits=n_samples // p), which creates non-overlapping test sets. Because the number of iterations grows combinatorially with the number of samples, this cross-validation method can be very costly.

At least 1+p observations per class are needed for RebalancedLeavePOut.
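
As a rough illustration of the cost noted above, the number of iterations equals the number of distinct test sets of size p. The short plain-Python check below is not part of the library; it simply counts these splits for a few sample sizes.

### Counting the number of splits for a few sample sizes
from math import comb
for n in (6, 20, 50):
    print(f"n = {n:2d}, p = 2 -> {comb(n, 2):5d} splits")
n =  6, p = 2 ->    15 splits
n = 20, p = 2 ->   190 splits
n = 50, p = 2 ->  1225 splits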

Parameters

  • p : int
    • Number of points to be left out in each testing fold

Example

### Observing the indices on a small example dataset
import numpy as np
np.random.seed(1)
from rebalancedcv import RebalancedLeavePOut
X = np.array([[1, 2, 1, 2, 1, 2], [3, 4, 3, 4, 3, 4]]).T
y = np.array([0,1,0,1,0,1])
rlpo = RebalancedLeavePOut(p=2)
for i, (train_index, test_index) in enumerate(rlpo.split(X, y)):
    print(f"Fold {i}:")
    print(f"  Train: index={train_index}")
    print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[4 5]
  Test:  index=[0 1]
Fold 1:
  Train: index=[1 4]
  Test:  index=[0 2]
Fold 2:
  Train: index=[4 5]
  Test:  index=[0 3]
Fold 3:
  Train: index=[2 3]
  Test:  index=[0 4]
Fold 4:
  Train: index=[1 2]
  Test:  index=[0 5]
Fold 5:
  Train: index=[3 4]
  Test:  index=[1 2]
Fold 6:
  Train: index=[2 5]
  Test:  index=[1 3]
Fold 7:
  Train: index=[0 5]
  Test:  index=[1 4]
Fold 8:
  Train: index=[3 4]
  Test:  index=[1 5]
Fold 9:
  Train: index=[0 5]
  Test:  index=[2 3]
Fold 10:
  Train: index=[0 5]
  Test:  index=[2 4]
Fold 11:
  Train: index=[1 4]
  Test:  index=[2 5]
Fold 12:
  Train: index=[0 1]
  Test:  index=[3 4]
Fold 13:
  Train: index=[1 2]
  Test:  index=[3 5]
Fold 14:
  Train: index=[0 1]
  Test:  index=[4 5]

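Because the splitter follows scikit-learn’s cross-validator interface, it should also be usable as the cv argument of model selection helpers such as cross_val_score. The sketch below assumes scikit-learn is installed and uses LogisticRegression purely as an illustrative estimator.

### Passing the splitter to scikit-learn's cross_val_score
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from rebalancedcv import RebalancedLeavePOut

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))     # 20 samples, 3 features
y = np.array([0, 1] * 10)        # balanced binary labels

scores = cross_val_score(LogisticRegression(), X, y,
                         cv=RebalancedLeavePOut(p=2))
print(scores.mean())             # mean accuracy over the comb(20, 2) = 190 folds
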

Methods

The methods of the RebalancedLeavePOut class are designed to provide functionality identical to scikit-learn’s LeavePOut.

get_n_splits(X, y, groups=None)
  • Returns the number of splitting iterations in the cross-validator.

    • Parameters
      • X : array-like of shape (n_samples, n_features)
        • Training data, where n_samples is the number of samples and n_features is the number of features.
      • y : array-like of shape (n_samples, )
        • The target vector relative to X.
      • groups : object
        • Always ignored, exists for compatibility.

    • Returns
      • n_splits : int
        • Returns the number of splitting iterations in the cross-validator.
split(X, y, groups=None, seed=None)
  • Generate indices to split data into training and test set.

    • Parameters
      • X : array-like of shape (n_samples, n_features)
        • Training data, where n_samples is the number of samples and n_features is the number of features.
      • y : array-like of shape (n_samples,)
        • The target variable for supervised learning problems.
      • groups : array-like of shape (n_samples,), default=None
        • Group labels for the samples used while splitting the dataset into train/test set.
      • seed : int, default=None
        • If provided, used to seed the random subsampling within the training set (see the usage sketch below)

    • Yields
      • train : ndarray
        • The training set indices for that split.
      • test : ndarray
        • The testing set indices for that split.
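
A brief usage sketch of both methods, reusing the toy dataset from the example above. The expected values are given as comments; the repeated-call check simply reflects how the seed parameter is documented, namely that a fixed seed should make the training-set subsampling repeatable.

### Using get_n_splits and the seed argument of split
import numpy as np
from rebalancedcv import RebalancedLeavePOut

X = np.array([[1, 2, 1, 2, 1, 2], [3, 4, 3, 4, 3, 4]]).T
y = np.array([0, 1, 0, 1, 0, 1])
rlpo = RebalancedLeavePOut(p=2)

# One split per distinct test set of size p: comb(6, 2) = 15,
# matching the 15 folds printed in the example above
print(rlpo.get_n_splits(X, y))   # expected: 15

# With a fixed seed, two passes should produce identical training indices
splits_a = list(rlpo.split(X, y, seed=0))
splits_b = list(rlpo.split(X, y, seed=0))
print(all(np.array_equal(a, b)
          for (a, _), (b, _) in zip(splits_a, splits_b)))  # expected: True
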
See also:
RebalancedKFold
       Stratified K-fold iterator with training set rebalancing
RebalancedLeaveOneOut
       Leave-one-out iterator with training set rebalancing

For more background on LeavePOut, refer to the scikit-learn User Guide.