rebalancedcv.RebalancedLeaveOneOutRegression

class rebalancedcv.RebalancedLeaveOneOutRegression()


Description

Rebalanced Leave-One-Out cross-validator for regression.

Provides train/test indices to split data into train/test sets. Each sample is used once as a singleton test set, while the remaining samples form the training set, which is subsampled so that the label balance is similar across the training sets of all splits.

This class is designed to have the same functionality and implementation structure as scikit-learn’s LeaveOneOut().

At least three observations are needed for RebalancedLeaveOneOutRegression.
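
To illustrate why the training sets are rebalanced, the sketch below runs scikit-learn’s plain LeaveOneOut (not this class) on a small label vector. Removing one sample shifts the mean of the remaining labels in the opposite direction, so each training-set mean is a decreasing linear function of the held-out label and their correlation is -1. This only illustrates the motivation; it does not reproduce the subsampling procedure used by RebalancedLeaveOneOutRegression, and the data are illustrative.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

# Small illustrative dataset; the feature values do not affect the split.
y = np.array([1.9, 2.2, 2.4, 2.5])
X = np.zeros((len(y), 1))

train_means = []
for train_index, test_index in LeaveOneOut().split(X):
    # Mean label of the training set when this sample is held out.
    train_mean = y[train_index].mean()
    train_means.append(train_mean)
    print(f"held-out y={y[test_index][0]:.1f}  training mean={train_mean:.4f}")

# Correlation between each held-out label and its training-set mean.
corr = np.corrcoef(y, train_means)[0, 1]
print(f"correlation: {corr:.2f}")
```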

Examples

### Observing the indices on a small example dataset
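import numpy as np
from rebalancedcv import RebalancedLeaveOneOutRegression

# Seed NumPy's global random state (the training sets are subsampled)
np.random.seed(1)

# Four samples with two features each, and a continuous target
X = np.array([[1, 2, 1, 2], [3, 4, 3, 4]]).T
y = np.array([1.9, 2.2, 2.4, 2.5])
rloo = RebalancedLeaveOneOutRegression()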
for i, (train_index, test_index) in enumerate(rloo.split(X, y)):
    print(f"Fold {i}:")
    print(f"  Train: index={train_index}")
    print(f"  Test:  index={test_index}")
Fold 0:
  Train: index=[1 3]
  Test:  index=[0]
Fold 1:
  Train: index=[2 3]
  Test:  index=[1]
Fold 2:
  Train: index=[1 3]
  Test:  index=[2]
Fold 3:
  Train: index=[0 2]
  Test:  index=[3]

Methods

The methods of the RebalancedLeaveOneOutRegression class are designed to provide the same functionality as scikit-learn’s LeaveOneOut.

get_n_splits(X, y, groups=None)
  • Returns the number of splitting iterations in the cross-validator.

    • Parameters
      • X : array-like of shape (n_samples, n_features)
        • Training data, where n_samples is the number of samples and n_features is the number of features.
      • y : array-like of shape (n_samples,)
        • The target vector relative to X.
      • groups : object
        • Always ignored, exists for compatibility.

    • Returns
      • n_splits : int
        • The number of splitting iterations in the cross-validator.

split(X, y, groups=None, seed=None)
  • Generate indices to split data into training and test set.

    • Parameters
      • X : array-like of shape (n_samples, n_features)
        • Training data, where n_samples is the number of samples and n_features is the number of features.
      • y : array-like of shape (n_samples,)
        • The target variable for supervised learning problems.
      • groups : array-like of shape (n_samples,), default=None
        • Group labels for the samples used while splitting the dataset into train/test set.
      • seed : int, default=None
        • If provided, used to seed the random subsampling of the training sets.

    • Yields
      • train : ndarray
        • The training set indices for that split.
      • test : ndarray
        • The testing set indices for that split.
See also:
RebalancedKFold
       Stratified K-fold iterator with training set rebalancing
RebalancedLeavePOut
       Leave-P-out iterator with training set rebalancing

For more background on LeaveOneOut, refer to the scikit-learn User Guide.