class rebalancedcv.RebalancedKFold(n_splits=5, shuffle=False, random_state=None)
Stratified K-Fold cross-validator with rebalancing.
Provides train/test indices to split data into train/test sets, sub-sampling within each training set so that all training folds have identical class balances.
This class is designed to have the same functionality and implementation structure as scikit-learn’s `StratifiedKFold`.
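The rebalancing idea described above can be sketched without any library. This is only an illustration under stated assumptions, not the package's actual implementation: the helper name `rebalance_training_folds` is hypothetical, the per-class target is assumed to be the smallest count of that class in any training fold, and the first samples of each class are kept rather than a random subset.

```python
from collections import Counter


def rebalance_training_folds(train_folds, y):
    """Trim each training fold so every fold has the same per-class counts.

    Hypothetical helper for illustration; not part of rebalancedcv.
    """
    # Count how many samples of each class every training fold contains.
    fold_counts = [Counter(y[i] for i in fold) for fold in train_folds]
    classes = sorted(set(y))
    # Assumed target: the smallest count of each class seen in any fold.
    target = {c: min(fc[c] for fc in fold_counts) for c in classes}

    rebalanced = []
    for fold in train_folds:
        kept, seen = [], Counter()
        for i in fold:
            # Keep a sample only while its class is below the target count.
            if seen[y[i]] < target[y[i]]:
                kept.append(i)
                seen[y[i]] += 1
        rebalanced.append(kept)
    return rebalanced


y = [1, 2, 1, 2, 1]
# Training folds as a plain stratified 2-fold split would produce them.
train_folds = [[3, 4], [0, 1, 2]]
balanced = rebalance_training_folds(train_folds, y)
print(balanced)
```

After trimming, both training folds contain one sample of each class, so a model trained on either fold sees the same class balance.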
When `shuffle` is True, `random_state` affects the ordering of the indices, which controls the randomness of each fold for each class. Otherwise, leave `random_state` as `None`. Pass an int for reproducible output across multiple function calls. See the scikit-learn Glossary entry for `random_state`.

These parameters are designed to match the structure and functionality of scikit-learn’s `StratifiedKFold`.
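To illustrate why an integer seed gives reproducible output, the sketch below shuffles the indices of each class with a seeded generator from Python's standard library. It is a stand-in for the behaviour described above, not the library's code: the function name `shuffled_class_indices` is hypothetical, and `random.Random` is used here in place of whatever generator the package relies on.

```python
import random


def shuffled_class_indices(y, random_state=None):
    """Group sample indices by class and shuffle each group.

    Hypothetical helper for illustration; not part of rebalancedcv.
    """
    # An int seed makes the shuffle repeatable; None seeds from system entropy.
    rng = random.Random(random_state)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    # Shuffle the index order within each class independently.
    for indices in by_class.values():
        rng.shuffle(indices)
    return by_class


y = [1, 2, 1, 2, 1]
first = shuffled_class_indices(y, random_state=0)
second = shuffled_class_indices(y, random_state=0)
print(first == second)  # the same seed yields the same per-class ordering
```

Because the fold assignment follows from the per-class ordering, fixing the seed fixes the folds across repeated calls.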
### Observing the indices on a small example dataset
    import numpy as np
    from rebalancedcv import RebalancedKFold

    # Five samples with two features each; the labels are imbalanced (3 vs 2).
    X = np.array([[1, 2, 1, 2, 1], [3, 4, 3, 4, 3]]).T
    y = np.array([1, 2, 1, 2, 1])

    rkf = RebalancedKFold(n_splits=2)
    rkf.get_n_splits(X, y)

    # Print the train/test indices produced for each fold.
    for i, (train_index, test_index) in enumerate(rkf.split(X, y)):
        print(f"Fold {i}:")
        print(f" Train: index={train_index}")
        print(f" Test: index={test_index}")
    Fold 0:
     Train: index=[3 4]
     Test: index=[0 1 2]
    Fold 1:
     Train: index=[0 1]
     Test: index=[3 4]
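The printed indices can be checked by hand. The short sketch below copies the train and test indices from the output above (it does not call the library) and confirms two properties: both training folds have the same class counts, and the test folds together cover every sample exactly once.

```python
from collections import Counter

y = [1, 2, 1, 2, 1]

# Indices copied from the example output above.
train_folds = {0: [3, 4], 1: [0, 1]}
test_folds = {0: [0, 1, 2], 1: [3, 4]}

# Class counts within each training fold.
balances = {fold: Counter(y[i] for i in idx) for fold, idx in train_folds.items()}
same_balance = balances[0] == balances[1]

# Every sample should appear in exactly one test fold.
covered = sorted(i for idx in test_folds.values() for i in idx)

print(same_balance)
print(covered == list(range(len(y))))
```

Note that sample 2 appears in neither training fold: it is dropped from the second fold's training set so that the class balance matches the first.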
The methods of the `RebalancedKFold` class are designed to provide functionality identical to that of scikit-learn’s `StratifiedKFold`.
`n_samples` is the number of samples and `n_features` is the number of features.