class rebalancedcv.RebalancedLeaveOneOut()
Rebalanced Leave-One-Out cross-validator.

Provides train/test indices to split data into train/test sets. Each sample is used once as a singleton test set, while the remaining samples are subsampled to form the training set, ensuring identical class balances across the training sets of all splits.

This class is designed to have the same functionality and implementation structure as scikit-learn's `LeaveOneOut()`. At least two observations per class are needed for `RebalancedLeaveOneOut`.
### Observing the indices on a small example dataset
```python
import numpy as np
from rebalancedcv import RebalancedLeaveOneOut

np.random.seed(1)
X = np.array([[1, 2, 1, 2], [3, 4, 3, 4]]).T
y = np.array([1, 2, 1, 2])
rloo = RebalancedLeaveOneOut()
for i, (train_index, test_index) in enumerate(rloo.split(X, y)):
    print(f"Fold {i}:")
    print(f"  Train: index={train_index}")
    print(f"  Test:  index={test_index}")
```

```
Fold 0:
  Train: index=[1 2]
  Test:  index=[0]
Fold 1:
  Train: index=[0 3]
  Test:  index=[1]
Fold 2:
  Train: index=[0 3]
  Test:  index=[2]
Fold 3:
  Train: index=[1 2]
  Test:  index=[3]
```
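The guarantee behind these splits, identical class balance in every training set, can be sketched in plain NumPy. This is a minimal illustration of the idea, not the package's actual implementation: for each held-out sample, one randomly chosen sample of every *other* class is also dropped, so that removing the test sample never skews the class proportions.

```python
import numpy as np

def rebalanced_loo_splits(y, seed=0):
    """Yield (train_index, test_index) pairs where, for each held-out
    sample, one sample of every other class is dropped so that all
    training sets share identical class counts."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes = np.unique(y)
    for i in range(len(y)):
        # start from all indices except the held-out one
        train = np.delete(np.arange(len(y)), i)
        for c in classes:
            if c == y[i]:
                continue  # holding out sample i already removed one of class y[i]
            # drop one random training sample of each other class
            candidates = train[y[train] == c]
            drop = rng.choice(candidates)
            train = train[train != drop]
        yield train, np.array([i])

y = np.array([1, 2, 1, 2])
for train_index, test_index in rebalanced_loo_splits(y):
    print(test_index, train_index, np.unique(y[train_index], return_counts=True)[1])
```

Every training set this yields contains exactly one sample of each class, mirroring the balanced indices shown above (the specific indices differ because the subsampling here is random).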
### Implementing a LogisticRegressionCV evaluation on randomly generated data using RebalancedLeaveOneOut
```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from rebalancedcv import RebalancedLeaveOneOut

## given some random `X` matrix and a binary `y` vector
X = np.random.rand(100, 10)
y = np.random.rand(100) > 0.5

## Rebalanced leave-one-out evaluation
rloo = RebalancedLeaveOneOut()
rloocv_predictions = [
    LogisticRegressionCV()
    .fit(X[train_index], y[train_index])
    .predict_proba(X[test_index])[:, 1][0]
    for train_index, test_index in rloo.split(X, y)
]

## Since all the data is random, a fair evaluation
## should yield an auROC close to 0.5
print('Rebalanced leave-one-out auROC: {:.2f}'
      .format(roc_auc_score(y, rloocv_predictions)))
```

```
Rebalanced leave-one-out auROC: 0.49
```
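For contrast, the same evaluation with scikit-learn's plain `LeaveOneOut` tends to score below 0.5 on random data: whenever a sample is held out, its class is slightly underrepresented in the training set, which biases the classifier against the held-out class. A sketch of that baseline (using `LogisticRegression` here for speed; the `rebalancedcv` package is not needed for this comparison):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.random((100, 10))
y = rng.random(100) > 0.5

## Plain leave-one-out evaluation, no rebalancing of training sets
loo = LeaveOneOut()
loocv_predictions = [
    LogisticRegression()
    .fit(X[train_index], y[train_index])
    .predict_proba(X[test_index])[:, 1][0]
    for train_index, test_index in loo.split(X, y)
]

print('Plain leave-one-out auROC: {:.2f}'
      .format(roc_auc_score(y, loocv_predictions)))
```

The exact score fluctuates with the random draw, but the systematic downward bias on class-imbalanced training sets is precisely what the rebalanced subsampling is designed to remove.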
The methods of the `RebalancedLeaveOneOut` class are designed to provide identical functionality to scikit-learn's `LeaveOneOut`.
In the method signatures, `n_samples` is the number of samples and `n_features` is the number of features.