class rebalancedcv.MulticlassRebalancedLeaveOneOut()
Multiclass Rebalanced Leave-One-Out cross-validator, designed to handle more than two `y` classes.

Provides train/test indices to split data into train/test sets. Each sample is used once as a singleton test set, while the remaining samples form the training set, subsampled so that class balance is identical across all training sets in all splits.

This class is designed to have the same functionality and implementation structure as scikit-learn's LeaveOneOut(). At least two observations per class are required for MulticlassRebalancedLeaveOneOut.
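The two-observations-per-class requirement can be verified before splitting. A minimal sketch of such a check, using only NumPy (the helper name is illustrative and not part of rebalancedcv):

```python
import numpy as np

def check_min_class_count(y, min_count=2):
    """Raise if any class in `y` has fewer than `min_count` observations."""
    classes, counts = np.unique(y, return_counts=True)
    too_small = classes[counts < min_count]
    if too_small.size:
        raise ValueError(
            f"Classes {too_small.tolist()} have fewer than "
            f"{min_count} observations"
        )

check_min_class_count([1, 2, 3, 1, 2, 3])  # passes: two of each class
```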
### Observing the indices on a small example dataset
```python
import numpy as np
np.random.seed(1)
from rebalancedcv import MulticlassRebalancedLeaveOneOut

X = np.array([[1, 2, 3, 1, 2, 3], [3, 4, 5, 3, 4, 5]]).T
y = np.array([1, 2, 3, 1, 2, 3])
mrloo = MulticlassRebalancedLeaveOneOut()
for i, (train_index, test_index) in enumerate(mrloo.split(X, y)):
    print(f"Fold {i}:")
    print(f"  Train: index={train_index}")
    print(f"  Test:  index={test_index}")
```

```
Fold 0:
  Train: index=[1 2 3]
  Test:  index=[0]
Fold 1:
  Train: index=[0 2 4]
  Test:  index=[1]
Fold 2:
  Train: index=[0 1 4]
  Test:  index=[2]
Fold 3:
  Train: index=[0 1 2]
  Test:  index=[3]
...
```
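The index pattern above comes from the subsampling step: removing the test sample leaves its class one short, so one sample is dropped from every other class as well. A minimal sketch of that idea, built only on NumPy and scikit-learn's LeaveOneOut (the function name and seeding are illustrative, not part of rebalancedcv):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

def rebalanced_loo_splits(X, y, seed=0):
    """Sketch: leave-one-out splits whose training sets are subsampled so
    every class keeps the same count (minimum class count minus one)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    per_class = counts.min() - 1  # the held-out sample costs its class one member
    for train_index, test_index in LeaveOneOut().split(X):
        kept = []
        for c in classes:
            members = train_index[y[train_index] == c]
            kept.append(rng.choice(members, size=per_class, replace=False))
        yield np.sort(np.concatenate(kept)), test_index
```

Every training set this yields contains exactly `per_class` samples of each class, so the class balance seen by the model is constant across folds; the real class additionally keeps the full scikit-learn splitter interface.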
### Implementing a LogisticRegressionCV evaluation on randomly generated data
### using MulticlassRebalancedLeaveOneOut
```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from rebalancedcv import MulticlassRebalancedLeaveOneOut

## given some random `X` matrix and a `y` vector with four classes
X = np.random.rand(100, 10)
y = (np.random.rand(100) * 4).astype(int)

## Rebalanced leave-one-out evaluation
mrloo = MulticlassRebalancedLeaveOneOut()
rloocv_predictions = [LogisticRegressionCV()
                      .fit(X[train_index], y[train_index])
                      .predict_proba(X[test_index])
                      for train_index, test_index in mrloo.split(X, y)]

## one-vs-rest AUC over the held-out probability estimates
roc_auc_score(y, np.vstack(rloocv_predictions), multi_class='ovr')
```
The methods of the MulticlassRebalancedLeaveOneOut class are designed to provide identical functionality to scikit-learn's LeaveOneOut. Throughout, n_samples is the number of samples and n_features is the number of features.