class debiasm.MultitaskDebiasMClassifier(batch_str='infer',
    learning_rate=0.005,
    min_epochs=25,
    l2_strength=0,
    w_l2=0,
    random_state=None,
    x_val=None,
    prediction_loss=torch.nn.functional.binary_cross_entropy)
The Multitask DEBIAS-M Classifier.
This class implements multiplicative bias correction via DEBIAS-M for a multitask classifier. It receives as input an n_samples x (1 + n_taxa) matrix X, holding the batch of each sample followed by the read counts or relative abundances of multiple microbiome samples, along with an n_samples x n_tasks matrix y of binary labels for at least two tasks.
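To make "multiplicative bias correction" concrete, the numpy sketch below multiplies each sample's taxon values by one positive factor per (batch, taxon) pair and then renormalizes every sample to relative abundances. It follows the X layout used on this page (batch index in the first column, taxa after it). The function apply_multiplicative_correction and the log10-scale weight matrix are hypothetical illustrations with placeholder values; they are not part of the debiasm API, they are not parameters learned by DEBIAS-M, and the exact parameterization DEBIAS-M uses may differ.

```python
import numpy as np


def apply_multiplicative_correction(X_with_batch, log_weights):
    """Hypothetical illustration of per-batch multiplicative correction.

    X_with_batch: (n_samples, 1 + n_taxa); column 0 holds integer batch ids,
        the remaining columns hold read counts or relative abundances.
    log_weights: (n_batches, n_taxa); log10 of the positive per-taxon factor
        applied to every sample of a given batch (placeholder values here).
    Returns an (n_samples, n_taxa) array of corrected relative abundances.
    """
    batches = X_with_batch[:, 0].astype(int)
    counts = X_with_batch[:, 1:].astype(float)
    # each sample is multiplied by the factors belonging to its own batch
    corrected = counts * 10.0 ** log_weights[batches]
    # renormalize each sample so its taxa sum to 1
    return corrected / corrected.sum(axis=1, keepdims=True)


# toy data: 4 samples from 2 batches, 3 taxa
X_toy = np.array([
    [0, 10, 20, 70],
    [0, 30, 30, 40],
    [1, 5, 50, 45],
    [1, 25, 25, 50],
])
# placeholder factors: batch 0 is left unchanged,
# batch 1 has taxon 0 boosted tenfold
log_weights = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
])
X_corrected = apply_multiplicative_correction(X_toy, log_weights)
print(X_corrected.round(3))
```

Because each sample is renormalized, adding the same constant to all of one batch's log-weights leaves the output unchanged, so in this sketch only the relative factors between taxa within a batch matter.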
The 'batch_str' parameter weights the strength of the enforced cross-batch similarity, 'l2_strength' sets an L2 regularization of the predictive parameters, and 'w_l2' sets an L2 regularization of the bias-correction parameters. 'x_val' holds microbiome inputs for a held-out set whose y labels are unavailable during training.
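How these weights combine can be sketched as a single penalized objective. The numpy code below is an assumption for illustration, not DEBIAS-M's actual loss: the prediction term is binary cross-entropy averaged over samples and tasks (mirroring the default prediction_loss), the cross-batch term is taken here to be the mean squared distance between each batch's mean corrected profile and the pooled mean, and unlabeled x_val rows enter only that unsupervised term. The functions sketch_objective and cross_batch_penalty and their argument names are hypothetical, and batch_str is treated as a plain number (the class's default 'infer' is not modeled).

```python
import numpy as np


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over all entries (samples x tasks)."""
    labels = np.asarray(labels, dtype=float)
    probs = np.clip(probs, eps, 1 - eps)
    return -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))


def cross_batch_penalty(profiles, batches):
    """Assumed form (illustrative only): mean squared distance of the
    per-batch mean profiles from the pooled mean profile."""
    pooled = profiles.mean(axis=0)
    batch_means = np.array([profiles[batches == b].mean(axis=0)
                            for b in np.unique(batches)])
    return np.mean(np.sum((batch_means - pooled) ** 2, axis=1))


def sketch_objective(probs_train, y_train, profiles_all, batches_all,
                     pred_weights, log_bias,
                     batch_str=1.0, l2_strength=0.0, w_l2=0.0):
    """Hypothetical total loss showing the role of each weight.

    probs_train: (n_train, n_tasks) predicted probabilities for labeled rows.
    profiles_all, batches_all: corrected profiles and batch ids for the
        labeled training rows and the unlabeled x_val rows together.
    pred_weights: predictive parameters, penalized by l2_strength.
    log_bias: bias-correction parameters, penalized by w_l2.
    """
    return (binary_cross_entropy(probs_train, y_train)
            + batch_str * cross_batch_penalty(profiles_all, batches_all)
            + l2_strength * np.sum(pred_weights ** 2)
            + w_l2 * np.sum(log_bias ** 2))


rng = np.random.default_rng(0)
probs = rng.uniform(0.05, 0.95, size=(6, 2))     # 6 labeled samples, 2 tasks
labels = rng.uniform(size=(6, 2)) > 0.5
profiles = rng.dirichlet(np.ones(4), size=9)     # 6 labeled + 3 unlabeled rows
batches = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])  # batch 2 plays the x_val role
W = rng.normal(size=(4, 2))
B = rng.normal(size=(3, 4))
loss = sketch_objective(probs, labels, profiles, batches, W, B,
                        batch_str=1.0, l2_strength=0.01, w_l2=0.01)
print(loss)
```

Setting batch_str, l2_strength, and w_l2 to zero in this sketch leaves only the prediction term, which is one way to see what each weight adds.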
'x_val' is an n_samples x (1 + n_taxa) matrix describing the read counts of held-out validation and/or test sets, whose labels will not be available during training. The first column of x_val denotes the batch of each sample, as non-negative integers interpreted alongside the batches specified in the training inputs. Providing x_val allows DEBIAS-M to account for distribution shifts from these samples during training.

## import packages
import numpy as np
from sklearn.metrics import roc_auc_score
from debiasm import MultitaskDebiasMClassifier
## generate data for the example
np.random.seed(123)
n_samples = 96*5
n_batches = 5
n_features = 100
n_tasks=3
## the read count matrix
X = ( np.random.rand(n_samples, n_features) * 1000 ).astype(int)
## the labels
y = np.random.rand(n_samples, n_tasks)>0.5
## the batches
batches = ( np.random.rand(n_samples) * n_batches ).astype(int)
## we assume the batches are numbered ints starting at '0',
## and they are in the first column of the input X matrices
X_with_batch = np.hstack((batches[:, np.newaxis], X))
## set the validation batch to '4'
val_inds = batches==4
X_train, X_val = X_with_batch[~val_inds], X_with_batch[val_inds]
y_train, y_val = y[~val_inds], y[val_inds]
### Run multitask DEBIAS-M, using standard sklearn object methods
multitask_model = MultitaskDebiasMClassifier(x_val=X_val)
multitask_model.fit(X_train, y_train)
## Assess resulting scores
predicted_scores = multitask_model.predict_proba(X_val)
## extract the 'DEBIAS-ed' data for other downstream analyses, if applicable
X_debiassed = multitask_model.transform(X_with_batch)
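The example imports roc_auc_score but stops at predict_proba. A per-task evaluation could look like the sketch below. It assumes the multitask scores can be arranged as an n_samples x n_tasks array holding one positive-class probability per task; that layout is an assumption, so check the shape returned by predict_proba before relying on it. AUC is computed with small hypothetical helpers (auc_binary, per_task_auc) so the sketch runs without scikit-learn; for each task t, roc_auc_score(y_val[:, t], scores[:, t]) should give the same value.

```python
import numpy as np


def auc_binary(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("each task needs both positive and negative labels")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def per_task_auc(y_true, scores):
    """One AUC per column; y_true and scores are (n_samples, n_tasks)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    return [auc_binary(y_true[:, t], scores[:, t])
            for t in range(y_true.shape[1])]


# toy check: task 0 is ranked perfectly, task 1 is ranked backwards
y_toy = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
s_toy = np.array([[0.9, 0.8], [0.8, 0.7], [0.2, 0.3], [0.1, 0.2]])
print(per_task_auc(y_toy, s_toy))
```

With the objects from the example above, this would be per_task_auc(y_val, predicted_scores), provided predicted_scores has the assumed layout.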

n_samples is the number of samples and n_taxa is the number of taxa. The first column of X denotes the batch of each sample, as non-negative integers, while the remaining n_taxa columns describe the read counts of each taxon. DEBIAS-M also supports relative abundance inputs. n_tasks represents the number of training tasks.

See also:
Multitask DEBIAS-M Classifier Demo
The DEBIAS-M regressor
OnlineDebiasMClassifier: DEBIAS-M for online corrections
For more background on DEBIAS-M, refer to our manuscript.