class debiasm.DebiasMClassifierLogAdd(
    batch_str='infer',
    learning_rate=0.005,
    min_epochs=25,
    l2_strength=0,
    w_l2=0,
    random_state=None,
    x_val=None,
    prediction_loss=torch.nn.functional.binary_cross_entropy
)
The DEBIAS-M Classifier implementation for logspace inputs.
This class implements additive DEBIAS-M bias-correction, which
models the processing-bias mechanism in logarithmic space
representations of read counts, such as the center log ratio
transform.
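To see why an additive correction is a natural fit in log space, note that a per-taxon multiplicative factor applied to read counts becomes a per-taxon additive offset after a center log ratio transform. The sketch below checks that identity with NumPy only. The `clr` helper and the factor vector `w` are illustrative assumptions introduced here; they are not part of the debiasm API, and the code does not reproduce the model that DEBIAS-M actually fits.

```python
import numpy as np

def clr(x):
    """Center log ratio transform along the last axis (illustrative helper)."""
    log_x = np.log(x)
    return log_x - log_x.mean(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# pseudocounted read counts: 6 samples, 4 taxa
counts = 1 + rng.integers(0, 1000, size=(6, 4))
# hypothetical per-taxon multiplicative bias factors
w = np.array([0.5, 2.0, 1.0, 4.0])

# apply the bias multiplicatively in count space
biased = counts * w

# in CLR space the same bias shows up as one additive offset per taxon
offset = clr(biased) - clr(counts)
additive_ok = np.allclose(offset, clr(w))
print(additive_ok)
```

Every row of `offset` equals `clr(w)`, so removing the bias in this representation amounts to subtracting a single vector per batch rather than rescaling counts.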
This uses a microbiome matrix X of log-processed read counts from multiple
samples, with n_samples rows and n_taxa taxa, along with a provided binary
label y.
The 'batch_str' parameter weights the strength of the enforced
cross-batch similarity, 'l2_strength' sets the l2 regularization of the
predictive parameters, and 'w_l2' sets the l2 regularization of the
bias-correction parameters. 'x_val' corresponds to microbiome inputs for
a held-out set, for which the y labels are unavailable.
x_val is an n_samples x (1 + n_taxa) matrix describing the log-processed
read counts of held-out validation and/or test sets, for which validation
or testing labels will not be available during training. The first column
of x_val denotes the batch of each sample, as non-negative integers, which
are interpreted alongside the batches specified in the training inputs.
Providing x_val allows DEBIAS-M to account for distribution shifts from
these samples during training.

## import packages
import numpy as np
from sklearn.metrics import roc_auc_score
from skbio.stats.composition import clr
from debiasm import DebiasMClassifierLogAdd
## generate data for the example
np.random.seed(123)
n_samples = 96*5
n_batches = 5
n_features = 100
## the read count matrix, with a pseudocount
X = 1 + ( np.random.rand(n_samples, n_features) * 1000 ).astype(int)
## map into relative abundance, then center log ratio space
X = clr( X / X.sum(axis=1)[:, np.newaxis] )
## the labels
y = np.random.rand(n_samples)>0.5
## the batches
batches = ( np.random.rand(n_samples) * n_batches ).astype(int)
## we assume the batches are numbered ints starting at '0',
## and they are in the first column of the input X matrices
X_with_batch = np.hstack((batches[:, np.newaxis], X))
## set the validation batch to '4'
val_inds = batches==4
X_train, X_val = X_with_batch[~val_inds], X_with_batch[val_inds]
y_train, y_val = y[~val_inds], y[val_inds]
### Run DEBIAS-M, using standard sklearn object methods
## give it the held-out inputs to account for
## those domain shifts while training
dmc = DebiasMClassifierLogAdd(x_val=X_val)
dmc.fit(X_train, y_train)
## Assess results
### should be ~0.5 in this example, since the data is all random
roc_auc_score(y_val, dmc.predict_proba(X_val)[:, 1])
## extract the 'DEBIAS-ed' data for other downstream analyses, if applicable
X_debiassed = dmc.transform(X_with_batch)
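
The input layout used in the example above (batch labels in column 0, features in the remaining columns) can be validated and split with a small helper. This is a NumPy-only sketch: `split_by_batch` is a hypothetical name introduced here for illustration and is not a debiasm function, and the random features stand in for real log-processed data.

```python
import numpy as np

def split_by_batch(X_with_batch, y, held_out_batch):
    """Split rows into train and held-out sets by the batch label in column 0.

    Hypothetical helper for illustration; not part of debiasm.
    """
    batches = X_with_batch[:, 0]
    # batch labels are expected to be non-negative integers
    if np.any(batches < 0) or np.any(batches != np.round(batches)):
        raise ValueError("column 0 must hold non-negative integer batch labels")
    held_out = batches == held_out_batch
    return (X_with_batch[~held_out], y[~held_out],
            X_with_batch[held_out], y[held_out])

rng = np.random.default_rng(123)
features = rng.normal(size=(20, 3))      # stand-in for log-processed features
batch_ids = np.arange(20) % 5            # batches 0..4, four samples each
labels = rng.random(20) > 0.5            # random binary labels
X_all = np.hstack((batch_ids[:, np.newaxis], features))

X_tr, y_tr, X_ho, y_ho = split_by_batch(X_all, labels, held_out_batch=4)
print(X_tr.shape, X_ho.shape)
```

Keeping the batch column attached to both splits matters here, since the held-out matrix passed as `x_val` is expected to carry its batch labels in the same position as the training matrix.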
In the X inputs, n_samples is the number of samples and n_taxa is the
number of taxa. The first column of X denotes the batch of each sample, as
non-negative integers, while the remaining n_taxa columns describe the
log-processed output of each taxon. DEBIAS-M also supports relative
abundance inputs.

See also:
DEBIAS-M Regression: The DEBIAS-M regressor
DebiasMClassifier: Implementation of a DEBIAS-M classifier

For more background on DEBIAS-M, refer to our manuscript.