assemble<-,Asset-method {Allspice} | R Documentation |
Finish Asset contents
Description
Trains a new classification model.
Usage
assemble(obj) <- value
Arguments
obj |
An object of the class Asset. |
value |
A list that contains training data, see Details. |
Details
The value argument must contain three named elements: title, dat and bits. Optional predictors and covariates elements can also be included.
The title is a descriptive identifier for the asset that will be
displayed by the Classifier object in report().
The dat element is a matrix that contains the training samples.
The variables are organized into named rows and the samples into
named columns. Non-finite values are not allowed.
The predictors element contains the names of the input variables
that should be used for training the model. If empty, all inputs are
used for automatic feature selection and subsequent training steps.
The covariates element contains additional information for constructing
the final classification models. Unlike the data matrix, variables
are organized into named columns and the samples are stored as rows.
Non-finite values are not allowed.
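Because dat and covariates use opposite layouts, they are easy to mix up. The following base R sketch builds both inputs in the orientation described above and checks them before assembly. The variable names, sample names and values are made up for illustration, and the check that sample names agree between the two inputs is an assumption about how samples are paired rather than documented behaviour.

```r
# Hypothetical toy data: 3 variables (rows) x 4 samples (columns).
set.seed(1)
dat <- matrix(rnorm(12), nrow = 3, ncol = 4,
              dimnames = list(c("GENE1", "GENE2", "GENE3"),
                              c("S1", "S2", "S3", "S4")))

# Covariates use the opposite layout: samples as rows, variables as columns.
covariates <- data.frame(MALE = c(1, 0, 0, 1),
                         AGE = c(4.5, 12, 7.2, 9),
                         row.names = c("S1", "S2", "S3", "S4"))

# Checks that mirror the documented requirements:
# named rows and columns, and no non-finite values.
dat_ok <- !is.null(rownames(dat)) && !is.null(colnames(dat)) &&
    all(is.finite(dat))
cov_ok <- all(is.finite(as.matrix(covariates)))

# Assumed check: the same samples appear in both inputs.
names_ok <- setequal(colnames(dat), rownames(covariates))
```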
The bits element contains labels for category memberships. Three
formats are supported: 1) a character vector of named elements that
contains non-empty strings, 2) a matrix or a data frame with row names
and a single column of non-empty values, and 3) a matrix or a data frame
with multiple columns that contain binary values where 1s indicate
category membership (the name of the column is the name of the
category). Overlapping categories are allowed.
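The three label formats can be illustrated with plain base R objects. This is a minimal sketch with hypothetical sample names and category labels; it only shows the shapes described above and does not call any Allspice function.

```r
# Hypothetical sample names and category labels.
samples <- c("S1", "S2", "S3", "S4")
labels <- c("A", "B", "A", "C")

# 1) Named character vector of non-empty strings.
bits_vec <- setNames(labels, samples)

# 2) Single-column data frame with row names.
bits_df <- data.frame(SUBTYPE = labels, row.names = samples)

# 3) Binary membership matrix: one column per category, 1 = member.
categ <- sort(unique(labels))
bits_bin <- outer(labels, categ, FUN = "==") * 1
dimnames(bits_bin) <- list(samples, categ)

# Overlapping categories: S1 is made a member of both A and B.
bits_bin["S1", "B"] <- 1
```

Only the third format can express overlapping memberships, since the first two carry a single label per sample.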
The final asset is assembled in six steps. First, the training data are standardized and normalized. Second, input variables are sorted according to their univariate classification performance. Third, redundant features are excluded by testing the sorted variables for mutual correlations; this produces an optimized listing of non-redundant features that are the most predictive of the category labels. Fourth, mean centroids are calculated for each category. Fifth, training samples are matched to their nearest centroids and the distances collected as preliminary predictor scores. Lastly, logistic regression models are fitted to the preliminary scores, covariates and category labels to enable the calculation of standardized predictor scores for new data.
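To make the six steps concrete, here is a rough base R sketch of the same sequence on simulated data. It is not the Allspice implementation. The absolute t-statistic used for ranking, the 0.9 correlation cutoff, the limit of two retained variables and the Euclidean distance are all assumptions chosen for illustration. Only standardization is shown for the first step, and covariates are left out of the regression for brevity.

```r
# Illustrative sketch only; NOT the Allspice implementation.
set.seed(42)

# Toy data: 6 variables (rows) x 60 samples (columns), two categories.
n <- 60
labels <- rep(c("A", "B"), each = n / 2)
dat <- matrix(rnorm(6 * n), nrow = 6,
              dimnames = list(paste0("V", 1:6), paste0("S", 1:n)))
dat["V1", labels == "B"] <- dat["V1", labels == "B"] + 2    # informative
dat["V2", ] <- dat["V1", ] + rnorm(n, sd = 0.1)             # redundant copy
dat["V3", labels == "B"] <- dat["V3", labels == "B"] - 1.5  # informative

# Step 1: standardize each variable (zero mean, unit variance).
z <- t(scale(t(dat)))

# Step 2: rank variables by a univariate score (assumed: absolute t-statistic).
score <- apply(z, 1, function(x) abs(t.test(x ~ labels)$statistic))
ranked <- names(sort(score, decreasing = TRUE))

# Step 3: walk down the ranking and skip variables that correlate
# strongly with one already kept (assumed cutoff: 0.9).
keep <- character(0)
for (v in ranked) {
    r <- 0
    if (length(keep) > 0) {
        r <- abs(cor(z[v, ], t(z[keep, , drop = FALSE])))
    }
    if (all(r < 0.9)) keep <- c(keep, v)
}
keep <- head(keep, 2)  # retain the two best non-redundant variables

# Step 4: mean centroid of each category.
categ <- unique(labels)
centroids <- sapply(categ, function(k)
    rowMeans(z[keep, labels == k, drop = FALSE]))

# Step 5: Euclidean distance from every sample to every centroid.
dists <- sapply(categ, function(k)
    sqrt(colSums((z[keep, , drop = FALSE] - centroids[, k])^2)))

# Step 6: one logistic regression per category on the distance-based score.
models <- lapply(categ, function(k) {
    d <- data.frame(y = as.integer(labels == k), score = -dists[, k])
    glm(y ~ score, data = d, family = binomial())
})
names(models) <- categ

# Fitted membership probabilities and the most probable category.
prob <- sapply(models, predict, type = "response")
predicted <- categ[max.col(prob)]
accuracy <- mean(predicted == labels)
```

In this toy setup V2 is a near copy of V1, so the redundancy filter keeps only one of the pair and the second retained variable comes from further down the ranking.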
Value
Updates the Asset object.
Examples
# Prepare training data.
simu <- bcellALL(200)
materials <- list(title="Simutypes")
materials$dat <- simu$counts
materials$covariates <- simu$metadata[,c("MALE","AGE")]
materials$bits <- simu$metadata[,"SUBTYPE",drop=FALSE]
# Assemble classification asset.
bALL <- asset()
assemble(bALL) <- materials
# Export asset into a new folder.
tpath <- tempfile()
export(bALL, folder = tpath)
# Create a classifier.
cls <- classifier(tpath, verbose = FALSE)
# Classify new samples.
simu <- bcellALL(5)
covariates(cls) <- simu$metadata
profiles(cls) <- simu$counts
primary <- predictions(cls)[[1]]
print(primary[,c("LABEL","PROX","EXCL")])