PredictFastClass {MSclassifR} | R Documentation |
Prediction of the category to which a mass spectrum belongs using linear regressions of mass spectra.
Description
For each mass peak in a list of mass peaks, a linear regression is performed between the mass spectrum and the mass spectra corresponding to a category. The regression is performed for each category, and each fit is associated with an Akaike Information Criterion (AIC) value. The AIC values are then used to determine which category a mass spectrum belongs to. The function also provides a probability that the mass spectrum does not belong to any of the input categories.
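The idea described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy intensities, the object names (ref, categories, query, aic_by_cat) and the use of lm() with an intercept are assumptions made only for illustration, and peak matching with a mass tolerance is left out.

```r
# Toy reference spectra (rows) over 8 m/z bins (columns); values are made up.
ref <- rbind(
  A1 = c(1.0, 0.8, 0.1, 0.0, 0.0, 0.3, 0.0, 0.5),
  A2 = c(0.9, 1.0, 0.0, 0.1, 0.0, 0.2, 0.1, 0.6),
  B1 = c(0.0, 0.1, 1.0, 0.7, 0.4, 0.0, 0.2, 0.0),
  B2 = c(0.1, 0.0, 0.8, 1.0, 0.5, 0.1, 0.3, 0.0)
)
categories <- factor(c("A", "A", "B", "B"))

# Hypothetical query spectrum that resembles category "A".
query <- c(0.95, 0.85, 0.05, 0.00, 0.05, 0.25, 0.00, 0.55)

# For each category, regress the query intensities on the reference
# spectra of that category and keep the AIC of the fitted model.
aic_by_cat <- sapply(levels(categories), function(k) {
  predictors <- t(ref[categories == k, , drop = FALSE])  # m/z bins x spectra
  fit <- lm(query ~ predictors)
  AIC(fit)
})

# The category with the smallest AIC is taken as the prediction.
pred_cat <- names(which.min(aic_by_cat))
print(aic_by_cat)
print(pred_cat)
```

In this toy case the query is close to a combination of the two "A" spectra, so the regression on category "A" leaves much smaller residuals and therefore a lower AIC than the regression on category "B".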
Usage
PredictFastClass(peaks,
mod_peaks,
Y_mod_peaks,
moz="ALL",
tolerance = 6,
toleranceStep = 2,
normalizeFun = TRUE,
noMatch = 0)
Arguments
peaks |
a list of |
mod_peaks |
an intensity matrix corresponding to mass spectra for which the category is known. Each column is a mass-over-charge value, each row corresponds to a mass spectrum. |
Y_mod_peaks |
a |
moz |
a |
tolerance |
a |
toleranceStep |
a |
normalizeFun |
a |
noMatch |
a |
Value
Returns a data frame containing AIC values by category for each mass spectrum in peaks. The AIC should be minimal for the most probable category. The pred_cat column is the predicted category for each mass spectrum in peaks. The p_not_in_DB column is the minimal p-value of several Fisher tests, each testing whether all the linear coefficients associated with the mass spectra of a category are null. It can be interpreted as a p-value that the mass spectrum is not present in the input database.
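As a rough illustration of how such a minimal p-value could be obtained, the sketch below computes the overall F-test p-value of one linear model per category and keeps the smallest one. The toy data, the helper overall_f_pvalue() and the name p_not_in_db_sketch are hypothetical, and the package may compute p_not_in_DB differently.

```r
# Toy reference spectra (rows) over 8 m/z bins (columns); values are made up.
ref <- rbind(
  A1 = c(1.0, 0.8, 0.1, 0.0, 0.0, 0.3, 0.0, 0.5),
  A2 = c(0.9, 1.0, 0.0, 0.1, 0.0, 0.2, 0.1, 0.6),
  B1 = c(0.0, 0.1, 1.0, 0.7, 0.4, 0.0, 0.2, 0.0),
  B2 = c(0.1, 0.0, 0.8, 1.0, 0.5, 0.1, 0.3, 0.0)
)
categories <- factor(c("A", "A", "B", "B"))
query <- c(0.95, 0.85, 0.05, 0.00, 0.05, 0.25, 0.00, 0.55)

# Hypothetical helper: p-value of the overall F-test of an lm fit, i.e. the
# test that all non-intercept coefficients are zero.
overall_f_pvalue <- function(fit) {
  f <- summary(fit)$fstatistic  # named vector: value, numdf, dendf
  unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
}

# One F-test per category, then the minimum over categories.
p_by_cat <- sapply(levels(categories), function(k) {
  predictors <- t(ref[categories == k, , drop = FALSE])
  overall_f_pvalue(lm(query ~ predictors))
})
p_not_in_db_sketch <- min(p_by_cat)
print(p_by_cat)
print(p_not_in_db_sketch)
```

Under this reading, a small value means that at least one category explains the query spectrum significantly, whereas a large value suggests that no category in the reference set does.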
Examples
library("MSclassifR")
library("MALDIquant")
# load mass spectra and their metadata
data("CitrobacterRKIspectra","CitrobacterRKImetadata", package = "MSclassifR")
# standard pre-processing of mass spectra
spectra <- SignalProcessing(CitrobacterRKIspectra)
# detection of peaks in pre-processed mass spectra
peaks <- MSclassifR::PeakDetection(x = spectra, averageMassSpec=FALSE)
# matrix with intensities of peaks arranged in rows (each column is a mass-over-charge value)
IntMat <- MALDIquant::intensityMatrix(peaks)
rownames(IntMat) <- paste(CitrobacterRKImetadata$Strain_name_spot)
# remove missing values in the matrix
IntMat[is.na(IntMat)] <- 0
# normalize peaks according to the maximum intensity value for each mass spectrum
IntMat <- apply(IntMat,1,function(x) x/(max(x)))
# apply() over rows returns the spectra in columns; transpose back so that each row is a mass spectrum
X <- t(IntMat)
# define the known categories of mass spectra for the classification
Y <- factor(CitrobacterRKImetadata$Species)
# Predict species without peak selection using a tolerance of 1 Da
res <- PredictFastClass(peaks=peaks[1:5],
mod_peaks=X,
Y_mod_peaks=Y,
tolerance = 1)
# compare predicted categories (species) with the truth
cbind(res$pred_cat,as.character(Y[1:5]))
# The method can be applied after a peak selection step
a <- SelectionVar(X,
Y,
MethodSelection = c("RFERF"),
MethodValidation = c("cv"),
PreProcessing = c("center","scale","nzv","corr"),
NumberCV = 2,
Metric = "Kappa",
Sizes = c(20:40),
Sampling = "up")
# Predict species from selected peaks using a tolerance of 1 Da
res <- PredictFastClass(peaks=peaks[1:5],
moz = a$sel_moz,
mod_peaks=X,
Y_mod_peaks=Y, tolerance = 1)
# compare predicted categories (species) with the truth
cbind(res$pred_cat,as.character(Y[1:5]))