BuildFeatureMatrix {speaq} | R Documentation |
Build a feature matrix from data processed with speaq 2.0
Description
This function converts the grouped peak data to a matrix. The matrix has features (peak groups) in the columns and samples in the rows; each cell holds the value of the peak for that sample and feature.
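The conversion described above is a pivot from a long table (one row per peak) to a wide matrix. The following is a minimal base R sketch on invented data, not the speaq implementation. It assumes the grouped data carries the columns `Sample`, `peakIndex` and `peakValue` named under Arguments, and that all peaks in one group share the same `peakIndex`.

```r
# Toy grouped-peak table. The column names mirror the 'var' options listed
# under Arguments; the values are invented for illustration.
peaks <- data.frame(
  Sample    = c(1, 1, 2, 2, 3),
  peakIndex = c(100, 250, 100, 250, 100),  # assumed: one index per peak group
  peakValue = c(5.2, 1.1, 4.8, 0.9, 5.0)
)

# Pivot from long format to a samples-by-features matrix.
# Sample 3 has no peak in group 250, so that cell comes out as NA.
feature.matrix <- tapply(peaks$peakValue,
                         list(peaks$Sample, peaks$peakIndex),
                         FUN = function(v) v[1])
print(feature.matrix)
```

The NA cell is the kind of gap that the `impute` argument is meant to fill.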
Usage
BuildFeatureMatrix(
Y.data,
var = "peakValue",
impute = "zero",
imputation_val = NA,
delete.below.threshold = FALSE,
baselineThresh = 500,
snrThres = 3,
thresholds.pass = "any-to-pass"
)
Arguments
Y.data |
The dataset after (at least) peak detection and grouping with speaq 2.0. The dataset after peak filling is recommended. |
var |
The variable to be used in the feature matrix. This can be any of 'peakIndex', 'peakPPM', 'peakValue' (default), 'peakSNR', 'peakScale', or 'Sample'. |
impute |
The value to impute when a peak is missing for a given sample and feature combination. Options are "zero" (or "zeros", the default), "median" (imputation with the feature median), "randomForest" (imputation with the missForest function from the missForest package), "kNN" followed by the number of neighbours to use, e.g. "kNN5" or "kNN10" (following the method of Troyanskaya et al., 2001), or lastly "User_value" (which imputes the value given in the imputation_val argument, e.g. the median of the raw spectra). Any other value will produce NAs. |
imputation_val |
If the "User_value" imputation option is chosen this value will be used to impute the missing values. |
delete.below.threshold |
Whether to ignore peaks for which the 'var' variable has a value below 'baselineThresh' (default = FALSE). |
baselineThresh |
The threshold for the 'var' variable that peaks have to surpass to be included in the feature matrix. |
snrThres |
The threshold for the signal-to-noise ratio of a peak. |
thresholds.pass |
This variable lets users decide whether a peak has to pass all the thresholds (both snrThres and baselineThresh) or just one. If the peak does not need to surpass any thresholds, set 'delete.below.threshold' to FALSE. |
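The interplay of baselineThresh, snrThres and thresholds.pass can be illustrated with a small sketch. The helper `passes.thresholds` and its `mode` values "any" and "both" are hypothetical and not part of speaq. Whether the package compares with `>=` or `>` is also an assumption here.

```r
# Hypothetical helper, not part of speaq: decide which peaks survive filtering.
passes.thresholds <- function(value, snr,
                              baselineThresh = 500, snrThres = 3,
                              mode = c("any", "both")) {
  mode <- match.arg(mode)
  above.baseline <- value >= baselineThresh  # assumed comparison: >=
  above.snr      <- snr >= snrThres
  if (mode == "any") {
    above.baseline | above.snr   # one passed threshold is enough
  } else {
    above.baseline & above.snr   # both thresholds must be passed
  }
}

# Invented values for four peaks.
value <- c(800, 300, 900, 100)
snr   <- c(5, 4, 1, 2)

keep.any  <- passes.thresholds(value, snr, mode = "any")
keep.both <- passes.thresholds(value, snr, mode = "both")
print(keep.any)
print(keep.both)
```

With these values, only the last peak fails under "any", while only the first peak passes under "both".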
Value
A matrix (data.matrix) with samples as rows and features as columns. The values in the matrix are those of the 'var' variable.
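How the simpler imputation options act on such a matrix can be sketched in base R. This is an illustration on an invented matrix, not the package's code. It covers the "zero", "median" and "User_value" options only, and the variable `imputation_val` below merely stands in for the argument of the same name.

```r
# Invented samples-by-features matrix with two missing peaks.
m <- matrix(c(5.2, 4.8, 5.0,
              1.1,  NA, 0.9,
               NA, 2.0, 4.0),
            nrow = 3,
            dimnames = list(c("1", "2", "3"), c("100", "250", "400")))

# "zero": replace every missing value with 0.
m.zero <- m
m.zero[is.na(m.zero)] <- 0

# "median": replace missing values with the median of their feature (column).
m.median <- apply(m, 2, function(feature) {
  feature[is.na(feature)] <- median(feature, na.rm = TRUE)
  feature
})

# "User_value": replace missing values with a user-supplied constant.
imputation_val <- 0.5  # stand-in for the imputation_val argument
m.user <- m
m.user[is.na(m.user)] <- imputation_val

print(m.median)
```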
Author(s)
Charlie Beirnaert, charlie.beirnaert@uantwerpen.be
References
Olga Troyanskaya, Michael Cantor, Gavin Sherlock, Pat Brown, Trevor Hastie, Robert Tibshirani, David Botstein and Russ B. Altman (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520-525.
Examples
library(speaq)

subset <- GetWinedata.subset()
# to reduce the example time we only select spectra 1 & 2
subset.spectra <- as.matrix(subset$Spectra)[1:2, ]
subset.ppm <- as.numeric(subset$PPM)

# detect peaks with wavelets
test.peaks <- getWaveletPeaks(Y.spec = subset.spectra,
                              X.ppm = subset.ppm,
                              nCPU = 2) # nCPU set to 2 for the vignette build

# group the peaks into features, then build the samples-by-features matrix
test.grouped <- PeakGrouper(Y.peaks = test.peaks)
test.Features <- BuildFeatureMatrix(test.grouped)