Data_impute {DDPNA} | R Documentation |
Data_impute
Description
Data cleaning process: detect and remove outlier samples and impute missing values. The process is as follows:
1. Remove genes whose proportion of missing values is larger than maxNAratio.
2. Detect outlier samples and remove them.
3. Repeat steps 1-2 until the iteration limit is reached or no more outlier samples are detected.
4. Impute the missing values.
The function can also perform only gene filtering, only outlier removal, or only missing value imputation.
Usage
Data_impute(data, inf = "inf", intensity = "LFQ", miss.value = NA,
splNExt = TRUE, maxNAratio = 0.5,
removeOutlier = TRUE,
outlierdata = "intensity", iteration = NA, sdout = 2,
distmethod = "manhattan", A.IAC = FALSE,
dohclust = FALSE, treelabels = NA,
plot = TRUE, filename = NULL,
text.cex = 0.7, text.col = "red", text.pos = 1,
text.labels = NA, abline.col = "red", abline.lwd = 2,
impute = TRUE, verbose = 1, ...)
Arguments
data |
Output of MaxQconvert, or a list containing two data.frames: ID information and quantification data |
inf |
the name of the data.frame that contains protein ID information |
intensity |
the name of the data.frame that contains only quantification data |
miss.value |
the value used to represent missing data in the quantification data.
The default value is NA. |
splNExt |
a logical value indicating whether to extract sample names (suited for MaxQuant quantification data). |
maxNAratio |
The maximum proportion of missing data allowed in any row (default 0.5, i.e. 50%). Rows with a larger proportion of missing values are deleted. |
removeOutlier |
a logical value indicating whether to remove outlier samples. |
outlierdata |
This argument is deprecated.
The data used for outlier sample detection. This must be (an abbreviation of) one of the strings " |
iteration |
a numeric value indicating how many times to run the outlier sample detection and removal loop. |
sdout |
a numeric value indicating the threshold used to judge outlier samples. The default is 2.
distmethod |
The distance measure to be used. This must be (an abbreviation of) one of the strings " |
A.IAC |
a logical value indicating whether decreasing
dohclust |
a logical value indicating whether to perform hierarchical clustering and plot dendrograms. |
treelabels |
labels of dendrograms |
plot |
a logical value indicating whether to plot numbersd scatter diagrams. |
filename |
the filename of the plot. The number and plot type information will be added automatically. The default value is NULL. |
text.cex |
outlier sample annotation text size (scatter diagram parameter) |
text.col |
outlier sample annotation color (scatter diagram parameter) |
text.pos |
outlier sample annotation position (scatter diagram parameter) |
text.labels |
outlier sample annotation labels (scatter diagram parameter) |
abline.col |
the threshold line color (scatter diagram parameter) |
abline.lwd |
the threshold line width (scatter diagram parameter) |
impute |
a logical value indicating whether to perform kNN imputation. |
verbose |
integer level of verbosity. Zero means silent; 1 prints some diagnostic messages. |
... |
Other arguments. |
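To illustrate how sdout and distmethod could work together, the base R sketch below flags samples whose summed distance to all other samples lies more than sdout standard deviations above the mean. This is an assumed, generic scheme for illustration only and may differ from what Data_impute does internally; the helper flag_outliers and the toy matrix are hypothetical.

```r
# Hypothetical helper: flag outlier samples (columns) by summed distance.
# Illustrative assumption only, not the code used inside Data_impute().
flag_outliers <- function(mat, sdout = 2, distmethod = "manhattan") {
  # dist() compares rows, so transpose to compare samples (columns)
  d <- as.matrix(dist(t(mat), method = distmethod))
  total <- rowSums(d)                      # summed distance of each sample to the others
  z <- (total - mean(total)) / sd(total)   # number of SDs away from the mean
  names(z)[z > sdout]                      # samples beyond the threshold
}

# Toy data: 5 proteins x 8 samples; sample S8 is shifted far from the rest
set.seed(1)
mat <- matrix(rnorm(40), nrow = 5, dimnames = list(NULL, paste0("S", 1:8)))
mat[, "S8"] <- mat[, "S8"] + 10
outliers <- flag_outliers(mat, sdout = 2)
```

A larger sdout makes the rule stricter about what counts as an outlier, so fewer samples are removed.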
Details
Detects and removes outlier samples and imputes missing values.
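The row filtering controlled by maxNAratio can be pictured with a minimal base R sketch. It assumes that a row is dropped only when its fraction of missing values is strictly greater than maxNAratio, as the argument description suggests; the helper filter_rows and the toy matrix are hypothetical and not part of DDPNA.

```r
# Hypothetical helper: keep rows whose fraction of NA is not above maxNAratio.
# Assumption: rows with exactly maxNAratio missing values are kept.
filter_rows <- function(mat, maxNAratio = 0.5) {
  na_frac <- rowMeans(is.na(mat))            # fraction of missing values per row
  mat[na_frac <= maxNAratio, , drop = FALSE]
}

mat <- rbind(p1 = c(1, 2, NA, 4),     # 25% missing
             p2 = c(NA, NA, NA, 4),   # 75% missing
             p3 = c(NA, 2, NA, 4))    # 50% missing
kept <- filter_rows(mat, maxNAratio = 0.5)
```

In this toy case only row p2 exceeds the 0.5 threshold and is removed.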
Value
a list of proteomic data.
inf |
Protein information, including protein IDs and other information. |
intensity |
Quantification information. |
relative_value |
intensity divided by geometric mean |
log2_value |
log2 of relative_value |
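The relationship between intensity, relative_value and log2_value can be sketched in base R as follows. The sketch assumes the geometric mean is taken per row (per protein) across samples, which the text above does not state explicitly; the toy values are made up.

```r
# Toy intensity matrix: 2 proteins x 3 samples (made-up values)
intensity <- rbind(p1 = c(1, 2, 4),
                   p2 = c(10, 10, 10))

# Assumption: geometric mean computed per row (per protein)
geo_mean <- exp(rowMeans(log(intensity)))

# Each row is divided by its own geometric mean, then log2-transformed
relative_value <- intensity / geo_mean
log2_value <- log2(relative_value)
```

Under this assumption each row of log2_value sums to zero up to floating point error, so the values read as log2 fold changes around the protein's average level.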
Author(s)
Kefu Liu
Examples
data(Dforimpute)
data <- Data_impute(Dforimpute,distmethod="manhattan")