R: Data_impute

Data_impute {DDPNA}

R Documentation

Data_impute

Description

Data cleaning process: detects and removes outlier samples and imputes missing values. The process is as follows:
1. Remove genes whose proportion of missing values is larger than maxNAratio.
2. Detect outlier samples and remove them.
3. Repeat steps 1-2 until the iteration limit is reached or no outlier sample can be detected.
4. Impute the missing values.
The function can also perform only gene filtering, only outlier removal, or only missing-value imputation.
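Step 1 can be illustrated with a short base R sketch. This is not the DDPNA implementation: the helper `filter_by_na_ratio` and the toy matrix are hypothetical, and the sketch assumes that a row is kept when its proportion of missing values is at most the threshold.

```r
# Toy quantification matrix: rows are proteins, columns are samples (made-up values).
quant <- matrix(
  c(10, NA, 12, 11,
    NA, NA, NA,  9,
     8,  7, NA,  6),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("P1", "P2", "P3"), paste0("S", 1:4))
)

# Hypothetical helper (not part of DDPNA): drop rows whose proportion of
# missing values exceeds max_na_ratio.
filter_by_na_ratio <- function(x, max_na_ratio = 0.5) {
  na_ratio <- rowMeans(is.na(x))          # proportion of NA per row
  x[na_ratio <= max_na_ratio, , drop = FALSE]
}

filtered <- filter_by_na_ratio(quant, max_na_ratio = 0.5)
rownames(filtered)
```

Here P2 has three of four values missing (0.75), so it is the only row removed.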

Usage

Data_impute(data, inf = "inf", intensity = "LFQ", miss.value = NA,
            splNExt = TRUE, maxNAratio = 0.5,
            removeOutlier = TRUE,
            outlierdata = "intensity", iteration = NA, sdout = 2,
            distmethod = "manhattan", A.IAC = FALSE,
            dohclust = FALSE, treelabels = NA,
            plot = TRUE, filename = NULL,
            text.cex = 0.7, text.col = "red", text.pos = 1,
            text.labels = NA, abline.col = "red", abline.lwd = 2,
            impute = TRUE, verbose = 1, ...)

Arguments

`data`	MaxQconvert data, or a list containing two data.frames: ID information and quantification data.
`inf`	the name of the data.frame that contains protein ID information.
`intensity`	the name of the data.frame that contains only quantification data.
`miss.value`	the value used to mark missing values in the quantification data. The default is `NA`; it is usually `NA` or `0`.
`splNExt`	a logical value indicating whether to extract sample names (suited to MaxQuant quantification data).
`maxNAratio`	the maximum proportion of missing data allowed in any row (default 0.5, i.e. 50%). Rows with a larger proportion of missing values are deleted.
`removeOutlier`	a logical value indicating whether to remove outlier samples.
`outlierdata`	Deprecated. Which data are used for outlier sample detection. This must be (an abbreviation of) one of the strings "`intensity`", "`relative_value`", "`log2_value`".
`iteration`	a numeric value indicating how many times to run the outlier sample detection and removal loop. `NA` means loop until no outlier sample is detected.
`sdout`	a numeric value giving the threshold used to judge outlier samples. The default `2` corresponds approximately to a 0.95 confidence interval.
`distmethod`	the distance measure to be used. This must be (an abbreviation of) one of the strings "`manhattan`", "`euclidean`", "`canberra`", "`correlation`", "`bicor`".
`A.IAC`	a logical value indicating whether to decrease the `correlation` variance.
`dohclust`	a logical value indicating whether to do hierarchical clustering and plot dendrograms.
`treelabels`	labels of the dendrograms.
`plot`	a logical value indicating whether to plot the numbersd scatter diagrams.
`filename`	the filename of the plot. The number and plot type information will be added automatically. The default value is `NULL`, which means no file is saved. All plots are saved to the "plot" folder in PDF format.
`text.cex`	outlier sample annotation text size (scatter diagram parameter).
`text.col`	outlier sample annotation color (scatter diagram parameter).
`text.pos`	outlier sample annotation position (scatter diagram parameter).
`text.labels`	outlier sample annotation labels (scatter diagram parameter).
`abline.col`	the threshold line color (scatter diagram parameter).
`abline.lwd`	the threshold line width (scatter diagram parameter).
`impute`	a logical value indicating whether to do kNN imputation.
`verbose`	integer level of verbosity. Zero means silent; 1 prints some diagnostic messages.
`...`	Other arguments.
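The calls below sketch how the logical switches might be combined to run only part of the pipeline. They use only arguments listed in Usage and the `Dforimpute` data set from Examples, but the particular combinations are untested illustrations. The `requireNamespace` guard is added here so the snippet is skipped when DDPNA is not installed.

```r
if (requireNamespace("DDPNA", quietly = TRUE)) {
  library(DDPNA)
  data(Dforimpute)

  # Row filtering only: skip outlier removal and imputation.
  filtered_only <- Data_impute(Dforimpute, removeOutlier = FALSE,
                               impute = FALSE, plot = FALSE)

  # Row filtering plus at most one round of outlier removal, no imputation.
  no_impute <- Data_impute(Dforimpute, iteration = 1,
                           impute = FALSE, plot = FALSE)

  # Full pipeline with a more permissive outlier threshold and no messages.
  full <- Data_impute(Dforimpute, sdout = 3, distmethod = "euclidean",
                      plot = FALSE, verbose = 0)
}
```

Setting `plot = FALSE` keeps the sketch from opening graphics devices; leave it at the default to inspect the scatter diagrams.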

Details

Detects and removes outlier samples and imputes missing values.
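As a rough illustration of how a distance measure and an `sdout`-style threshold can be combined to flag outlier samples, consider the base R sketch below. It is an assumption about the general technique, not DDPNA's actual algorithm: the helper `flag_outlier_samples` is hypothetical, and it flags a sample when its mean distance to the other samples lies more than `sdout` standard deviations above the average.

```r
set.seed(1)
# Toy matrix: 50 proteins (rows) x 8 samples (columns), made-up values.
x <- matrix(rnorm(50 * 8), nrow = 50,
            dimnames = list(NULL, paste0("S", 1:8)))
x[, "S8"] <- x[, "S8"] + 5   # shift one sample so it acts as an outlier

# Hypothetical helper (not DDPNA code).
flag_outlier_samples <- function(x, sdout = 2, method = "manhattan") {
  d <- as.matrix(dist(t(x), method = method))  # sample-by-sample distances
  mean_dist <- rowSums(d) / (ncol(d) - 1)      # mean distance to the other samples
  z <- (mean_dist - mean(mean_dist)) / sd(mean_dist)
  names(z)[z > sdout]                          # samples beyond the threshold
}

flag_outlier_samples(x, sdout = 2)
```

On this toy data only the shifted sample S8 should be flagged. In an iterative scheme, the flagged samples would be dropped and the check repeated on the remaining columns.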

Value

a list of proteomic data.

`inf`	Protein information, including protein IDs and other information.
`intensity`	Quantification information.
`relative_value`	intensity divided by geometric mean
`log2_value`	log2 of relative_value
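The relationship between the three quantification elements can be sketched in base R. The sketch assumes the geometric mean is taken per row (per protein, across samples), which the text above does not state explicitly; the toy matrix is made up.

```r
# Toy intensity matrix: rows are proteins, columns are samples.
intensity <- matrix(c(2, 8, 4,
                      1, 1, 1),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("P1", "P2"), c("S1", "S2", "S3")))

# Assumption: geometric mean per row, computed on the log scale.
geo_mean <- exp(rowMeans(log(intensity)))

# Each row is divided by its own geometric mean (the vector recycles down columns).
relative_value <- intensity / geo_mean
log2_value <- log2(relative_value)
```

Under this assumption each row of `log2_value` sums to zero, so the values read as log2 fold changes relative to the protein's own average level.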

Author(s)

Kefu Liu

Examples

library(DDPNA)
data(Dforimpute)
# Run the full pipeline using Manhattan distance for outlier detection.
data <- Data_impute(Dforimpute, distmethod = "manhattan")

[Package DDPNA version 0.3.3 Index]