impute_na {promor} | R Documentation
Impute missing values
Description
This function imputes missing values using a user-specified imputation method.
Usage
impute_na(
df,
method = "minProb",
tune_sigma = 1,
q = 0.01,
maxiter = 10,
ntree = 20,
n_pcs = 2,
seed = NULL
)
Arguments
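df | A raw_df or norm_df object.
method | Imputation method to use. Default is "minProb". The other available methods are "minDet", "kNN", "RF", and "SVD".
tune_sigma | A scalar used in the "minProb" method to control the standard deviation of the Gaussian distribution from which imputed values are drawn. Default is 1.
q | A scalar used in the "minProb" and "minDet" methods to obtain a low intensity value for imputation; it should be set to a very low value. Default is 0.01.
maxiter | Maximum number of iterations to be performed when using the "RF" method. Default is 10.
ntree | Number of trees to grow in each forest when using the "RF" method. Default is 20.
n_pcs | Number of principal components to calculate when using the "SVD" method. Default is 2.
seed | Numerical. Random number seed. Default is NULL.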
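To make the roles of q and tune_sigma more concrete, here is a minimal base-R sketch of the general idea behind minimum-based imputation. It assumes that q is used as a per-sample quantile and that tune_sigma scales the spread of the random draws. The functions sketch_min_det and sketch_min_prob and the toy_mat matrix are hypothetical names invented for this illustration; this is not the code that promor or imputeLCMD actually runs.

```r
# Illustrative sketch only: NOT the promor or imputeLCMD implementation.
# toy_mat is a made-up matrix of log-intensities (proteins x samples).
set.seed(3312)
toy_mat <- matrix(rnorm(40, mean = 25, sd = 2),
  nrow = 10,
  dimnames = list(paste0("prot", 1:10), paste0("s", 1:4))
)
toy_mat[c(2, 7, 13, 28, 35)] <- NA

# Deterministic variant: replace each NA with the q-th quantile of the
# observed values in the same sample (column).
sketch_min_det <- function(mat, q = 0.01) {
  for (j in seq_len(ncol(mat))) {
    low <- quantile(mat[, j], probs = q, na.rm = TRUE)
    mat[is.na(mat[, j]), j] <- low
  }
  mat
}

# Probabilistic variant: draw replacements from a Gaussian centred on that
# low quantile. tune_sigma scales the spread, taken here (an assumption)
# as the median of the per-protein standard deviations.
sketch_min_prob <- function(mat, q = 0.01, tune_sigma = 1) {
  spread <- median(apply(mat, 1, sd, na.rm = TRUE), na.rm = TRUE) * tune_sigma
  for (j in seq_len(ncol(mat))) {
    low <- quantile(mat[, j], probs = q, na.rm = TRUE)
    miss <- is.na(mat[, j])
    mat[miss, j] <- rnorm(sum(miss), mean = low, sd = spread)
  }
  mat
}

det_out <- sketch_min_det(toy_mat, q = 0.01)
prob_out <- sketch_min_prob(toy_mat, q = 0.01, tune_sigma = 1)
```

In this sketch, a smaller q pushes imputed values toward the lowest observed intensities, and a larger tune_sigma widens the scatter of the randomly drawn values around that low point.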
Details
- Ideally, you should first remove proteins with high levels of missing data using the filterbygroup_na function before running impute_na on the raw_df object or the norm_df object.
- The impute_na function imputes missing values using a user-specified imputation method from the available options: minProb, minDet, kNN, RF, and SVD.
- Note: Some imputation methods may require that the data be normalized prior to imputation.
- Make sure to fix the random number seed with the seed argument for reproducibility.
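The two practical points above (filter first, then fix the seed) can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy matrix, the 0.4 missingness cutoff, and the fill_random helper are all hypothetical, and the row filter is a simplified stand-in that ignores experimental groups, unlike filterbygroup_na.

```r
# Toy log-intensity matrix (proteins x samples); all values are made up.
toy <- rbind(
  prot1 = c(21.4, 22.0, NA, 21.8),
  prot2 = c(NA, NA, NA, 19.5),
  prot3 = c(24.1, NA, 23.7, 24.3)
)

# Step 1: drop proteins with too many missing values. The 0.4 cutoff is an
# arbitrary choice for this example and ignores experimental groups.
filtered <- toy[rowMeans(is.na(toy)) <= 0.4, , drop = FALSE]

# Step 2: hypothetical stochastic imputation with an optional seed.
# It stands in for any method that relies on random draws.
fill_random <- function(mat, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  miss <- is.na(mat)
  mat[miss] <- rnorm(sum(miss), mean = min(mat, na.rm = TRUE), sd = 0.5)
  mat
}

run_a <- fill_random(filtered, seed = 3312)
run_b <- fill_random(filtered, seed = 3312)
run_c <- fill_random(filtered, seed = 1)

identical(run_a, run_b) # same seed, so the imputed values match
identical(run_a, run_c) # a different seed gives different draws
```

Without a fixed seed, repeated runs of a stochastic method would generally produce different imputed values, which makes downstream results harder to reproduce.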
Value
An imp_df object, which is a data frame of protein intensities with no missing values.
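A quick way to confirm that a returned data frame really contains no missing values is sketched below. The imp_like object is a hypothetical stand-in built by hand for this illustration; with real data you would run the same checks on the object returned by impute_na.

```r
# imp_like is a made-up stand-in for an imp_df object.
imp_like <- data.frame(
  s1 = c(21.4, 19.8, 24.1),
  s2 = c(22.0, 20.1, 23.5),
  row.names = c("prot1", "prot2", "prot3")
)

# TRUE here would mean at least one value is still missing.
has_missing <- anyNA(imp_like)

# Per-sample count of missing values; every entry should be zero.
na_per_sample <- colSums(is.na(imp_like))
```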
Author(s)
Chathurani Ranathunge
References
Lazar, Cosmin, et al. "Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies." Journal of proteome research 15.4 (2016): 1116-1125.
See Also
More information on the available imputation methods can be found in their respective packages.
- For the minProb and minDet methods, see the imputeLCMD package.
- For the Random Forest (RF) method, see missForest.
- For the SVD method, see pca from the pcaMethods package.
Examples
## Generate a raw_df object with default settings. No technical replicates.
raw_df <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt"
)
## Impute missing values in the data frame using the default minProb
## method.
imp_df1 <- impute_na(raw_df, seed = 3312)
## Impute using the RF method with the number of iterations set at 5
## and number of trees set at 100.
imp_df2 <- impute_na(raw_df,
  method = "RF",
  maxiter = 5, ntree = 100,
  seed = 3312
)
## Using the kNN method.
imp_df3 <- impute_na(raw_df, method = "kNN", seed = 3312)
## Using the SVD method with n_pcs set to 3.
imp_df4 <- impute_na(raw_df, method = "SVD", n_pcs = 3, seed = 3312)
## Using the minDet method with q set at 0.001.
imp_df5 <- impute_na(raw_df, method = "minDet", q = 0.001, seed = 3312)
## Impute a normalized data set using the kNN method
imp_df6 <- impute_na(ecoli_norm_df, method = "kNN")