impute_na {promor}    R Documentation

Impute missing values

Description

This function imputes missing values using a user-specified imputation method.

Usage

impute_na(
  df,
  method = "minProb",
  tune_sigma = 1,
  q = 0.01,
  maxiter = 10,
  ntree = 20,
  n_pcs = 2,
  seed = NULL
)

Arguments

df

A raw_df object (output of create_df) containing missing values or a norm_df object after performing normalization.

method

Imputation method to use. Available methods: "minProb", "minDet", "RF", "kNN", and "SVD". Default is "minProb".

tune_sigma

A scalar used in the "minProb" method to control the standard deviation of the Gaussian distribution from which random values are drawn for imputation. Default is 1.

q

A scalar used in the "minProb" and "minDet" methods to obtain a low intensity value for imputation. q should be set to a very low value. Default is 0.01.

maxiter

Maximum number of iterations to be performed when using the "RF" method. Default is 10.

ntree

Number of trees to grow in each forest when using the "RF" method. Default is 20.

n_pcs

Number of principal components to calculate when using the "SVD" method. Default is 2.

seed

Numeric. Seed for the random number generator. Default is NULL.

Details

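The method argument selects the imputation method. The tune_sigma, q, maxiter, ntree, and n_pcs arguments are used only by the methods named in their descriptions. Set seed to make the results reproducible.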
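
To make the roles of q, tune_sigma, and seed more concrete, the following base R sketch imputes a small toy matrix in the spirit of the "minDet" and "minProb" methods. It is an illustration under stated assumptions, not promor's internal code: the function names sketch_min_det and sketch_min_prob, the toy data, the reading of q as a per-sample quantile, and the choice of the median per-protein standard deviation as the spread are all assumptions made for this example.

```r
## Illustrative sketch only (base R); not promor's internal implementation.
## Rows are proteins, columns are samples; NA marks a missing intensity.
toy <- matrix(
  c(20.1, 21.3,   NA, 22.0,
    18.4,   NA, 19.0, 18.7,
    25.2, 24.8, 25.5,   NA,
    22.9, 23.1, 22.5, 23.4,
    19.8, 20.2, 20.0, 19.5),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("prot", 1:5), paste0("s", 1:4))
)

## Deterministic variant: replace each NA with a low value, taken here
## (assumption) as the q-th quantile of the observed values in that sample.
sketch_min_det <- function(x, q = 0.01) {
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    if (any(miss)) {
      x[miss, j] <- quantile(x[, j], probs = q, na.rm = TRUE, names = FALSE)
    }
  }
  x
}

## Probabilistic variant: draw each replacement from a Gaussian centred on
## that low value; tune_sigma scales the standard deviation of the draws.
sketch_min_prob <- function(x, q = 0.01, tune_sigma = 1, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  ## Assumption: the spread is the median of the per-protein SDs.
  sd_draw <- median(apply(x, 1, sd, na.rm = TRUE), na.rm = TRUE) * tune_sigma
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    if (any(miss)) {
      low <- quantile(x[, j], probs = q, na.rm = TRUE, names = FALSE)
      x[miss, j] <- rnorm(sum(miss), mean = low, sd = sd_draw)
    }
  }
  x
}

det_out  <- sketch_min_det(toy, q = 0.01)
prob_out <- sketch_min_prob(toy, q = 0.01, tune_sigma = 1, seed = 3312)
```

In this sketch, a smaller q pulls the imputed values closer to the lowest observed intensity in each sample, a larger tune_sigma widens the spread of the random draws, and calling sketch_min_prob twice with the same seed returns identical results.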

Value

An imp_df object, which is a data frame of protein intensities with no missing values.

Author(s)

Chathurani Ranathunge

References

Lazar, Cosmin, et al. "Accounting for the multiple natures of missing values in label-free quantitative proteomics data sets to compare imputation strategies." Journal of Proteome Research 15.4 (2016): 1116-1125.

See Also

More information on the available imputation methods can be found in their respective packages.

Examples

## Generate a raw_df object with default settings. No technical replicates.
raw_df <- create_df(
  prot_groups = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/pg1.txt",
  exp_design = "https://raw.githubusercontent.com/caranathunge/promor_example_data/main/ed1.txt"
)

## Impute missing values in the data frame using the default minProb
## method.
imp_df1 <- impute_na(raw_df, seed = 3312)


## Impute using the RF method with the number of iterations set at 5
## and number of trees set at 100.
imp_df2 <- impute_na(raw_df,
  method = "RF",
  maxiter = 5, ntree = 100,
  seed = 3312
)


## Using the kNN method.
imp_df3 <- impute_na(raw_df, method = "kNN", seed = 3312)



## Using the SVD method with n_pcs set to 3.
imp_df4 <- impute_na(raw_df, method = "SVD", n_pcs = 3, seed = 3312)

## Using the minDet method with q set at 0.001.
imp_df5 <- impute_na(raw_df, method = "minDet", q = 0.001, seed = 3312)

## Impute a normalized data set using the kNN method.
imp_df6 <- impute_na(ecoli_norm_df, method = "kNN")
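
## Optional check (an illustrative addition, not part of promor): base R's
## anyNA() should report FALSE if no missing values remain after imputation,
## while the input is expected to still contain NAs.
anyNA(imp_df6)
anyNA(ecoli_norm_df)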


[Package promor version 0.2.1 Index]