replace.missing.df {NCmisc} | R Documentation |
Iterate through numeric columns of a dataframe and replace missing values with the mean
Description
Simply replaces missing data without changing column means. Criteria are used to decide whether each column is numeric, so that illegal operations are not performed on strings, etc. Adjusting the 'error' parameter adds variance to the replaced observations, which helps to reduce the bias associated with inserting many copies of the same replacement value.
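The claim that mean replacement leaves the column mean unchanged, and the reason a noise option is useful, can be checked with a few lines of base R. This is only an illustrative sketch on a made-up vector; it does not call or reproduce the NCmisc implementation.

```r
# Illustrative sketch in base R; not the NCmisc source code.
x <- c(1, 2, NA, 4, 5)

# Mean of the observed (non-missing) values
obs_mean <- mean(x, na.rm = TRUE)

# Replace every missing entry with that mean
x_filled <- x
x_filled[is.na(x_filled)] <- obs_mean

# The column mean is unchanged by the replacement
mean_unchanged <- isTRUE(all.equal(mean(x_filled), obs_mean))

# The variance shrinks, because the inserted value sits exactly at the mean
var_before <- var(x, na.rm = TRUE)
var_after  <- var(x_filled)
```

The drop from `var_before` to `var_after` is the kind of bias that adding noise to the replacement values is meant to reduce.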
Usage
replace.missing.df(
X,
repl.fun = mean,
error = 0,
thresh = 0.9,
digits = 99,
force = FALSE
)
Arguments
X |
a data.frame to replace missing values in |
repl.fun |
the function used to compute the replacement value. Default is 'mean'. A replacement function should take a vector 'x' and return a single scalar. |
error |
default value is 0, meaning that all replacements within a column of the data.frame X will be the same value. If you give a positive value, this amount of Gaussian noise (in standard deviation units of the original variable) will be added to the replacement values. |
thresh |
passed to function 'is.vec.numeric', see explanation there. |
digits |
Trim replacement values to this many digits |
force |
TRUE means replace missing values in all columns without testing whether they are numeric |
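How 'repl.fun', 'error' and 'digits' might interact on a single column can be sketched in base R. The helper `impute_with_noise` below is hypothetical and written only for illustration. It assumes that the noise is drawn with `rnorm` using a standard deviation of `error` times the standard deviation of the observed values, and that trimming is done with `round`; the package may do either differently.

```r
# Hypothetical helper for illustration only; not the NCmisc source code.
impute_with_noise <- function(x, repl.fun = mean, error = 0, digits = 99) {
  miss <- is.na(x)
  if (!any(miss)) return(x)
  observed <- x[!miss]
  # One scalar replacement computed from the observed values
  centre <- repl.fun(observed)
  # Assumed noise model: Gaussian, SD = error * SD of the observed values
  noise <- if (error > 0) {
    rnorm(sum(miss), mean = 0, sd = error * sd(observed))
  } else {
    0
  }
  # Assumed trimming: round the replacements to 'digits' decimal places
  x[miss] <- round(centre + noise, digits)
  x
}

set.seed(1)
x <- c(1:5, NA, NA, NA, 9, 10)
same_vals  <- impute_with_noise(x)                         # error = 0
noisy_vals <- impute_with_noise(x, error = 1, digits = 4)  # error = 1
```

With `error = 0` the three missing entries receive one identical value; with `error = 1` each receives its own draw around that value, while the observed entries are left untouched.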
Value
returns a data.frame with the same dimensions as X, with missing values in numeric columns imputed using the repl.fun function, optionally with noise added.
Author(s)
Nicholas Cooper
Examples
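library(NCmisc)
# 'second' holds numbers stored as character strings; 'third' holds names
df <- data.frame(first=c(1,2,NA,4,5),
                 second=paste(c(6,7,8,NA,10)),
                 third=c("jake", "fred", "cathy", "sandra", "mike"))
df
replace.missing.df(df)              # default settings
replace.missing.df(df, force=TRUE)  # force replacement in every column
# here 'second' also contains entries such as "5|6" that are not numbers
df2 <- data.frame(first=c(1:5, NA, NA, NA, 9, 10),
                  second=paste(c(NA, NA, 6:10, "5|6", "7|8", 1)),
                  third=rep(c("jake", "fred", "cathy", "sandra", "mike"), 2))
df2
replace.missing.df(df2)              # default thresh = 0.9
replace.missing.df(df2, thresh=0.7)  # lower threshold passed to is.vec.numeric
replace.missing.df(df2, error = 1, thresh=0.7, digits=4)  # add noise, trim to 4 digits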
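The 'second' column in these examples is built with paste(), so it is a character vector in which the missing entries have become the string "NA". The base R check below shows what that column contains and what fraction of its entries can be coerced to a number. It is only an illustration of why a threshold is relevant for such a column; how 'is.vec.numeric' actually applies 'thresh' is described on its own help page and is not reproduced here.

```r
# Base R illustration; does not call NCmisc.
second <- paste(c(NA, NA, 6:10, "5|6", "7|8", 1))

# paste() converts NA to the string "NA", so nothing is NA any more
any_true_na <- anyNA(second)   # FALSE
n_na_string <- sum(second == "NA")   # 2

# Try to coerce each entry; "NA", "5|6" and "7|8" become NA
as_num <- suppressWarnings(as.numeric(second))

# Fraction of all entries that coerce to a number: 6 of 10
frac_numeric <- mean(!is.na(as_num))   # 0.6
```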