Numerify {NCmisc} | R Documentation |
Convert all possible columns of a data.frame to numeric
Description
Importing data from CSV files often leads to numeric variables being coded as factors or strings, which many R functions handle poorly. This function provides a quick way to fix this across a whole data frame while attempting to leave untouched any columns that are not genuinely numeric. In edge cases you might need to adjust 'thresh' to get the correct result. This is usually an issue when a mostly numeric column contains some strings, for instance a column of numbers with occasional pipe-separated values such as '4.4|5.0|6.1'.
Usage
Numerify(df, except = NULL, force = FALSE, digits = NA, thresh = 0.9)
Arguments
df |
data.frame to transform to numeric (where possible) |
except |
character vector of column names that should be left unchanged |
force |
force all columns to numeric without checking types |
digits |
if a non-NA integer value is supplied, numeric columns are rounded to this many decimal places after conversion to numeric. |
thresh |
threshold for deciding that a variable is numeric. NA values are ignored in the test, which looks at the proportion of the remaining values that are successfully coerced to numeric without giving 'NA'. If this threshold is 0.9, then any column where at least 90% of values convert to numeric type will be made numeric; otherwise the column is left as it was. |
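The proportion test described for 'thresh' can be illustrated with a minimal base-R sketch. The helper `prop_numeric` is hypothetical and is not part of NCmisc; it also assumes that the literal string "NA" counts as missing, which is consistent with the second example on this page but is not confirmed by the package source.

```r
# Hypothetical helper (not an NCmisc function): proportion of non-missing
# values in a vector that coerce to numeric without producing NA.
prop_numeric <- function(x) {
  x <- as.character(x)
  # Assumption: both real NA and the string "NA" are treated as missing
  x <- x[!is.na(x) & x != "NA"]
  if (length(x) == 0) return(NA_real_)
  converted <- suppressWarnings(as.numeric(x))
  mean(!is.na(converted))
}

# Same values as the 'second' column of df2 in the Examples
second <- paste(c(NA, NA, 6:10, "5|6", "7|8", 1))
p <- prop_numeric(second)  # 6 of the 8 non-missing values convert

p >= 0.7  # would pass a threshold of 0.7
p >= 0.8  # would fail a threshold of 0.8
```

Under these assumptions the column passes at thresh=0.7 and fails at thresh=0.8, matching the behaviour noted in the comments of the Examples section.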
Value
data.frame with numeric type for any applicable columns
Author(s)
Nicholas Cooper
Examples
df <- data.frame(first=c(1:5),
                 second=paste(6:10),
                 third=c("jake", "fred", "cathy", "sandra", "mike"))
sapply(sapply(df, is), "[", 1) # check type of each column
dfN <- Numerify(df)
sapply(sapply(dfN, is), "[", 1) # now second column is numeric
df2 <- data.frame(first=c(1:10),
                  second=paste(c(NA, NA, 6:10, "5|6", "7|8", 1)),
                  third=rep(c("jake", "fred", "cathy", "sandra", "mike"), 2))
sapply(sapply(df2, is), "[", 1)
df2N1 <- Numerify(df2, thresh=0.7)
df2N2 <- Numerify(df2, thresh=0.8)
sapply(sapply(df2N1, is), "[", 1) # at this threshold second column goes to numeric
sapply(sapply(df2N2, is), "[", 1) # second column stays a string at this threshold