Numerify {NCmisc}    R Documentation

Convert all possible columns of a data.frame to numeric

Description

Importing data from CSV files often leaves numeric variables coded as factors or strings, which many R functions handle poorly. This function provides a quick way to fix this across a whole data frame while attempting to leave untouched any columns that are not genuinely numeric. In edge cases you may need to adjust 'thresh' to get the correct result. This is usually an issue when mostly numeric columns contain occasional strings, for instance a column of mostly numbers with occasional pipe-separated values such as '4.4|5.0|6.1'.
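As a brief illustration of why factor-coded numbers cause trouble, the base R sketch below shows as.numeric() returning the internal level codes of a factor rather than the values it displays. It does not use Numerify; the variable names are chosen purely for illustration.

```r
# A numeric variable that has been read in as a factor
f <- factor(c("10", "20", "10"))

# Direct coercion returns the integer level codes, not the original values
codes <- as.numeric(f)

# Going through character first recovers the intended numbers
values <- as.numeric(as.character(f))

print(codes)
print(values)
```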

Usage

Numerify(df, except = NULL, force = FALSE, digits = NA, thresh = 0.9)

Arguments

df

data.frame to transform to numeric (where possible)

except

character vector of column names that should be left unchanged

force

force all columns to numeric without checking types

digits

if a non-NA integer is supplied, numeric columns are rounded to this many decimal places after conversion.

thresh

threshold for deciding that a variable is numeric. NA values are ignored in the test, which looks at the proportion of the remaining values that are successfully coerced to numeric without giving 'NA'. If this threshold is 0.9, then any column where at least 90% of the values can be converted to numeric will be made numeric; otherwise the column is left as it was.
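The test described for 'thresh' can be sketched in base R as follows. This is only an approximation under stated assumptions, not the package's actual implementation: prop_numeric is a hypothetical helper, and the real function may treat missing values or factors differently.

```r
# Hypothetical helper (not part of NCmisc): proportion of non-NA values
# that can be coerced to numeric without producing NA.
prop_numeric <- function(x) {
  x <- as.character(x)
  x <- x[!is.na(x)]                       # NA values are ignored in the test
  if (length(x) == 0) return(NA_real_)    # nothing left to test
  converted <- suppressWarnings(as.numeric(x))
  mean(!is.na(converted))
}

# Two missing values, six numbers stored as strings, two pipe-separated strings
x <- c(NA, NA, 6:10, "5|6", "7|8", 1)
p <- prop_numeric(x)

print(p)
print(p >= 0.7)   # would pass a threshold of 0.7
print(p >= 0.8)   # would fail a threshold of 0.8
```

Under this sketch, six of the eight non-missing values convert cleanly, so the column would count as numeric at a threshold of 0.7 but not at 0.8.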

Value

data.frame with numeric type for any applicable columns

Author(s)

Nicholas Cooper

Examples

df <- data.frame(first=c(1:5),
 second=paste(6:10),
 third=c("jake", "fred", "cathy", "sandra", "mike"))
sapply(sapply(df, is), "[", 1) # check type of each column
dfN <- Numerify(df)
sapply(sapply(dfN, is), "[", 1) # now second column is numeric
df2 <- data.frame(first=c(1:10),
 second=paste(c(NA, NA, 6:10, "5|6", "7|8", 1)),
 third=rep(c("jake", "fred", "cathy", "sandra", "mike"),2))
sapply(sapply(df2, is), "[", 1)
df2N1 <- Numerify(df2, thresh=0.7)
df2N2 <- Numerify(df2, thresh=0.8)
sapply(sapply(df2N1, is), "[", 1) # at this threshold second column goes to numeric
sapply(sapply(df2N2, is), "[", 1) # second column stays a string at this threshold

[Package NCmisc version 1.2.0 Index]