deleteBogusRows {kutils}    R Documentation

Remove rows in which the proportion of missing data exceeds a threshold.

Description

Delete rows in which most values are missing. When data are imported from other sources, noise rows often appear at the bottom of the input. A row in which more than, say, 90% of the values are missing is probably useless information. We wrote this function to deal with the problem that MS Excel users often include a marginal note at the bottom of a spreadsheet.
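The rule described above can be sketched in base R. This is a minimal illustration, not the kutils source code: the object names (m, dat, prop_missing, kept) are hypothetical, and it assumes that a row is dropped only when its proportion of missing values is strictly greater than the threshold.

```r
## Illustrative sketch only; this is not how kutils implements the function.
set.seed(1234)

## Five complete rows with 12 columns each
m <- matrix(rnorm(5 * 12), nrow = 5,
            dimnames = list(NULL, paste0("x", 1:12)))

## Append a row that mimics a marginal note: one value, 11 missings
m <- rbind(m, c(99, rep(NA, 11)))
dat <- as.data.frame(m)

## Proportion of missing values in each row
prop_missing <- rowMeans(is.na(dat))

## Keep rows whose proportion missing does not exceed the threshold
pm <- 0.9
kept <- dat[prop_missing <= pm, , drop = FALSE]
```

Here the appended row is missing 11 of 12 values, which exceeds 0.9, so only the five complete rows remain in kept.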

Usage

deleteBogusRows(dframe, pm = 0.9, drop = FALSE, verbose = TRUE, n = 25)

Arguments

dframe

A data frame or matrix

pm

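Proportion of missing data to be tolerated in a row. Default 0.9: rows in which the proportion of missing values exceeds this threshold are deleted.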

drop

Default FALSE. If the data frame result is reduced to one row, should R's default drop behavior "demote" it to a vector?

verbose

Default TRUE. Should a report be printed summarizing the information to be deleted?

n

Default 25: limit on number of values to print in verbose diagnostic output. If set to NULL or NA, then all of the column values will be printed for the bogus rows.
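The "drop" behavior that the drop argument refers to is a feature of base R subsetting, and it is easiest to see with a matrix. The sketch below uses only base R; the object names are hypothetical, and it shows R's general behavior rather than the internals of deleteBogusRows.

```r
## Base R illustration of "drop"; not taken from the kutils source.
m <- matrix(1:6, nrow = 2)

## By default, selecting a single row loses the matrix structure
as_vector <- m[1, ]
is.matrix(as_vector)

## With drop = FALSE the result is still a matrix with one row
as_matrix <- m[1, , drop = FALSE]
dim(as_matrix)
```

Keeping drop = FALSE means that code downstream can rely on the result having rows and columns, even when only one row survives.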

Value

A data frame, returned invisibly.

Author(s)

Paul Johnson <pauljohn@ku.edu>

Examples

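## A 10 x 100 matrix of random normal values
mymat <- matrix(rnorm(10*100), nrow = 10, ncol = 100,
               dimnames = list(1:10, paste0("x", 1:100)))
## Append a "bogus" row: one value followed by 99 missing values
mymat <- rbind(mymat, c(32, rep(NA, 99)))
mymat2 <- deleteBogusRows(mymat)
## The same data as a data frame, with a factor column added
mydf <- as.data.frame(mymat)
mydf$someFactor <- factor(sample(c("A", "B"), size = NROW(mydf), replace = TRUE))
mydf2 <- deleteBogusRows(mydf, n = "all")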

[Package kutils version 1.73 Index]