R: Efficient functions for dealing with missing values.

is_na {cheapr}

R Documentation

Efficient functions for dealing with missing values.

Description

is_na() is a parallelised alternative to is.na().
num_na(x) is a faster and more efficient sum(is.na(x)).
which_na(x) is a more efficient which(is.na(x)).
which_not_na(x) is a more efficient which(!is.na(x)).
row_na_counts(x) is a more efficient rowSums(is.na(x)).
row_all_na() returns a logical vector indicating which rows are empty, i.e. contain only NA values.
row_any_na() returns a logical vector indicating which rows have at least 1 NA value.
The col_ variants are the same, but operate by-column.
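
As a rough illustration of the equivalences listed above, the sketch below places each cheapr function next to the base R expression it is compared with. It uses only functions named on this page plus the built-in airquality data; the variable names are illustrative.

```r
library(cheapr)

x <- c(1, NA, 3, NA, 5)

# Each cheapr call next to the base R expression the description compares it with
n_missing  <- num_na(x)        # compare: sum(is.na(x))
missing_at <- which_na(x)      # compare: which(is.na(x))
present_at <- which_not_na(x)  # compare: which(!is.na(x))

# Row-wise and column-wise NA counts on a data frame
row_counts <- row_na_counts(airquality)  # compare: rowSums(is.na(airquality))
col_counts <- col_na_counts(airquality)  # compare: colSums(is.na(airquality))
```

With the default names = FALSE the row and column counts are returned without names, unlike rowSums() and colSums().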

Usage

is_na(x)

## Default S3 method:
is_na(x)

## S3 method for class 'POSIXlt'
is_na(x)

## S3 method for class 'vctrs_rcrd'
is_na(x)

## S3 method for class 'data.frame'
is_na(x)

num_na(x, recursive = TRUE)

which_na(x)

which_not_na(x)

any_na(x, recursive = TRUE)

all_na(x, recursive = TRUE)

row_na_counts(x, names = FALSE)

col_na_counts(x, names = FALSE)

row_all_na(x, names = FALSE)

col_all_na(x, names = FALSE)

row_any_na(x, names = FALSE)

col_any_na(x, names = FALSE)

Arguments

`x`	A vector, list, data frame or matrix.
`recursive`	Should the function be applied recursively to lists? The default is `TRUE`. Recursing is much cheaper: when `recursive = FALSE`, the other `NA` functions call `is_na()` first, which allocates a logical vector. The non-recursive path exists so that alternative objects with `is.na` methods can be supported.
`names`	Should row/col names be added?

Details

These functions are designed primarily for programmers, to increase the speed and memory-efficiency of NA handling.
Most of these functions can be parallelised through options(cheapr.cores).
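
A minimal sketch of setting the option follows. It assumes that cheapr.cores takes an integer number of cores; the value 2 and the vector size are arbitrary choices for illustration. Whether any work is actually split across cores depends on the build and on the input size.

```r
library(cheapr)

# Assumption: the option is an integer core count; 2 is an arbitrary choice
old <- options(cheapr.cores = 2)

x <- rep(c(1, NA), 5e5)  # one million elements, half of them NA
n <- num_na(x)

# Restore the previous setting
options(old)
```

Saving the value returned by options() and restoring it afterwards keeps the setting from leaking into unrelated code.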

Common use-cases

To replicate complete.cases(x), use !row_any_na(x).
To find rows with any NA values, use which_(row_any_na(df)).
To find empty rows use which_(row_all_na(df)) or which_na(df). To drop empty rows use na_rm(df) or sset(df, which_(row_all_na(df), TRUE)).
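
The first two use-cases can be checked against base R with a short sketch like the one below. It uses only functions named on this page and the built-in airquality data; the variable names are illustrative.

```r
library(cheapr)

df <- airquality

# Rows with no missing values, the complete.cases() replacement
complete <- !row_any_na(df)
same_as_base <- all(complete == complete.cases(df))

# Positions of rows that contain at least one NA
incomplete_rows <- which_(row_any_na(df))
```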

`is_na`

is_na is an S3 generic function. It falls back on is.na internally if it can't find a suitable method. Alternatively, you can write your own is_na method. For example, there is a method for vctrs_rcrd objects that simply converts the object to a data frame and then calls row_all_na(). There is also a POSIXlt method for is_na that is much faster than is.na.
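
As a sketch of what writing your own method might look like, the example below defines a method for a made-up class. The class "reading", its constructor new_reading() and its fields are hypothetical and exist only for illustration; they are not part of cheapr.

```r
library(cheapr)

# Hypothetical class: parallel fields stored in a list
new_reading <- function(value, unit) {
  structure(list(value = value, unit = unit), class = "reading")
}

# Hypothetical method: a reading counts as missing when its value is missing
is_na.reading <- function(x) {
  is_na(x$value)
}

r <- new_reading(c(20.5, NA, 18), c("C", "C", "C"))
missing_readings <- is_na(r)
```

Inside the method, is_na(x$value) dispatches to the default method because x$value is a plain numeric vector.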

Lists

When x is a list, num_na, any_na and all_na will recursively search the list for NA values. If recursive = FALSE then is_na() is used to find NA values.
is_na differs from is.na in two ways:

List elements are counted as NA if either that value is NA, or if it's a list, then all values of that list are NA.
When called on a data frame, it returns TRUE for empty rows that contain only NA values.
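
The two differences can be illustrated with a small sketch. The inputs are made up for illustration, and the comments restate the behaviour described above rather than showing verified output.

```r
library(cheapr)

x <- list(1, NA, list(NA, NA), list(NA, 2))

# Per the first rule: elements 2 and 3 count as NA, element 4 does not
list_na <- is_na(x)

# Recursive count over every value in the list
total_na <- num_na(x)

# Non-recursive count, based on is_na(x)
top_level_na <- num_na(x, recursive = FALSE)

# Per the second rule: only the row where every column is NA is flagged
df <- data.frame(a = c(1, NA, NA), b = c("x", NA, "z"))
df_na <- is_na(df)
```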

Value

Number or location of NA values.

Examples

library(cheapr)
library(bench)

x <- 1:10
x[c(1, 5, 10)] <- NA
num_na(x)
which_na(x)
which_not_na(x)

row_nas <- row_na_counts(airquality, names = TRUE)
col_nas <- col_na_counts(airquality, names = TRUE)
row_nas
col_nas

df <- sset(airquality, j = 1:2)

# Number of NAs in data
num_na(df)
# Which rows are empty?
row_na <- row_all_na(df)
sset(df, row_na)

# Removing the empty rows
sset(df, which_(row_na, invert = TRUE))
# Or
na_rm(df)
# Or
sset(df, row_na_counts(df) < ncol(df))

[Package cheapr version 0.9.3 Index]