is_na {cheapr} | R Documentation |
Efficient functions for dealing with missing values.
Description
is_na()
is a parallelised alternative to is.na()
.
num_na(x)
is a faster and more efficient sum(is.na(x))
.
which_na(x)
is a more efficient which(is.na(x))
which_not_na(x)
is a more efficient which(!is.na(x))
row_na_counts(x)
is a more efficient rowSums(is.na(x))
row_all_na()
returns a logical vector indicating which rows are empty
and have only NA
values.
row_any_na()
returns a logical vector indicating which rows have at least
1 NA
value.
The col_
variants are the same, but operate by-column.
Usage
is_na(x)
## Default S3 method:
is_na(x)
## S3 method for class 'POSIXlt'
is_na(x)
## S3 method for class 'vctrs_rcrd'
is_na(x)
## S3 method for class 'data.frame'
is_na(x)
num_na(x, recursive = TRUE)
which_na(x)
which_not_na(x)
any_na(x, recursive = TRUE)
all_na(x, recursive = TRUE)
row_na_counts(x, names = FALSE)
col_na_counts(x, names = FALSE)
row_all_na(x, names = FALSE)
col_all_na(x, names = FALSE)
row_any_na(x, names = FALSE)
col_any_na(x, names = FALSE)
Arguments
x |
A vector, list, data frame or matrix. |
recursive |
Should the function be applied recursively to lists?
The default is TRUE. |
names |
Should row/col names be added? |
Details
These functions are designed primarily for programmers, to increase the speed
and memory-efficiency of NA
handling.
Most of these functions can be parallelised through options(cheapr.cores)
.
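As a minimal sketch, the option can be set for a piece of work and then restored. The value 2, and the assumption that the option takes an integer number of cores, are illustrative and not confirmed by this page. The cheapr call is guarded so the snippet also runs where the package is not installed.

```r
# Assumption: cheapr.cores takes an integer number of cores.
old <- options(cheapr.cores = 2)
cores_in_use <- getOption("cheapr.cores")

x <- c(1, NA, 3, NA)
if (requireNamespace("cheapr", quietly = TRUE)) {
  # Runs only when cheapr is installed
  print(cheapr::num_na(x))
}

# Restore whatever setting was in place before
options(old)
```

Saving the return value of options() and passing it back is the usual base R idiom for a temporary option change.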
Common use-cases
To replicate complete.cases(x)
, use !row_any_na(x)
.
To find rows with any empty values,
use which_(row_any_na(df))
.
To find empty rows use which_(row_all_na(df))
or which_na(df)
.
To drop empty rows use na_rm(df)
or sset(df, which_(row_all_na(df), TRUE))
.
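The use-cases above can be sketched on a small data frame. The data and variable names are illustrative. The base R results are computed first as a reference, and the cheapr calls are guarded so the snippet also runs where the package is not installed.

```r
# One complete row, one partly missing row, one empty row
df <- data.frame(a = c(1, NA, NA), b = c("x", "y", NA))

# Base R reference results
complete_base <- complete.cases(df)
empty_base <- unname(rowSums(is.na(df)) == ncol(df))

if (requireNamespace("cheapr", quietly = TRUE)) {
  # Counterpart of complete.cases(df)
  print(!cheapr::row_any_na(df))
  # Locations of rows with any NA, and of empty rows
  print(cheapr::which_(cheapr::row_any_na(df)))
  print(cheapr::which_(cheapr::row_all_na(df)))
  # Drop the empty rows
  print(cheapr::na_rm(df))
}
```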
is_na
is_na
is an S3 generic function. It will internally fall back on
using is.na
if it can't find a suitable method.
Alternatively, you can write your own is_na
method.
For example, there is a method for vctrs_rcrd
objects that simply converts them to a data frame and then calls row_all_na()
.
There is also a POSIXlt
method for is_na
that is much faster than is.na
.
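A custom method might look like the following sketch. The class "reading", its constructor new_reading(), and the -999 sentinel are all hypothetical and exist only for illustration. The method is an ordinary S3 method named is_na.<class>, so it can also be called directly when cheapr is not installed.

```r
# Hypothetical class: a numeric vector in which -999 encodes a missing reading
new_reading <- function(x) structure(x, class = "reading")

# Hypothetical method: treat both NA and the -999 sentinel as missing
is_na.reading <- function(x) {
  vals <- unclass(x)
  is.na(vals) | vals == -999
}

r <- new_reading(c(1, NA, -999))

# Direct call, which works without cheapr
direct <- is_na.reading(r)

if (requireNamespace("cheapr", quietly = TRUE)) {
  # With cheapr installed, the generic should dispatch to the method above
  print(cheapr::is_na(r))
}
```

Inside a package, such a method would need to be registered in the usual way for S3 methods rather than defined in the global environment.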
Lists
When x
is a list, num_na
, any_na
and all_na
will recursively search
the list for NA
values. If recursive = FALSE
then is_na()
is used to
find NA
values.
is_na
differs from is.na
in two ways:
List elements are counted as NA if either the element is NA or it is a list whose values are all NA.
When called on a data frame, it returns TRUE for empty rows that contain only NA values.
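The two differences can be illustrated with a short sketch. The inputs are made up for the example. The base R results are computed unconditionally, while the cheapr calls are guarded and printed without assuming their output.

```r
# A list whose second element is itself a list containing only NA
x <- list(NA, list(NA, NA), 1)

# Base R flags only the length-one NA element
base_list <- is.na(x)

# A data frame whose second row is empty
df <- data.frame(a = c(1, NA), b = c("x", NA))

# Base R returns a logical matrix with one cell per value
base_df <- is.na(df)

if (requireNamespace("cheapr", quietly = TRUE)) {
  # Per the description above, the nested all-NA list should also count as NA
  print(cheapr::is_na(x))
  # Per the description above, this should give one value per row
  print(cheapr::is_na(df))
}
```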
Value
Number or location of NA
values.
Examples
library(cheapr)
library(bench)
x <- 1:10
x[c(1, 5, 10)] <- NA
num_na(x)
which_na(x)
which_not_na(x)
row_nas <- row_na_counts(airquality, names = TRUE)
col_nas <- col_na_counts(airquality, names = TRUE)
row_nas
col_nas
df <- sset(airquality, j = 1:2)
# Number of NAs in data
num_na(df)
# Which rows are empty?
row_na <- row_all_na(df)
sset(df, row_na)
# Removing the empty rows
sset(df, which_(row_na, invert = TRUE))
# Or
na_rm(df)
# Or
sset(df, row_na_counts(df) < ncol(df))