R: Parallel (Statistical) Functions

parallel-funs {kit}

R Documentation

Parallel (Statistical) Functions

Description

Vector-valued (statistical) functions operating in parallel over vectors passed as arguments, or a single list of vectors (such as a data frame). Similar to pmin and pmax, except that these functions do not recycle vectors.

Usage

  psum(..., na.rm = FALSE)
  pprod(..., na.rm = FALSE)
  pmean(..., na.rm = FALSE)
  pfirst(...)  # (na.rm = TRUE)
  plast(...)   # (na.rm = TRUE)
  pall(..., na.rm = FALSE)
  pallNA(...)
  pallv(..., value)
  pany(..., na.rm = FALSE)
  panyNA(...)
  panyv(..., value)
  pcount(..., value)
  pcountNA(...)

Arguments

`...`	Suitable (atomic) vectors of the same length, or a single list of vectors (such as a `data.frame`). See Details for the data types each function allows, and Examples for usage.
`na.rm`	A logical indicating whether missing values should be removed. Default value is `FALSE`, except for `pfirst` and `plast`.
`value`	A non-`NULL` value of length 1 to compare against (used by `pallv`, `panyv` and `pcount`).

Details

psum and pprod work with integer, logical, double and complex types. pmean only supports integer, logical and double types. All three functions raise an error if used with factors.
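
As a rough base R point of comparison (not kit's implementation), the element-wise results can be sketched with rowSums and rowMeans over a column-bound matrix, the same kind of baseline used in the benchmark in Examples. The helper name ref_prod is hypothetical, and the sketch assumes that with na.rm = TRUE missing values are skipped and the mean is taken over the non-missing values only.

```r
x <- c(1, 3, NA, 5)
y <- c(2, NA, 4, 1)
z <- c(3, 4, 4, 1)
m <- cbind(x, y, z)   # one row per position, one column per input vector

# Position-wise sum and mean, skipping missing values
ref_sum  <- rowSums(m, na.rm = TRUE)
ref_mean <- rowMeans(m, na.rm = TRUE)

# Hypothetical helper: position-wise product.
# With na.rm = TRUE, missing values are replaced by 1 (the neutral element).
ref_prod <- function(m, na.rm = FALSE) {
  if (na.rm) m[is.na(m)] <- 1
  apply(m, 1, prod)
}

ref_sum
ref_mean
ref_prod(m, na.rm = TRUE)
ref_prod(m)           # NA wherever any input is missing
```

The matrix copy made by cbind is the main cost of this approach on large inputs, which is what the benchmark in Examples measures against.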

pfirst/plast select the first/last non-missing value at each position (for list vectors, the first/last element that is neither empty nor NULL). They accept all vector types with defined missing values, plus lists, but the only types they can handle jointly are integer and double (not numeric with complex, or character with factor). If factors are passed, they all need to have identical levels.
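
The selection rule can be sketched in base R for atomic vectors as follows. This is only an illustration of the behaviour described above, not kit's code; ref_first and ref_last are hypothetical names, and list inputs and factors are not handled.

```r
# Hypothetical helper: first non-missing value at each position
ref_first <- function(...) {
  args <- list(...)
  out <- args[[1]]
  for (v in args[-1]) {
    fill <- is.na(out)     # positions that are still missing
    out[fill] <- v[fill]   # fill them from the next vector
  }
  out
}

# Last non-missing value: same scan with the arguments reversed
ref_last <- function(...) do.call(ref_first, rev(list(...)))

x <- c(1, 3, NA, 5)
y <- c(2, NA, 4, 1)
z <- c(3, 4, 4, 1)

ref_first(x, y, z)
ref_last(x, y)
```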

pall and pany are derived from the base functions all and any, respectively, and only allow logical inputs.
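
For reference, applying base all and any position by position gives the results below. The helpers ref_all and ref_any are hypothetical and show base R semantics only; that kit returns the same values at positions where every input is NA is an assumption, not something stated here.

```r
a <- c(TRUE, FALSE, NA, FALSE)
b <- c(TRUE, NA,    NA, FALSE)

# Hypothetical reference helpers: call all()/any() on the i-th elements
ref_all <- function(..., na.rm = FALSE) {
  mapply(function(...) all(..., na.rm = na.rm), ...)
}
ref_any <- function(..., na.rm = FALSE) {
  mapply(function(...) any(..., na.rm = na.rm), ...)
}

ref_all(a, b)                # NA only where no FALSE is present and a value is missing
ref_all(a, b, na.rm = TRUE)
ref_any(a, b)                # NA only where no TRUE is present and a value is missing
ref_any(a, b, na.rm = TRUE)
```

This element-by-element loop is the slow baseline that the second benchmark in Examples compares pany against.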

pcount counts the occurrences of value at each position, and expects arguments of the same data type (except for value = NA). pcountNA is equivalent to pcount with value = NA, and both allow NA counting in mixed-type data. pcountNA additionally supports list vectors and counts empty or NULL elements as NA.
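
A minimal base R sketch of the two counts, for same-typed atomic inputs only. It is meant as an illustration of what is being counted, not as kit's implementation, and the variable names are hypothetical.

```r
x <- c(TRUE, FALSE, NA, FALSE)
y <- c(TRUE, NA, TRUE, TRUE)
z <- c(TRUE, TRUE, FALSE, NA)
m <- cbind(x, y, z)   # one row per position

# Occurrences of TRUE at each position; comparisons with NA are dropped
n_true <- as.integer(rowSums(m == TRUE, na.rm = TRUE))

# Missing values at each position
n_na <- as.integer(rowSums(is.na(m)))

n_true
n_na
```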

Functions panyv/pallv are wrappers around pcount, and panyNA/pallNA are wrappers around pcountNA. They return a logical vector instead of the integer count.
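
The relationship between the counts and the logical results can be sketched like this, assuming "any" means a count above zero and "all" means a count equal to the number of input vectors. All names are hypothetical and the code is base R only.

```r
x <- c(1, 3, NA, 1)
y <- c(1, NA, NA, 5)
m <- cbind(x, y)
n_vec <- ncol(m)                       # number of input vectors

cnt <- rowSums(m == 1, na.rm = TRUE)   # how often the value 1 occurs per position
any_v <- cnt > 0                       # at least one input equals 1
all_v <- cnt == n_vec                  # every input equals 1

cnt_na <- rowSums(is.na(m))            # how many inputs are missing per position
any_na <- cnt_na > 0
all_na <- cnt_na == n_vec
```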

None of these functions recycle vectors, i.e. all input vectors must have the same length. All functions support long vectors with up to 2^64-1 elements.
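
To make the contrast with base R concrete: pmax recycles a shorter argument silently, whereas a strict same-length rule rejects it. The function check_same_length and its error message are hypothetical and only mimic the rule described above; kit's actual message may differ.

```r
# Base pmax() recycles the scalar 2 across all three positions
recycled <- pmax(c(1, 5, 3), 2)

# Hypothetical strict check mimicking a no-recycling rule
check_same_length <- function(...) {
  lens <- lengths(list(...))
  if (length(unique(lens)) > 1L) {
    stop("all input vectors must have the same length")
  }
  invisible(TRUE)
}

ok  <- check_same_length(c(1, 5, 3), c(2, 2, 2))
err <- tryCatch(check_same_length(c(1, 5, 3), 2),
                error = function(e) conditionMessage(e))
```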

Value

psum/pprod/pmean return the element-wise sum, product or mean of all arguments. The returned value has the highest type among the arguments (integer < double < complex), except that pprod only returns double or complex. pall[v/NA] and pany[v/NA] return a logical vector. pcount[NA] returns an integer vector. pfirst/plast return a vector of the same type as the inputs.
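
Base R's sum and prod follow a comparable type hierarchy, which may help as a mental model. The snippet below shows base behaviour only and makes no claim about kit's internals.

```r
t_int  <- typeof(sum(1L, 2L))       # integer inputs give an integer sum
t_dbl  <- typeof(sum(1L, 2.5))      # one double promotes the result to double
t_cplx <- typeof(sum(1L, 2 + 0i))   # one complex promotes the result to complex
t_prod <- typeof(prod(2L, 3L))      # base prod() returns double even for integers

c(t_int, t_dbl, t_cplx, t_prod)
```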

Author(s)

Morgan Jacob and Sebastian Krantz

Examples

x = c(1, 3, NA, 5)
y = c(2, NA, 4, 1)
z = c(3, 4, 4, 1)

# Example 1: psum 
psum(x, y, z, na.rm = FALSE)
psum(x, y, z, na.rm = TRUE)

# Example 2: pprod
pprod(x, y, z, na.rm = FALSE)
pprod(x, y, z, na.rm = TRUE)

# Example 3: pmean
pmean(x, y, z, na.rm = FALSE)
pmean(x, y, z, na.rm = TRUE)

# Example 4: pfirst and plast
pfirst(x, y, z)
plast(x, y, z)

# Adjust x, y, and z to use in pall and pany
x = c(TRUE, FALSE, NA, FALSE)
y = c(TRUE, NA, TRUE, TRUE)
z = c(TRUE, TRUE, FALSE, NA)

# Example 5: pall
pall(x, y, z, na.rm = FALSE)
pall(x, y, z, na.rm = TRUE)

# Example 6: pany
pany(x, y, z, na.rm = FALSE)
pany(x, y, z, na.rm = TRUE)

# Example 7: pcount
pcount(x, y, z, value = TRUE)
pcountNA(x, y, z)

# Example 8: list/data.frame as an input
pprod(iris[,1:2])
psum(iris[,1:2])
pmean(iris[,1:2])

# Benchmarks
# ----------
# n = 1e8L
# x = rnorm(n) # 763 Mb
# y = rnorm(n)
# z = rnorm(n)
# 
# microbenchmark::microbenchmark(
#   kit=psum(x, y, z, na.rm = TRUE),
#   base=rowSums(do.call(cbind,list(x, y, z)), na.rm=TRUE),
#   times = 5L, unit = "s"
# )
# Unit: Second
# expr  min   lq mean median   uq  max neval
# kit  0.52 0.52 0.65   0.55 0.83 0.84     5
# base 2.16 2.27 2.34   2.35 2.43 2.49     5
#
# x = sample(c(TRUE, FALSE, NA), n, TRUE) # 382 Mb
# y = sample(c(TRUE, FALSE, NA), n, TRUE)
# z = sample(c(TRUE, FALSE, NA), n, TRUE)
# 
# microbenchmark::microbenchmark(
#   kit=pany(x, y, z, na.rm = TRUE),
#   base=sapply(1:n, function(i) any(x[i],y[i],z[i],na.rm=TRUE)),
#   times = 5L
# )
# Unit: Second
# expr    min     lq   mean   median     uq    max neval
# kit    1.07   1.09   1.15     1.10   1.23   1.23     5
# base 111.31 112.02 112.78   112.97 113.55 114.03     5

[Package kit version 0.0.18 Index]