big_colstats {bigstatsr} | R Documentation |
Standard univariate statistics
Description
Standard univariate statistics for columns of a Filebacked Big Matrix.
For now, the sum and var are implemented (the mean and sd can easily be deduced, see examples).
Usage
big_colstats(X, ind.row = rows_along(X), ind.col = cols_along(X), ncores = 1)
Arguments
X | An object of class FBM.
ind.row | An optional vector of the row indices that are used. If not specified, all rows are used. Don't use negative indices.
ind.col | An optional vector of the column indices that are used. If not specified, all columns are used. Don't use negative indices.
ncores | Number of cores used. Default doesn't use parallelism. You may use nb_cores.
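The arguments above can be combined as in the following minimal sketch. It assumes the example data returned by big_attachExtdata() and shows one way to exclude rows without negative indices (setdiff() on rows_along(X)). The names ind.keep and stats, the choice of the first 10 rows and the first 50 columns are illustrative assumptions only.

```r
library(bigstatsr)

X <- big_attachExtdata()

# Exclude the first 10 rows with setdiff() instead of a negative index
# (ind.keep is an illustrative name, not part of the package)
ind.keep <- setdiff(rows_along(X), 1:10)

# Restrict to the first 50 columns and use the core count given by nb_cores()
stats <- big_colstats(X, ind.row = ind.keep, ind.col = 1:50,
                      ncores = nb_cores())

# One row of statistics per selected column
str(stats)
```

Passing ncores = 1 instead should give the same statistics without parallelism.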
Value
Data.frame of two numeric vectors sum and var with the corresponding column statistics.
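To make the shape of the returned value concrete, here is a rough base R sketch that builds a data frame with the same two columns from an ordinary in-memory matrix. The helper colstats_ref() is hypothetical and not part of bigstatsr; it only mirrors the checks in the Examples, which compare sum with colSums() and var with apply(, 2, var), and it says nothing about how the package computes these values internally.

```r
# Hypothetical reference helper (NOT part of bigstatsr): same output shape,
# computed on a regular base R matrix instead of an FBM
colstats_ref <- function(m,
                         ind.row = seq_len(nrow(m)),
                         ind.col = seq_len(ncol(m))) {
  sub <- m[ind.row, ind.col, drop = FALSE]
  data.frame(
    sum = colSums(sub),        # column sums
    var = apply(sub, 2, var)   # sample variances (denominator n - 1)
  )
}

set.seed(1)
m <- matrix(rnorm(200), nrow = 20, ncol = 10)
ind <- 1:15
res <- colstats_ref(m, ind.row = ind)

# The mean and sd follow directly from the two returned columns
means <- res$sum / length(ind)
sds <- sqrt(res$var)
```

Dividing by length(ind) rather than nrow(m) matters whenever only a subset of rows is used.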
See Also
Examples
set.seed(1)
X <- big_attachExtdata()
# Check the results
str(test <- big_colstats(X))
# Only with the first 100 rows
ind <- 1:100
str(test2 <- big_colstats(X, ind.row = ind))
plot(test$sum, test2$sum)
abline(lm(test2$sum ~ test$sum), col = "red", lwd = 2)
X.ind <- X[ind, ]
all.equal(test2$sum, colSums(X.ind))
all.equal(test2$var, apply(X.ind, 2, var))
# Deduce the mean and sd
# (note that they are also implemented in big_scale())
# If using all rows, divide by nrow(X) instead of length(ind)
means <- test2$sum / length(ind)
all.equal(means, colMeans(X.ind))
sds <- sqrt(test2$var)
all.equal(sds, apply(X.ind, 2, sd))
[Package bigstatsr version 1.5.12 Index]