R: bigstatsr: Statistical Tools for Filebacked Big Matrices

bigstatsr-package {bigstatsr}

R Documentation

bigstatsr: Statistical Tools for Filebacked Big Matrices

Description

Easy-to-use, efficient, flexible and scalable statistical tools. Package bigstatsr provides and uses Filebacked Big Matrices via memory-mapping. It provides, for instance, matrix operations, Principal Component Analysis, sparse linear supervised models, utility functions and more <doi:10.1093/bioinformatics/bty185>.

Arguments

`X`	An object of class FBM.
`X.code`	An object of class FBM.code256.
`y.train`	Vector of responses, corresponding to `ind.train`.
`y01.train`	Vector of responses, corresponding to `ind.train`. Must be only 0s and 1s.
`ind.train`	An optional vector of the row indices that are used, for the training part. If not specified, all rows are used. Don't use negative indices.
`ind.row`	An optional vector of the row indices that are used. If not specified, all rows are used. Don't use negative indices.
`ind.col`	An optional vector of the column indices that are used. If not specified, all columns are used. Don't use negative indices.
`block.size`	Maximum number of columns read at once. Default uses block_size.
`ncores`	Number of cores used. Default doesn't use parallelism. You may use nb_cores.
`fun.scaling`	A function with parameters `X`, `ind.row` and `ind.col` that returns a data.frame with `$center` and `$scale` for the columns corresponding to `ind.col`. Each element of these columns is scaled as follows: `\frac{X_{i,j} - center_j}{scale_j}`. By default, no scaling is applied. You can also provide your own `center` and `scale` by using `as_scaling_fun()`.
`covar.train`	Matrix of covariables to be added in each model to correct for confounders (e.g. the scores of PCA), corresponding to `ind.train`. Default is `NULL` and corresponds to only adding an intercept to each model. You can use `covar_from_df()` to convert from a data frame.
`covar.row`	Matrix of covariables to be added in each model to correct for confounders (e.g. the scores of PCA), corresponding to `ind.row`. Default is `NULL` and corresponds to only adding an intercept to each model. You can use `covar_from_df()` to convert from a data frame.
`center`	Vector of the same length as `ind.col`, to subtract from the columns of `X`.
`scale`	Vector of the same length as `ind.col`, by which to divide the columns of `X`.
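
As an illustration of the `fun.scaling` contract described above, here is a minimal sketch of a custom scaling function. The name `my_scaling` is hypothetical, and a plain R matrix stands in for an FBM so that the example needs only base R. A real implementation for a large FBM would likely compute these statistics block by block rather than reading the whole submatrix at once.

```r
# Hypothetical custom scaling function following the `fun.scaling` contract:
# parameters X, ind.row, ind.col; returns a data.frame with $center and
# $scale for the columns given by ind.col.
my_scaling <- function(X, ind.row = seq_len(nrow(X)), ind.col = seq_len(ncol(X))) {
  block <- X[ind.row, ind.col, drop = FALSE]
  data.frame(
    center = colMeans(block),       # column means over the selected rows
    scale  = apply(block, 2, sd)    # column standard deviations
  )
}

# Stand-in data: a plain matrix instead of an FBM (assumption of this sketch).
set.seed(1)
X <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)
ind.row <- 1:40          # positive indices only
ind.col <- c(1, 3, 4)

stats <- my_scaling(X, ind.row, ind.col)

# Apply (X[i, j] - center_j) / scale_j to the selected block.
scaled <- sweep(X[ind.row, ind.col, drop = FALSE], 2, stats$center, "-")
scaled <- sweep(scaled, 2, stats$scale, "/")
```

After scaling, each selected column has mean 0 and standard deviation 1 over the rows in `ind.row`.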

Matrix parallelization

Large matrix computations are performed block-wise and are not parallelized, so that the size of these blocks does not have to be reduced. Instead, you may use Microsoft R Open or OpenBLAS to accelerate these block matrix computations. You can also control the number of cores used with bigparallelr::set_blas_ncores().
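
To make the idea of block-wise matrix computation concrete, here is a minimal base-R sketch of a matrix product accumulated over column blocks. The function `block_prod` is hypothetical and is not the package's implementation, and plain matrices stand in for an FBM. Each block product is an ordinary dense multiplication, which is the part an optimized BLAS can accelerate.

```r
# Hypothetical block-wise product X %*% A, reading at most `block.size`
# columns of X at a time (plain matrices stand in for an FBM).
block_prod <- function(X, A, block.size = 2) {
  stopifnot(ncol(X) == nrow(A))
  res <- matrix(0, nrow(X), ncol(A))
  for (start in seq(1, ncol(X), by = block.size)) {
    ind <- start:min(start + block.size - 1, ncol(X))
    # Dense multiplication of one block; this call is delegated to BLAS.
    res <- res + X[, ind, drop = FALSE] %*% A[ind, , drop = FALSE]
  }
  res
}

set.seed(1)
X <- matrix(rnorm(20 * 7), nrow = 20, ncol = 7)
A <- matrix(rnorm(7 * 3), nrow = 7, ncol = 3)

res <- block_prod(X, A, block.size = 3)
```

The result matches `X %*% A` up to floating-point error. Larger blocks mean fewer and bigger BLAS calls, which is consistent with the stated preference for keeping blocks large rather than splitting work across R processes.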

Author(s)

Maintainer: Florian Privé florian.prive.21@gmail.com

Other contributors:

Michael Blum [thesis advisor]
Hugues Aschard hugues.aschard@pasteur.fr [thesis advisor]
