bigdist {bigdist}    R Documentation

Read or create a distance matrix on disk

Description

Computes distances via dist and saves them as a file-backed matrix (FBM) using the bigstatsr package, or connects to an existing FBM backing file on disk.

Usage

bigdist(mat, file, method = "euclidean", type = "float")

Arguments

mat

Numeric matrix. When missing, bigdist connects to an existing backing file instead. See the 'file' argument.

file

(string) Name of the backing file to be created, or of an existing backing file to connect to. Do not include the trailing ".bk" extension. See Details for the backing file name format.

method

(string or function) See the method argument of dist. Ignored when mat is missing.

type

(string, default: 'float') Storage type of the FBM. See FBM. Ignored when mat is missing.

Details

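A bigdist object is a list whose element 'fbm' holds the FBM connection. The backing file name has the form <somename>_<size>_<type>.bk, where size is the number of observations and type is the storage type, such as 'double' or 'float'. When creating an object, pass only <somename> as file (the size and type are appended); when connecting to an existing file, pass the full name without ".bk", as with "temp_ex1_100_float" in the Examples.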
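The name format above can be illustrated with two small base R helpers. make_bigdist_name and parse_bigdist_name are hypothetical (they are not part of bigdist); this sketch assumes the <somename>_<size>_<type>.bk convention exactly as described, with size and type as the last two underscore-separated fields, so that <somename> may itself contain underscores.

```r
# Hypothetical helpers (not part of bigdist) illustrating the documented
# backing file name format <somename>_<size>_<type>.bk

make_bigdist_name <- function(somename, size, type = "float") {
  paste(somename, size, type, sep = "_")
}

parse_bigdist_name <- function(path) {
  base  <- sub("\\.bk$", "", basename(path))   # drop an optional ".bk"
  parts <- strsplit(base, "_", fixed = TRUE)[[1]]
  n     <- length(parts)
  stopifnot(n >= 3)
  list(
    # everything before the last two fields is the user-supplied name
    somename = paste(parts[seq_len(n - 2)], collapse = "_"),
    size     = as.integer(parts[n - 1]),
    type     = parts[n]
  )
}

make_bigdist_name("temp_ex1", 100)
str(parse_bigdist_name("temp_ex1_100_float.bk"))
```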

The bigstatsr package stores matrices on disk and allows efficient computation on them. The disto package provides a unified frontend to read parts of distance matrices and to apply functions over their rows and columns. For the most efficient operations, write C++ functions that work directly on bigstatsr's FBM.
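A minimal sketch of block-wise row processing: rows are read a fixed number at a time through bigdist_extract, so only one block is held in memory. The helper row_means_blockwise and the block size of 25 are hypothetical illustration choices; this is a plain R loop, not the disto API, and it assumes bigdist is installed (the block is skipped otherwise).

```r
# Sketch: row means of an on-disk distance matrix, computed block by block.
# row_means_blockwise is a hypothetical helper, not part of bigdist or disto.
row_means_blockwise <- function(bd, n, block = 25) {
  out <- numeric(n)
  for (start in seq(1, n, by = block)) {
    rows  <- start:min(start + block - 1, n)
    chunk <- bigdist::bigdist_extract(bd, rows, )   # these rows, all columns
    chunk <- matrix(chunk, nrow = length(rows))     # guard against a dropped dim
    out[rows] <- rowMeans(chunk)
  }
  out
}

ran_blocks <- FALSE
if (requireNamespace("bigdist", quietly = TRUE)) {
  set.seed(1)
  amat <- matrix(rnorm(1e3), ncol = 10)             # 100 observations
  bd   <- bigdist::bigdist(mat = amat, file = tempfile("blk_ex"))
  rm_disk <- row_means_blockwise(bd, n = nrow(amat))
  # in-memory reference; 'float' storage limits the achievable precision
  rm_mem  <- unname(rowMeans(as.matrix(dist(amat))))
  ran_blocks <- TRUE
}
```

Reading whole row blocks trades a little memory (block × n values) for far fewer disk accesses than reading one element at a time.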

The distance computation and the writing to the FBM can be parallelized by setting a future backend, for example with future::plan().
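A sketch of running bigdist under a parallel future backend and then restoring the previous backend. It assumes the future and bigdist packages are installed (the block does nothing otherwise); multisession with two workers is an arbitrary illustrative choice, and how bigdist splits the work across workers is not shown here.

```r
# Sketch: parallelize bigdist via a future backend (assumes 'future' and
# 'bigdist' are installed; skipped otherwise).
ran_future <- FALSE
if (requireNamespace("future", quietly = TRUE) &&
    requireNamespace("bigdist", quietly = TRUE)) {
  # plan() invisibly returns the previous plan, so it can be restored later
  old_plan <- future::plan(future::multisession, workers = 2)
  set.seed(1)
  amat   <- matrix(rnorm(1e3), ncol = 10)
  bd_par <- bigdist::bigdist(mat = amat, file = tempfile("par_ex"))
  future::plan(old_plan)   # restore the previous backend
  ran_future <- TRUE
}
```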

Value

An object of class 'bigdist'.

Examples

# basics of 'bigdist'
# create a random matrix
set.seed(1)
amat <- matrix(rnorm(1e3), ncol = 10)
td   <- tempdir()

# create a bigdist object with FBM (file-backed matrix) on disk
temp <- bigdist(mat = amat, file = file.path(td, "temp_ex1"))
temp
temp$fbm$backingfile
temp$fbm[1, 2]

# connect to FBM on disk as a bigdist object
temp2 <- bigdist(file = file.path(td, "temp_ex1_100_float"))
temp2
temp2$fbm[1, 2]

# check the size of bigdist object
bigdist_size(temp)

# bigdist accessors

# ij
bigdist_extract(temp, 1, 2)
bigdist_extract(temp, 1:2, 3:4)
bigdist_extract(temp, 1:2, 3:4, product = "inner")
dim(bigdist_extract(temp, 1:2,))
dim(bigdist_extract(temp, , 3:4))

# k (lower triangle indexing)
bigdist_extract(temp, k = 3:7)

# bigdist replacers

# ij
bigdist_replace(temp, 1, 2, 10)
bigdist_extract(temp, 1, 2)
bigdist_replace(temp, 1:2, 3:4, 11:12)
bigdist_extract(temp, 1:2, 3:4, product = "inner")

# k (lower triangle indexing)
bigdist_replace(temp, k = 3:7, value = 51:55)
bigdist_extract(temp, k = 3:7)

# subset a bigdist object
temp_subset <- bigdist_subset(temp, index = 21:30, file = file.path(td, "temp_ex2"))
temp_subset
temp_subset$fbm$backingfile

# convert a dist object (in memory) to a bigdist object
temp3 <- as_bigdist(dist(mtcars), file = file.path(td, "temp_ex3"))
temp3
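The k argument addresses the lower triangle as a single vector. Assuming bigdist uses the same ordering as base R dist objects (lower triangle stored column by column, diagonal omitted), the mapping from a pair (i, j) to k can be sketched with the index formula given in the dist documentation. ij_to_k is a hypothetical helper for illustration only.

```r
# Sketch: map pairs (i, j), i != j, of an n x n distance matrix to the
# linear index k of a base R 'dist' object. ij_to_k is hypothetical;
# we assume bigdist's k follows the same ordering.
ij_to_k <- function(i, j, n) {
  big   <- pmax(i, j)          # row within the lower triangle
  small <- pmin(i, j)          # column within the lower triangle
  stopifnot(all(big != small)) # the diagonal is not stored
  n * (small - 1) - small * (small - 1) / 2 + big - small
}

d <- dist(mtcars)
n <- attr(d, "Size")           # 32 observations
k <- ij_to_k(3, 1, n)          # second stored value
all.equal(d[k], as.matrix(d)[3, 1])
```

Under that assumption, k = 3:7 in the examples above addresses the pairs (4, 1) through (8, 1).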

[Package bigdist version 0.1.4 Index]