LDPC {RPointCloud}    R Documentation

Local Dimension of Point Clouds

Description

Given a data set viewed as a point cloud in N-dimensional space, compute a local estimate of the dimension of an underlying manifold, as defined by Ellis and McDermott and analogous to earlier related work by Takens.

Usage

takens(r, dists)
LDPC(CellID, dset, rg, quorum, samplesAreRows = TRUE)

Arguments

CellID

An integer indexing one of the cells-samples in the data set.

dset

A data set. The usual orientation is that rows are cells and columns are features.

rg

A numerical vector of radial distances at which to compute Takens estimates of the local dimension.

quorum

The minimum number of neighboring cells required for the computation to be meaningful.

samplesAreRows

A logical value: if TRUE, rows represent the samples at which to compute local dimensions; if FALSE, columns do.

r

A radial distance at which to compute the Takens estimate.

dists

A sorted vector of distances from one cell to all other cells.

Details

The takens function computes the Takens estimate of the local dimension of a point cloud at radius $r$ around a data point. For each cell-sample, we must compute and sort the distances from that cell to all other cells. (For the takens function, these distances are passed in as the second argument.) Preliminary histograms of the distance distributions may be used to choose a good set of radial distances. Note that the local dimension estimates are infinite if the radius is so small that there are no neighbors. The estimates decrease as the radius increases or as the number of local neighbors increases. The reference paper by Ellis and McDermott says:

"[T]he procedure is carried out as follows. A 'bin increment', $A$; a number, $m$, of bin increments; and a 'quorum', $q > 0$, are chosen and raw dimensions are calculated for $r = A$, $2A$, $\ldots$, $mA$. Next, for each observation, $x_i$, let $r_i$ be the smallest multiple of $A$ not exceeding $mA$ such that the ball with radius $r_i$ centered at $x_i$ contains at least $q$ observations, providing that there are at least $q$ observations within $mA$ of $x_i$. Otherwise let $r_i = mA$."
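As a rough illustration of the Takens estimate described above, the sketch below computes a neighbor count and a dimension estimate from a sorted distance vector. The exact formula used by the package is not stated on this page; the code assumes the classical Takens form, the negative reciprocal of the mean of log(distance / r) over neighbors closer than r. The name takens_sketch and the toy distances are hypothetical and are not part of RPointCloud.

```r
# Hypothetical helper, NOT the package's takens(); it assumes the classical
# Takens form of the estimator.
takens_sketch <- function(r, dists) {
  # Keep strictly positive distances inside the radius
  # (this drops a zero self-distance if one is present).
  near <- dists[dists > 0 & dists < r]
  k <- length(near)
  # With no neighbors inside the radius, report an infinite estimate,
  # matching the behavior described in the text.
  if (k == 0) return(list(k = 0L, d = Inf))
  # Negative reciprocal of the mean log distance ratio.
  d <- -1 / mean(log(near / r))
  list(k = k, d = d)
}

# Toy sorted distances (made-up numbers, for illustration only).
toy <- sort(c(0.5, 1, 1.5, 2, 4))
est <- takens_sketch(2, toy)      # three neighbors lie strictly within r = 2
none <- takens_sketch(0.1, toy)   # radius too small: no neighbors
```

In this sketch a radius below the smallest distance yields k = 0 and an infinite estimate, which is why a quorum of neighbors is required before an estimate is treated as meaningful.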

The LDPC function iterates over all cells-samples in the data set, computes and sorts their distances to all other cells, and invokes the takens function to compute local estimates of dimension.
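The quorum rule quoted from Ellis and McDermott can be sketched in a few lines. This is an illustration under stated assumptions, not the package's implementation: the helper quorum_radius is hypothetical, the simulated points are made up, and the neighbor count here excludes the center point itself, which the quotation leaves ambiguous.

```r
# Hypothetical helper illustrating the quoted quorum rule; not part of RPointCloud.
# dists: distances from one observation to all others
# A: bin increment, m: number of increments, q: quorum
quorum_radius <- function(dists, A, m, q) {
  radii <- A * seq_len(m)                      # candidate radii A, 2A, ..., mA
  # Number of observations within each candidate radius.
  counts <- vapply(radii, function(r) sum(dists <= r), integer(1))
  ok <- which(counts >= q)
  if (length(ok) == 0) return(m * A)           # quorum never reached: use mA
  radii[ok[1]]                                 # smallest radius meeting the quorum
}

set.seed(1)
pts <- matrix(rnorm(200), ncol = 2)            # 100 made-up points in the plane
# Sorted Euclidean distances from point 1 to all other points.
diffs <- sweep(pts[-1, , drop = FALSE], 2, pts[1, ])
d1 <- sort(sqrt(rowSums(diffs^2)))
r1 <- quorum_radius(d1, A = 0.25, m = 12, q = 10)
```

The selected radius is always one of the grid values, so it can be matched back to the radii at which the raw dimension estimates were computed.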

Value

The takens function returns a list with two items: the number of neighbors $k$ and the dimension estimate $d$ at each value of the radius from the input vector.

The LDPC function returns a list containing vectors of $R$, $k$, and $d$ values for each cell in the data set.

Author(s)

Kevin R. Coombes <krc@silicovore.com>

References

Ellis and McDermott

Examples

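# Load the cytof example data
data(cytof)
# Local dimension for cell 1 at 20 radii between 1 and 6, with a quorum of 30
localdim <- LDPC(1, AML10.node287, seq(1, 6, length.out = 20), 30, TRUE)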

[Package RPointCloud version 0.6.2 Index]