phate {phateR}

R Documentation

Run PHATE on an input data matrix

Description

PHATE is a dimensionality reduction method designed specifically for visualizing high-dimensional data in low-dimensional spaces.

Usage

phate(
  data,
  ndim = 2,
  knn = 5,
  decay = 40,
  n.landmark = 2000,
  gamma = 1,
  t = "auto",
  mds.solver = "sgd",
  knn.dist.method = "euclidean",
  knn.max = NULL,
  init = NULL,
  mds.method = "metric",
  mds.dist.method = "euclidean",
  t.max = 100,
  npca = 100,
  plot.optimal.t = FALSE,
  verbose = 1,
  n.jobs = 1,
  seed = NULL,
  potential.method = NULL,
  k = NULL,
  alpha = NULL,
  use.alpha = NULL,
  ...
)

Arguments

`data`	matrix (n_samples, n_dimensions) Two-dimensional input data array with n_samples samples and n_dimensions dimensions. If `knn.dist.method` is 'precomputed', `data` is treated as an (n_samples, n_samples) distance or affinity matrix.
`ndim`	int, optional, default: 2 number of dimensions in which the data will be embedded
`knn`	int, optional, default: 5 number of nearest neighbors on which to build kernel
`decay`	int, optional, default: 40 sets decay rate of kernel tails. If NULL, alpha decaying kernel is not used
`n.landmark`	int, optional, default: 2000 number of landmarks to use in fast PHATE
`gamma`	float, optional, default: 1 Informational distance constant between -1 and 1. `gamma=1` gives the PHATE log potential, `gamma=0` gives a square root potential.
`t`	int, optional, default: 'auto' Power to which the diffusion operator is raised; this sets the level of diffusion. If 'auto', t is selected automatically (see `t.max` and `plot.optimal.t`).
`mds.solver`	'sgd', 'smacof', optional, default: 'sgd' which solver to use for metric MDS. SGD is substantially faster, but produces slightly less optimal results. Note that SMACOF was used for all figures in the PHATE paper.
`knn.dist.method`	string, optional, default: 'euclidean'. Distance metric for building the kNN graph. Recommended values: 'euclidean', 'cosine', 'precomputed'. Any metric from `scipy.spatial.distance` can be used. If 'precomputed', `data` should be an n_samples x n_samples distance or affinity matrix. Distance matrices are assumed to have zeros down the diagonal, while affinity matrices are assumed to have non-zero values down the diagonal. This is detected automatically from the first diagonal element of `data`. You can override this detection with `knn.dist.method='precomputed_distance'` or `knn.dist.method='precomputed_affinity'`.
`knn.max`	int, optional, default: NULL Maximum number of neighbors for which alpha decaying kernel is computed for each point. For very large datasets, setting `knn.max` to a small multiple of `knn` can speed up computation significantly.
`init`	phate object, optional object to use for initialization. Avoids recomputing intermediate steps if parameters are the same.
`mds.method`	string, optional, default: 'metric' Which MDS algorithm is used for dimensionality reduction. Choose from 'classic', 'metric', and 'nonmetric'.
`mds.dist.method`	string, optional, default: 'euclidean' Distance metric for MDS. Recommended values: 'euclidean' and 'cosine'.
`t.max`	int, optional, default: 100. Maximum value of t to test for automatic t selection.
`npca`	int, optional, default: 100 Number of principal components to use for calculating neighborhoods. For extremely large datasets, using `npca` < 20 allows neighborhoods to be calculated in log(n_samples) time.
`plot.optimal.t`	boolean, optional, default: FALSE If TRUE, produce a plot showing the Von Neumann Entropy curve for automatic t selection.
`verbose`	`int` or `boolean`, optional (default: 1) If `TRUE` or `> 0`, print verbose updates.
`n.jobs`	`int`, optional (default: 1) The number of jobs to use for the computation. If -1, all CPUs are used. If 1, no parallel computing code is used at all, which is useful for debugging. For `n.jobs` below -1, (n.cpus + 1 + n.jobs) CPUs are used; for example, `n.jobs = -2` uses all CPUs but one.
`seed`	int or `NULL`, random state (default: `NULL`)
`potential.method`	Deprecated. For log potential, use `gamma=1`. For sqrt potential, use `gamma=0`.
`k`	Deprecated. Use `knn`.
`alpha`	Deprecated. Use `decay`.
`use.alpha`	Deprecated. To disable alpha decay, use `decay=NULL`.
`...`	Additional arguments for `graphtools.Graph`.
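
Two of the conventions described in the arguments above can be made concrete in base R. The sketch below is illustrative only: `resolve_n_jobs()` and `looks_like_affinity()` are hypothetical helpers written for this page, not functions from phateR or the Python library, and they simply restate the documented rules for `n.jobs` and for 'precomputed' input.

```r
# Hypothetical helper (not part of phateR): number of CPUs implied by n.jobs,
# following the rule stated for the `n.jobs` argument.
resolve_n_jobs <- function(n.jobs, n.cpus) {
  stopifnot(n.jobs != 0, n.cpus >= 1)
  if (n.jobs < 0) {
    # -1 gives all CPUs, -2 gives all but one, and so on
    n.cpus + 1 + n.jobs
  } else {
    n.jobs
  }
}

# Hypothetical helper (not part of phateR): the documented rule for
# 'precomputed' input, where a non-zero first diagonal entry means affinity.
looks_like_affinity <- function(m) m[1, 1] != 0

# Toy data: 50 samples, 10 dimensions
set.seed(1)
x <- matrix(rnorm(50 * 10), nrow = 50)

d <- as.matrix(dist(x))  # Euclidean distance matrix, zeros on the diagonal
a <- exp(-d^2)           # a simple Gaussian affinity, ones on the diagonal

resolve_n_jobs(-2, n.cpus = 8)  # all CPUs but one
looks_like_affinity(d)          # FALSE: treated as distances
looks_like_affinity(a)          # TRUE: treated as affinities
```

A matrix such as `d` could then be passed as `phate(d, knn.dist.method = "precomputed_distance")`, which states the intended interpretation explicitly instead of relying on automatic detection.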

Value

"phate" object containing:

embedding: the PHATE embedding
operator: the PHATE operator (Python phate.PHATE object)
params: Parameters passed to phate
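
As a rough illustration of the three components listed above, the sketch below inspects a fitted object. It assumes that the returned "phate" object is list-like, so that its components can be reached with `$`; this is an assumption about the object layout, not something stated on this page. The code only runs when phateR and the Python `phate` module are both available.

```r
if (requireNamespace("phateR", quietly = TRUE) &&
    requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_module_available("phate")) {
  library(phateR)
  data(tree.data.small)

  res <- phate(tree.data.small$data, verbose = 0)

  # Assumption: components are accessible by name with `$`
  print(dim(res$embedding))          # one row per sample, ndim columns
  str(res$params, max.level = 1)     # parameters passed to phate
  print(class(res$operator))         # reticulate wrapper around phate.PHATE
}
```

For downstream analysis, `as.matrix()` on the fitted object (as used in the Examples section) avoids depending on the internal layout.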

Examples

if (reticulate::py_module_available("phate")) {

# Load data
# data(tree.data)
# We use a smaller tree to make examples run faster
data(tree.data.small)

# Run PHATE
phate.tree <- phate(tree.data.small$data)
summary(phate.tree)
## PHATE embedding
## knn = 5, decay = 40, t = 58
## Data: (3000, 100)
## Embedding: (3000, 2)

library(graphics)
# Plot the result with base graphics
plot(phate.tree, col=tree.data.small$branches)
# Plot the result with ggplot2
if (require(ggplot2)) {
  ggplot(phate.tree) +
    geom_point(aes(x=PHATE1, y=PHATE2, color=tree.data.small$branches))
}

# Run PHATE again with different parameters
# We use the last run as initialization
phate.tree2 <- phate(tree.data.small$data, t=150, init=phate.tree)
# Extract the embedding matrix to use in downstream analysis
embedding <- as.matrix(phate.tree2)

}
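
A further sketch, under the same availability check as the example above, compares the default log potential with the square root potential (`gamma = 0`) and the SMACOF solver, reusing the first run through `init`. The seed value and the object names are arbitrary choices for illustration.

```r
if (requireNamespace("phateR", quietly = TRUE) &&
    requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_module_available("phate")) {
  library(phateR)
  data(tree.data.small)

  # Default run: log potential (gamma = 1) with the SGD solver
  phate.log <- phate(tree.data.small$data, seed = 42, verbose = 0)

  # Square root potential with the SMACOF solver, initialized from the first run
  phate.sqrt <- phate(tree.data.small$data, gamma = 0, mds.solver = "smacof",
                      init = phate.log, seed = 42, verbose = 0)

  # Both embeddings have one row per sample and ndim = 2 columns
  print(dim(as.matrix(phate.log)))
  print(dim(as.matrix(phate.sqrt)))
}
```

Plotting the two embeddings side by side, for example with `plot()` as in the example above, is one way to judge how the choice of potential affects the layout.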

[Package phateR version 1.0.7 Index]