R: A wrapper for the hierarchical sparse clustering algorithm

HierarchicalSparseCluster.wrapper {sparcl}

R Documentation

A wrapper for the hierarchical sparse clustering algorithm

Description

A wrapper for HierarchicalSparseCluster that reads the data from a file in GCT format and, if no tuning parameter value is specified, automatically chooses one using HierarchicalSparseCluster.permute.
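
For readers unfamiliar with the GCT layout, the following is a minimal sketch that writes a small simulated matrix as a GCT version 1.2 file (a version line, a dimensions line, a header line, then one row per gene). The helper `write_gct`, the file name `toy.gct`, and the simulated data are all hypothetical and are not part of sparcl; the sketch assumes rows are genes and columns are arrays.

```r
# Hypothetical helper (not part of sparcl): write a genes-by-arrays matrix
# as a GCT version 1.2 file.
write_gct <- function(x, path) {
  stopifnot(is.matrix(x), !is.null(rownames(x)), !is.null(colnames(x)))
  con <- file(path, open = "w")
  on.exit(close(con))
  writeLines("#1.2", con)                               # line 1: format version
  writeLines(paste(nrow(x), ncol(x), sep = "\t"), con)  # line 2: rows, columns
  writeLines(paste(c("Name", "Description", colnames(x)), collapse = "\t"), con)
  # Remaining lines: gene name, description, then one value per array.
  body <- data.frame(Name = rownames(x), Description = rownames(x), x,
                     check.names = FALSE)
  write.table(body, con, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

set.seed(1)
x <- matrix(rnorm(20 * 6), nrow = 20,
            dimnames = list(paste0("gene", 1:20), paste0("array", 1:6)))
gct_path <- file.path(tempdir(), "toy.gct")
write_gct(x, gct_path)
gct_lines <- readLines(gct_path)
```

Because the wrapper expects the file to be in the working directory, a call such as HierarchicalSparseCluster.wrapper("toy.gct") would presumably be made after changing the working directory to the folder holding the file; that call is not executed here.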

Usage

HierarchicalSparseCluster.wrapper(file,
  method=c("average", "complete", "single", "centroid"),
  wbound=NULL, silent=FALSE, cluster.features=FALSE,
  method.features=c("average", "complete", "single", "centroid"),
  output.cluster.files=TRUE, outputfile.prefix=NULL,
  maxnumgenes=5000, standardize.arrays=TRUE)

Arguments

`file`	The name of a GCT file in the working directory containing the data to be clustered.
`method`	The type of linkage to use in the hierarchical clustering - "single", "complete", "average", or "centroid".
`wbound`	The L1 bound on w to use; this is the tuning parameter for sparse hierarchical clustering. If NULL, then it will be chosen via HierarchicalSparseCluster.permute.
`silent`	Should progress messages be suppressed? Default is FALSE, in which case progress is printed.
`cluster.features`	Should the features with non-zero weights also be clustered? Default is FALSE.
`method.features`	If cluster.features is TRUE, then the type of linkage used to cluster the features with non-zero weights: one of "single", "complete", "average", or "centroid".
`output.cluster.files`	Should files containing the clustering be output? Default is TRUE.
`outputfile.prefix`	The prefix for the output files. If NULL, then the prefix of the input file is used.
`maxnumgenes`	The maximum number of genes to analyze. For computational reasons, only the maxnumgenes genes with the highest marginal variance are kept. This is recommended when the number of genes is very large. If NULL, then all genes are used.
`standardize.arrays`	Should the arrays first be standardized? Default is TRUE.
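
The filtering described for `maxnumgenes` can be illustrated in base R. This is only a sketch of the idea, not the code sparcl runs: the helper `keep_top_variance` is hypothetical, and it assumes a genes-by-arrays matrix in which marginal variance is computed across arrays for each gene.

```r
# Hypothetical helper (not sparcl code): keep the `maxnumgenes` rows (genes)
# of a genes-by-arrays matrix that have the largest marginal variance.
keep_top_variance <- function(x, maxnumgenes = 5000) {
  # NULL means "use all genes"; nothing to do if x is already small enough.
  if (is.null(maxnumgenes) || nrow(x) <= maxnumgenes) return(x)
  v <- apply(x, 1, var)                        # per-gene variance across arrays
  top <- order(v, decreasing = TRUE)[seq_len(maxnumgenes)]
  x[sort(top), , drop = FALSE]                 # preserve the original gene order
}

set.seed(1)
x <- matrix(rnorm(100 * 8), nrow = 100,
            dimnames = list(paste0("gene", 1:100), NULL))
x[1:5, ] <- x[1:5, ] * 10                      # make five genes highly variable
x_small <- keep_top_variance(x, maxnumgenes = 5)
x_all <- keep_top_variance(x, maxnumgenes = NULL)
```

Filtering before clustering keeps the per-feature dissimilarity matrix, which has one column per gene, at a manageable size.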

Value

`hc`	The output of a call to "hclust", giving the results of hierarchical sparse clustering.
`ws`	The p-vector of feature weights.
`u`	The $n \times n$ dissimilarity matrix passed into hclust, of the form $\left(\sum_j w_j d_{ii'j}\right)_{ii'}$.
`dists`	The $n^2 \times p$ dissimilarity matrix for the data matrix x. This is useful if additional calls to HierarchicalSparseCluster will be made.
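
The relationship between `dists`, `ws`, `u`, and `hc` can be sketched in base R. The example below is an illustration of the formula above under stated assumptions, not the package's internal code: it assumes squared differences as the per-feature dissimilarity, uses made-up weights rather than weights estimated by the algorithm, and orders the rows of `dists` by pairs (i, i') with i varying fastest.

```r
set.seed(1)
n <- 6; p <- 4
x <- matrix(rnorm(n * p), nrow = n)          # n observations, p features

# dists: one row per ordered pair (i, i'), one column per feature j, holding
# d_{ii'j} = (x_ij - x_i'j)^2 (squared difference is an assumption here).
pairs <- expand.grid(i = seq_len(n), ip = seq_len(n))
dists <- (x[pairs$i, , drop = FALSE] - x[pairs$ip, , drop = FALSE])^2

ws <- c(0.8, 0.6, 0, 0)                      # made-up sparse feature weights

# u[i, i'] = sum_j ws[j] * d_{ii'j}; features with zero weight drop out.
u <- matrix(dists %*% ws, nrow = n)

# The weighted dissimilarities are what gets passed to hclust.
hc <- hclust(as.dist(u), method = "average")
```

With these weights only the first two features influence the dendrogram, which is the sense in which the clustering is sparse in the features.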

Author(s)

Daniela M. Witten and Robert Tibshirani

References

Witten and Tibshirani (2009) A framework for feature selection in clustering.