progenyClust {progenyClust} | R Documentation |
Progeny Clustering
Description
Select the optimal number of clusters using Progeny Clustering.
Usage
progenyClust(data, FUNclust = kmeans, method = "gap", score.invert = F, ncluster = 2:10,
size = 10, iteration = 100, repeats = 1, nrandom = 10, ...)
## S3 method for class 'progenyClust'
summary(object,...)
Arguments
data |
data matrix or data frame for clustering: each row corresponds to a sample or observation, whereas each column corresponds to a feature or variable. |
FUNclust |
clustering function: accepts data as its first argument and the number of clusters as its second argument; returns a list containing a component called 'cluster', which is a vector of integers recording the cluster assignment of every sample. The default function is kmeans. |
method |
character string indicating the criterion used to pick the optimal cluster number: 'gap', 'score', or 'both'. The default is 'gap'. |
score.invert |
logical flag: specifies whether the score should be inverted. The default score is the ratio of true classification probabilities over false classification probabilities. The inverted score is the ratio of false classification probabilities over true classification probabilities, which prevents the algorithm from generating infinite score values in cases of perfect clustering. When score.invert=TRUE, the optimal cluster number is picked based on the lowest score. |
ncluster |
sequence of integers specifying candidate cluster numbers for evaluation: ncluster must consist of consecutive integers if the method 'gap' is chosen. |
size |
integer specifying the number of progenies generated from each cluster. Default value is 10. |
iteration |
integer specifying the number of times the algorithm samples progenies and evaluates similarity among progenies. Default value is 100. |
repeats |
integer specifying the number of times the algorithm should be run: must be greater than 0. Values greater than 1 output standard deviations of the scores, which are plotted as error bars by the plot(...,errorbar=T,...) function. Default value is 1. |
nrandom |
integer specifying the number of random datasets used to generate reference scores when using method 'score'. Default value is 10. |
object |
the S3 object of class "progenyClust". |
... |
additional arguments for FUNclust in progenyClust(...). |
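The FUNclust contract described above (data first, number of clusters second, a list with a 'cluster' component returned) can be met by a thin wrapper around other clustering routines. The following is a minimal sketch, not part of the package: the name hclustFun and its linkage argument are hypothetical, and only base R functions (dist, hclust, cutree) are used.

```r
# Hypothetical wrapper (not part of progenyClust): hierarchical clustering
# exposed through the interface that FUNclust is documented to expect.
hclustFun <- function(data, k, linkage = "average", ...) {
  tree <- hclust(dist(data), method = linkage)  # build the dendrogram
  list(cluster = cutree(tree, k = k))           # integer assignment per sample
}

# Quick check on made-up data: 20 samples, 2 features, 3 clusters
set.seed(1)
toy <- matrix(rnorm(40), ncol = 2)
str(hclustFun(toy, 3))
```

Under these assumptions the wrapper could be supplied as FUNclust = hclustFun. The linkage argument is deliberately not called method, so that it cannot be confused with the method argument of progenyClust itself.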
Value
progenyClust returns an object of class "progenyClust" which has a plot and summary method. It is a list with the following components:
cluster |
matrix of clustering memberships for all samples under given cluster numbers: each row corresponds to a sample; each column corresponds to a given cluster number. |
score |
matrix of stability scores from clustering the input data under given cluster numbers: each column corresponds to a given cluster number; each row corresponds to a repeat, the number of which is defined by 'repeats' in the input argument. |
random.score |
matrix of stability scores from clustering random datasets under given cluster numbers: each column corresponds to a given cluster number; each row corresponds to a random dataset, the number of which is defined by 'nrandom' in the input argument. |
mean.gap |
vector of mean stability scores based on the 'gap' criterion when the input argument 'method' is set to be 'gap' or 'both'. |
mean.score |
vector of mean stability scores based on the 'score' criterion when the input argument 'method' is set to be 'score' or 'both'. |
sd.gap |
vector of standard deviations of stability scores for each given cluster number based on the 'gap' criterion, when the input argument 'method' is set to be 'gap' or 'both'. |
sd.score |
vector of standard deviations of stability scores for each given cluster number based on the 'score' criterion, when the input argument 'method' is set to be 'score' or 'both'. |
call |
the call with arguments specified. |
ncluster |
the specified value of input argument 'ncluster'. |
method |
the specified value of input argument 'method'. |
score.invert |
the specified value of input argument 'score.invert'. |
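The value of score.invert determines how the stored scores should be read. As a rough illustration of why the inverted ratio stays finite under perfect clustering, the sketch below uses made-up probabilities rather than output from progenyClust; the variable names are hypothetical.

```r
# Made-up values for illustration only (not produced by progenyClust).
p.true  <- 1   # true classification probability under perfect clustering
p.false <- 0   # false classification probability under perfect clustering

default.score  <- p.true / p.false   # ratio of true over false: Inf in R
inverted.score <- p.false / p.true   # ratio of false over true: 0

default.score
inverted.score
```

With the inverted score, lower is better, which matches the rule that the lowest score selects the cluster number when score.invert=TRUE.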
Author(s)
C.W. Hu, Rice University
References
Hu, C.W., et al. "Progeny Clustering: A Method to Identify Biological Phenotypes." Scientific Reports 5 (2015).
http://www.nature.com/articles/srep12894
Examples
# a 3-cluster 2-dimensional example dataset
data('test')
# default progeny clustering
pc <- progenyClust(test, ncluster = 2:5)
summary(pc)
plot(pc)
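A further sketch, assuming the progenyClust package is installed and the bundled 'test' dataset is available: it combines arguments documented above (method, repeats) so that both criteria are computed and standard deviations are available. The object name pc2 is arbitrary.

```r
library(progenyClust)
data('test')

# evaluate both criteria and repeat the run three times,
# so that standard deviations of the scores are returned
pc2 <- progenyClust(test, method = "both", ncluster = 2:5, repeats = 3)

dim(pc2$score)   # documented layout: one row per repeat, one column per cluster number
pc2$mean.score   # mean stability scores under the 'score' criterion
pc2$sd.score     # their standard deviations across repeats
summary(pc2)
```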