phyclust {phyclust} | R Documentation |
The Main Function of phyclust
Description
The main function of phyclust implements finite mixture models for sequence data, in which the mutation processes are modeled by evolutionary processes based on continuous-time Markov chain theory.
Usage
phyclust(X, K, EMC = .EMC, manual.id = NULL, label = NULL, byrow = TRUE)
Arguments
X |
nid/sid matrix with |
K |
number of clusters. |
EMC |
EM control. |
manual.id |
manually input class ids. |
label |
labels of sequences for semi-supervised clustering. |
byrow |
advanced option for |
Details
X should be a numerical matrix containing sequence data that can be converted with code2nid or code2sid.
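As a rough illustration of the conversion mentioned above, the sketch below turns a short vector of nucleotide codes into numerical ids with code2nid and back with nid2code. The input vector `codes` is made up for illustration, and the `lower.case = FALSE` argument is assumed to return upper-case codes.

```r
library(phyclust, quietly = TRUE)

# Hypothetical toy input: four nucleotides and a gap symbol
codes <- c("A", "C", "G", "T", "-")

# Convert character codes to numerical ids (nid)
nid <- code2nid(codes)

# Convert back to character codes for checking
back <- nid2code(nid, lower.case = FALSE)

print(nid)
print(back)
```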
EMC contains all options used for the EM algorithms.
manual.id supplies manually input class ids. They are used as an initialization only, and only with the initialization method 'manualMu'.
label indicates the known clusters for labeled sequences. It is a vector of length N with values from 0 to K, where 0 indicates that the cluster is unknown. label = NULL requests unsupervised clustering. Only unsupervised and semi-supervised clustering are implemented.
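The label convention described above can be sketched in base R. The values of `N` and `K` and the choice of labeled sequences are hypothetical and only serve to show the shape of a valid label vector.

```r
# Hypothetical sizes: 10 sequences, 3 clusters
N <- 10
K <- 3

# Start with every sequence unlabeled (0 = cluster unknown)
semi.label <- rep(0, N)

# Mark a few sequences whose clusters are assumed to be known
semi.label[c(1, 2)] <- 1
semi.label[7] <- 3

# A valid label vector has length N and values in 0, 1, ..., K
is.valid <- length(semi.label) == N && all(semi.label %in% 0:K)

# Count how many sequences carry each label, including the unlabeled ones
label.counts <- table(factor(semi.label, levels = 0:K))
print(is.valid)
print(label.counts)
```

Such a vector could then be passed as the label argument, while label = NULL leaves all sequences unlabeled.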
byrow is used in bootstraps to avoid transposing the matrix 'X'. If FALSE, then 'X' should have the dimension L × K.
Value
A list of class phyclust is returned, containing the following elements:
'N.X.org' |
number of sequences in the X matrix. |
'N.X.unique' |
number of unique sequences in the X matrix. |
'L' |
number of sites, length of sequences, number of columns of the X matrix. |
'K' |
number of clusters. |
'Eta' |
proportion of subpopulations, |
'Z.normalized' |
posterior probabilities, |
'Mu' |
centers of subpopulations, dim = |
'QA' |
Q matrix array, information for the evolution model, a list containing: |
'logL' |
log likelihood values. |
'p' |
number of parameters. |
'bic' |
BIC, |
'aic' |
AIC, |
'N.seq.site' |
number of segregating sites. |
'class.id' |
class id for each sequence based on the maximum posterior. |
'n.class' |
number of sequences in each cluster. |
'conv' |
convergence information, a list containing: |
'init.procedure' |
initialization procedure. |
'init.method' |
initialization method. |
'substitution.model' |
substitution model. |
'edist.model' |
evolution distance model. |
'code.type' |
code type. |
'em.method' |
EM algorithm. |
'boundary.method' |
boundary method. |
'label.method' |
label method. |
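The relation between 'Z.normalized', 'class.id', and 'n.class' can be sketched in base R. The posterior matrix `Z` below is made up for illustration, and the sketch assumes that the class id is simply the column with the largest posterior probability in each row, as the description of 'class.id' suggests.

```r
# Hypothetical posterior probabilities for 4 sequences and K = 3 clusters;
# each row sums to 1
K <- 3
Z <- rbind(c(0.9, 0.1, 0.0),
           c(0.2, 0.7, 0.1),
           c(0.1, 0.3, 0.6),
           c(0.8, 0.1, 0.1))

# Assign each sequence to the cluster with the maximum posterior
class.id <- apply(Z, 1, which.max)

# Count the sequences assigned to each of the K clusters
n.class <- tabulate(class.id, nbins = K)

print(class.id)
print(n.class)
```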
ToDo(s)
Make a general class for Q and QA.
Author(s)
Wei-Chen Chen wccsnow@gmail.com
References
Phylogenetic Clustering Website: https://snoweye.github.io/phyclust/
See Also
.EMC, .EMControl, find.best, phyclust.se, phyclust.se.update.
Examples
library(phyclust, quietly = TRUE)

# Toy sequence data shipped with the package
X <- seq.data.toy$org

# Fit K = 3 clusters with the default EM control
set.seed(1234)
(ret.1 <- phyclust(X, 3))

# Change the substitution model to HKY85
EMC.2 <- .EMC
EMC.2$substitution.model <- "HKY85"
# the same as EMC.2 <- .EMControl(substitution.model = "HKY85")
(ret.2 <- phyclust(X, 3, EMC = EMC.2))

# for semi-supervised clustering: label the first three sequences as cluster 1
semi.label <- rep(0, nrow(X))
semi.label[1:3] <- 1
(ret.3 <- phyclust(X, 3, EMC = EMC.2, label = semi.label))
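One possible use of the returned 'bic' element is a simple comparison of several values of K. The sketch below is not part of the package examples. The candidate range `K.candidates` is an arbitrary choice, and the sketch assumes the usual convention that a smaller BIC indicates a better trade-off between fit and model size.

```r
library(phyclust, quietly = TRUE)

X <- seq.data.toy$org
set.seed(1234)

# Hypothetical candidate numbers of clusters
K.candidates <- 2:4

# Fit one model per candidate K and collect the BIC values
bic <- sapply(K.candidates, function(k) phyclust(X, k)$bic)

# Pick the candidate with the smallest BIC (assumed convention)
best.K <- K.candidates[which.min(bic)]

print(data.frame(K = K.candidates, bic = bic))
print(best.K)
```

Because the EM results depend on the random initialization, the selected K may change with the seed.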