model.select {LCAextend} | R Documentation |
selects a latent class model for pedigree data
Description
Performs selection of a latent class model for phenotypic measurements
in pedigrees, using one of two methods: likelihood-based
cross-validation or Bayesian Information Criterion (BIC) selection.
This is the top-level function for performing a Latent Class Analysis
(LCA); it calls the model-fitting function lca.model. Model selection
is performed among models of one of two types: with or without
familial dependence. Two families of distributions are currently
implemented: product multinomial for discrete (or ordinal) data and
multivariate normal for continuous data.
Usage
model.select(ped, distribution, trans.const = TRUE, optim.param,
optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE),
famdep = TRUE, selec = "bic", H = 5, K.vec = 1:7,
tol = 0.001, x = NULL, var.list = NULL)
Arguments
ped |
a matrix containing variables coding the pedigree
structure and the phenotype measurements: |
distribution |
a character variable taking the value |
trans.const |
a logical variable indicating if the parental constraint is used. Parental constraint means that the class of a subject must be one
of his or her parents' classes. Default is |
optim.param |
a variable indicating how the measurement distribution parameter optimization is performed (see below for more details), |
optim.probs.indic |
a vector of logical values indicating which probability parameters to estimate (see below for more details), |
famdep |
a logical variable indicating if the familial dependence model is used or not. Default is |
selec |
a character variable taking the value |
H |
an integer giving the number of equal parts into which the data will be split for the likelihood-based cross-validation model selection (see below for more details), |
K.vec |
a vector of integers, the number of latent classes of
candidate models, if |
tol |
a small number governing the stopping rule of the EM algorithm. Default is 0.001, |
x |
a matrix of covariates (optional), default is |
var.list |
a list of integers indicating the columns of
|
Details
With the likelihood-based cross-validation method, the data are split
into H parts: H-1 parts form the training set and the remaining part
forms the test set. For each model, a validation log-likelihood is
obtained by evaluating the log-likelihood of the test set data at the
parameter values estimated on the training set. This is repeated H
times, using a different part as the test set each time, and a total
validation log-likelihood is obtained by summing over the H test sets.
The best model is the one with the largest total validation
log-likelihood. With the BIC selection method, the BIC is computed for
each candidate model, and the model with the smallest BIC is selected.
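The cross-validation procedure described above can be sketched in base R. This is a toy illustration only, not the package's implementation: a univariate normal model stands in for the latent class model, the two candidates (`fixed.sd`, `free.sd`) and the helper `cv.loglik` are hypothetical, and individual observations are assigned to parts at random, whereas how model.select partitions pedigree data is not described here.

```r
set.seed(1)
y <- rnorm(100, mean = 2, sd = 1.5)   # made-up measurements
H <- 5                                # number of parts

# Assign each observation to one of H parts of equal size.
fold <- sample(rep(seq_len(H), length.out = length(y)))

# Total validation log-likelihood of one candidate model:
# fit on H - 1 parts, evaluate on the held-out part, sum over the H parts.
cv.loglik <- function(y, fold, estimate.sd) {
  parts <- vapply(sort(unique(fold)), function(h) {
    train <- y[fold != h]
    test  <- y[fold == h]
    mu    <- mean(train)
    # Candidate models differ only in whether the sd is estimated or fixed at 1.
    sigma <- if (estimate.sd) sqrt(mean((train - mu)^2)) else 1
    sum(dnorm(test, mean = mu, sd = sigma, log = TRUE))
  }, numeric(1))
  sum(parts)
}

ll.valid <- c(fixed.sd = cv.loglik(y, fold, estimate.sd = FALSE),
              free.sd  = cv.loglik(y, fold, estimate.sd = TRUE))

# The candidate with the largest total validation log-likelihood is retained.
best <- names(which.max(ll.valid))
```

Because every candidate is scored on data that was not used to estimate its parameters, a more flexible model is preferred only if its extra parameters improve prediction on the held-out parts.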
The symptom status vector (column 6 of ped) takes value 1 for subjects
that have been examined and show no symptoms (i.e. completely
unaffected subjects). When applying the LCA to measurements available
on all subjects, the status vector must take the value 2 for every
individual with measurements. If covariates are used, covariate values
must be provided for subjects with symptom status 0 (missing) but not
for subjects with symptom status 1 (if covariate values are provided
for them, they are ignored).
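As a rough sketch of the status coding when measurements are meant to be used for every measured subject, the toy matrix below sets column 6 to 2 for individuals with at least one non-missing measurement and to 0 otherwise. Only the position of the status column (6) comes from the text above; the other column names and positions, the measurement columns 7 and 8, and the "at least one measurement" rule are assumptions made for the illustration.

```r
# Hypothetical toy pedigree matrix; only column 6 (status) follows the text.
ped <- cbind(fam    = c(1, 1, 1, 1),
             id     = 1:4,
             dad    = c(0, 0, 1, 1),
             mom    = c(0, 0, 2, 2),
             sex    = c(1, 2, 1, 2),
             status = c(0, 0, 0, 0),
             y1     = c(0.3, NA, 1.2, -0.4),
             y2     = c(1.1, NA, 0.7, 0.2))

meas.cols <- 7:8   # assumed positions of the measurement columns

# TRUE for individuals with at least one non-missing measurement.
has.meas <- rowSums(!is.na(ped[, meas.cols, drop = FALSE])) > 0

# Status 2 for every individual with measurements, 0 (missing) otherwise.
ped[has.meas, 6]  <- 2
ped[!has.meas, 6] <- 0
```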
optim.param is a variable indicating how the measurement distribution
parameter optimization of the M step is performed. Two possibilities,
optim.noconst.ordi and optim.const.ordi, are currently available in the
case of discrete or ordinal measurements, and four possibilities in the
case of continuous measurements:
optim.indep.norm (measurements are independent, diagonal variance-covariance matrix),
optim.diff.norm (general variance-covariance matrix but equal for all classes),
optim.equal.norm (variance-covariance matrices are different for each class but equal variance and equal covariance for a class) and
optim.gene.norm (general variance-covariance matrices for all classes).
One of the allowed values of optim.param must be entered without quotes.
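To make the matrix shapes named above concrete, the following base R sketch builds one example of each kind of variance-covariance matrix for three measurements. The numbers are made up, and the sketch only illustrates the shapes (diagonal, equal variance with equal covariance, general); it does not reproduce how the package parameterizes or estimates them.

```r
d <- 3  # number of continuous measurements (hypothetical)

# Diagonal matrix: measurements are independent.
sigma.indep <- diag(c(1.0, 2.0, 0.5))

# One common variance and one common covariance within a class.
common.var <- 2.0
common.cov <- 0.6
sigma.equal <- matrix(common.cov, nrow = d, ncol = d)
diag(sigma.equal) <- common.var

# General symmetric positive-definite matrix, built as t(A) %*% A.
A <- matrix(c(1.0, 0.2, 0.1,
              0.0, 1.5, 0.3,
              0.0, 0.0, 0.8), nrow = d, byrow = TRUE)
sigma.gene <- crossprod(A)

# Number of free variance-covariance parameters in each single matrix.
n.par <- c(indep = d, equal = 2, general = d * (d + 1) / 2)
```

The parameter counts show the trade-off: the more constrained shapes need far fewer parameters per class, which matters when the candidate models have many latent classes.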
optim.probs.indic is a vector of logical values, of length 4 for models
with familial dependence and of length 2 for models without familial
dependence, indicating which probability parameters to estimate. See
the help page for lca.model for a definition of the parameters.
For models with familial dependence:
optim.probs.indic[1] indicates whether p0 will be estimated or not,
optim.probs.indic[2] indicates whether p0connect will be estimated or not,
optim.probs.indic[3] indicates whether p.found will be estimated or not,
optim.probs.indic[4] indicates whether p.connect will be estimated or not.
For models without familial dependence:
optim.probs.indic[1] indicates whether p0 will be estimated or not,
optim.probs.indic[2] indicates whether p.aff will be estimated or not.
All defaults are TRUE.
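The two layouts of optim.probs.indic can be written out as follows. The element names are added here only for readability, and `check.indic` is a hypothetical helper, not a package function; the sketch assumes nothing beyond the lengths and ordering listed above.

```r
# With familial dependence: four indicators, in the order listed above.
# Here p.found is the only parameter that would not be estimated.
indic.famdep <- c(p0 = TRUE, p0connect = TRUE, p.found = FALSE, p.connect = TRUE)

# Without familial dependence: two indicators.
indic.nofamdep <- c(p0 = TRUE, p.aff = TRUE)

# Hypothetical helper: checks that the vector is logical and has the
# length expected for the chosen model type.
check.indic <- function(indic, famdep) {
  is.logical(indic) && length(indic) == (if (famdep) 4 else 2)
}

# Plain logical vectors, as they would be passed to the function.
probs.indic.famdep   <- unname(indic.famdep)
probs.indic.nofamdep <- unname(indic.nofamdep)
```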
Value
The function returns a list of 5 elements. The first 3 elements are common to the BIC and cross-validation model selection methods:
param |
the Maximum Likelihood Estimator (MLE) of the measurement distribution parameters of the selected model, |
probs |
the Maximum Likelihood Estimator (MLE) of the probability parameters of the selected model, |
weight |
an array of dimension |
If the cross-validation selection method is used, the function also returns
ll |
the value of the maximum log-likelihood (log-ML) of the selected model, |
ll.valid |
the total cross-validation log-likelihood of all candidate models, |
and if the Bayesian Information Criterion selection method is used, the function also returns
ll |
the value of maximum log-likelihood (log-ML) of all candidate models, |
bic |
the Bayesian Information Criterion
|
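As a sketch of how BIC-based selection over candidate models works, the snippet below applies the conventional definition, BIC = -2 * log-likelihood + (number of parameters) * log(sample size), to made-up values. Whether the package counts parameters and sample size in exactly this way is an assumption; the log-likelihoods, parameter counts and sample size are invented for the illustration.

```r
K.vec <- 1:3                         # candidate numbers of latent classes
ll    <- c(-520.4, -498.7, -495.9)   # made-up maximized log-likelihoods
n.par <- c(4, 9, 14)                 # made-up numbers of free parameters
n.obs <- 120                         # made-up sample size

# Conventional BIC: smaller is better.
bic <- -2 * ll + n.par * log(n.obs)
names(bic) <- paste0("K=", K.vec)

# The candidate with the smallest BIC is selected.
K.best <- K.vec[which.min(bic)]
```

In this toy case the largest model has the highest log-likelihood, but its penalty term outweighs the gain, so an intermediate model is retained.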
References
TAYEB, A., LABBE, A., BUREAU, A. and MERETTE, C. (2011) Solving Genetic Heterogeneity in Extended
Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 26(3): 539-560. DOI: 10.1007/s00180-010-0224-2.
LABBE, A., BUREAU, A. and MERETTE, C. (2009) Integration of Genetic Familial Dependence Structure in Latent Class Models. The International Journal of Biostatistics, 5(1): Article 6.
See Also
See also lca.model.
Examples
# load the package and the example pedigree data with continuous measurements
library(LCAextend)
data(ped.cont)
fam <- ped.cont[, 1]
# apply the function to the first two families of ped.cont
model.select(ped.cont[fam %in% 1:2, ], distribution = "normal",
             trans.const = TRUE, optim.param = optim.indep.norm,
             optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), famdep = TRUE,
             selec = "bic", K.vec = 1:3, tol = 0.001, x = NULL, var.list = NULL)