lca.model {LCAextend} | R Documentation |
fits latent class models for phenotypic measurements in pedigrees with or without familial dependence using an Expectation-Maximization (EM) algorithm
Description
This is the main function for fitting latent class models. It first checks the pedigrees (it exits if an individual has only one
parent in the pedigree, if the pedigree contains no children, or if there
are not enough individuals for parameter estimation) and the
initial values (probabilities must be positive and sum to
one). For models with familial dependence, a child's latent class
depends on the parents' classes via
triplet-transition probabilities. For models without
familial dependence, it performs classical Latent
Class Analysis (LCA), in which all individuals are assumed independent
and the pedigree structure is ignored. The EM algorithm stops when
the difference in log-likelihood between successive iterations is smaller than tol, which
is fixed by the user.
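The stopping rule can be illustrated on a toy problem. The sketch below is not the LCAextend model: it runs EM on a two-component Gaussian mixture with unit variances, and the names y, w, mu and loglik are introduced here purely for illustration. It only shows how a tolerance such as tol ends the iterations once the log-likelihood gain becomes small.

```r
# Toy EM (NOT the LCAextend model): two-component Gaussian mixture, unit variances.
set.seed(1)
y <- c(rnorm(50, mean = 0), rnorm(50, mean = 4))

# Observed-data log-likelihood for mixing weight w and component means mu
loglik <- function(y, w, mu) {
  sum(log(w * dnorm(y, mu[1]) + (1 - w) * dnorm(y, mu[2])))
}

tol <- 0.001          # same role as the tol argument of lca.model
w <- 0.5              # initial mixing weight
mu <- c(-1, 1)        # initial means
ll.old <- loglik(y, w, mu)

for (iter in 1:1000) {
  # E step: posterior probability that each observation comes from component 1
  d1 <- w * dnorm(y, mu[1])
  d2 <- (1 - w) * dnorm(y, mu[2])
  post <- d1 / (d1 + d2)
  # M step: update the weight and the means
  w <- mean(post)
  mu <- c(sum(post * y) / sum(post), sum((1 - post) * y) / sum(1 - post))
  # Stopping rule: log-likelihood gain smaller than tol
  ll.new <- loglik(y, w, mu)
  if (ll.new - ll.old < tol) break
  ll.old <- ll.new
}
```

A smaller tol gives more iterations and a log-likelihood closer to its limit, at the cost of computing time.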
Usage
lca.model(ped, probs, param, optim.param, fit = TRUE,
optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), tol = 0.001,
x = NULL, var.list = NULL, famdep = TRUE, modify.init = NULL)
Arguments
ped |
a matrix or data frame representing pedigrees and measurements: |
probs |
a list of initial probability parameters (see below
for more details). The function |
param |
a list of initial measurement distribution parameters (see below for more details). The function |
optim.param |
a variable indicating how measurement distribution parameter optimization is performed (see below for more details), |
fit |
a logical variable, if |
optim.probs.indic |
a vector of logical values indicating which probability parameters to estimate, |
tol |
a small number governing the stopping rule of the EM algorithm. Default is 0.001, |
x |
a matrix of covariates (optional), default is |
var.list |
a list of integers indicating the columns of
|
famdep |
a logical variable indicating if familial dependence model is used or not. Default is |
modify.init |
a function to modify initial values of the EM algorithm, or |
Details
The symptom status vector (column 6 of ped) takes value 1 for
subjects that have been
examined and show no symptoms (i.e. completely unaffected
subjects). When applying the LCA to
measurements available on all subjects, the status vector must take the
value of 2 for every individual with measurements.
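As a minimal sketch of the second case, the code below sets the status column to 2 for every individual with measurements. The toy data frame ped.toy and its column names are hypothetical; only the position of the status vector (column 6) comes from the text above, and placing the measurements in columns 7 and 8 is an assumption made for this illustration.

```r
# Hypothetical toy pedigree; only "column 6 = symptom status" is taken from the text.
ped.toy <- data.frame(
  fam    = c(1, 1, 1, 1),
  id     = c(1, 2, 3, 4),
  dad    = c(0, 0, 1, 1),
  mom    = c(0, 0, 2, 2),
  sex    = c(1, 2, 1, 2),
  status = c(1, 1, 1, 1),       # column 6
  y1     = c(2, 1, NA, 3),      # assumed measurement columns
  y2     = c(1, 1, NA, 2)
)

# Individuals with at least one observed measurement
has.meas <- rowSums(!is.na(ped.toy[, 7:8])) > 0

# Status must be 2 for every individual with measurements
ped.toy[has.meas, 6] <- 2
```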
probs
is a list of initial probability parameters:
For models with familial dependence:
p
a probability vector, each p[c] is the probability that a symptomatic founder is in class c, for c>=1,
p0
the probability that a founder without symptoms is in class 0,
p.trans
an array of dimension K times K+1 times K+1, where K is the number of latent classes of the model, such that p.trans[c_i,c_1,c_2] is the conditional probability that a symptomatic individual i is in class c_i given that his parents are in classes c_1 and c_2,
p0connect
a vector of length K, where p0connect[c] is the probability that a connector without symptoms is in class 0, given that one of his parents is in class c>=1 and the other in class 0,
p.found
the probability that a founder is symptomatic,
p.child
the probability that a child is symptomatic,
For models without familial dependence, all individuals are independent:
p
a probability vector, each p[c] is the probability that a symptomatic individual is in class c, for c>=1,
p0
the probability that an individual without symptoms is in class 0,
p.aff
the probability that an individual is symptomatic,
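The two list layouts described above can be sketched in base R as follows. The choice K = 3 and the uniform starting values are arbitrary illustrations, and normalizing p.trans over its first index is an assumption that follows from reading p.trans[c_i,c_1,c_2] as a conditional probability of c_i. The probs object shipped with the package may differ in detail.

```r
K <- 3  # number of latent classes (illustrative choice)

# Model with familial dependence: uniform, strictly positive starting values
probs.famdep <- list(
  p         = rep(1 / K, K),                          # classes of symptomatic founders
  p0        = 0.5,                                    # founder without symptoms in class 0
  p.trans   = array(1 / K, dim = c(K, K + 1, K + 1)), # child class given parents' classes
  p0connect = rep(0.5, K),                            # connector without symptoms in class 0
  p.found   = 0.5,                                    # founder is symptomatic
  p.child   = 0.5                                     # child is symptomatic
)

# Model without familial dependence: all individuals independent
probs.indep <- list(p = rep(1 / K, K), p0 = 0.5, p.aff = 0.5)

# Sanity checks in the spirit of those described for the initial values:
# positivity, and summation to one over the class index
col.sums <- as.vector(apply(probs.famdep$p.trans, c(2, 3), sum))
stopifnot(
  all(unlist(probs.famdep) > 0),
  isTRUE(all.equal(sum(probs.famdep$p), 1)),
  isTRUE(all.equal(col.sums, rep(1, (K + 1)^2)))
)
```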
param
is a list of measurement distribution parameters: the coefficients alpha (cumulative logistic coefficients, see alpha.compute) in
the case of discrete or ordinal data, and the means mu and variance-covariance matrices sigma
in the case of continuous data,
optim.param
is a variable indicating how the measurement distribution parameter estimation of the M step is performed. Two possibilities,
optim.noconst.ordi and optim.const.ordi, are available in the case of discrete or ordinal measurements, and four possibilities are available in the case of continuous measurements:
optim.indep.norm
(measurements are independent, diagonal variance-covariance matrix),
optim.diff.norm
(general variance-covariance matrix but equal for all classes),
optim.equal.norm
(variance-covariance matrices are different for each class but equal variance and equal covariance for a class) and
optim.gene.norm
(general variance-covariance matrices for all classes).
One of the allowed values of optim.param must be entered without quotes.
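To make the covariance structures named above more concrete, the sketch below builds one example matrix for each shape in base R. The dimension d = 3 and all numeric values are arbitrary, the object names are hypothetical, and the matrices only illustrate the wording of the descriptions; they are not output of the package.

```r
d <- 3  # number of continuous measurements (illustrative)

# Diagonal matrix: independent measurements (shape described for optim.indep.norm)
sigma.indep <- diag(c(1.0, 2.0, 0.5))

# Equal variance and equal covariance within a class (shape described for optim.equal.norm)
v  <- 1.5   # common variance
cv <- 0.4   # common covariance
sigma.equal <- matrix(cv, nrow = d, ncol = d)
diag(sigma.equal) <- v

# General symmetric positive-definite matrix (shape described for optim.gene.norm,
# and for optim.diff.norm when the same matrix is shared by all classes)
sigma.gene <- matrix(c(2.0, 0.3, 0.1,
                       0.3, 1.0, 0.2,
                       0.1, 0.2, 1.5), nrow = d, ncol = d)

# A valid variance-covariance matrix is symmetric with positive eigenvalues
is.valid.cov <- function(s) {
  isSymmetric(s) && all(eigen(s, symmetric = TRUE)$values > 0)
}
stopifnot(is.valid.cov(sigma.indep), is.valid.cov(sigma.equal), is.valid.cov(sigma.gene))
```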
optim.probs.indic
is a vector of logical values of length 4 for
models with familial dependence and 2 for models without familial
dependence.
For models with familial dependence:
optim.probs.indic[1]
indicates whether p0 will be estimated or not,
optim.probs.indic[2]
indicates whether p0connect will be estimated or not,
optim.probs.indic[3]
indicates whether p.found will be estimated or not,
optim.probs.indic[4]
indicates whether p.connect will be estimated or not.
For models without familial dependence:
optim.probs.indic[1]
indicates whether p0 will be estimated or not,
optim.probs.indic[2]
indicates whether p.aff will be estimated or not.
All defaults are TRUE. If the dataset contains only nuclear families, there is no information to estimate p0connect and p.connect, and these parameters will not be estimated, irrespective of the indicator value.
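A small sketch of how such an indicator vector might be assembled follows. The element names are added only for readability and are an assumption of this example; they are dropped with unname() so that a plain logical vector, indexed by position as described above, is what would be passed on.

```r
famdep <- TRUE  # switch between the two model types

indic <- if (famdep) {
  # Positions 1 to 4: p0, p0connect, p.found, p.connect.
  # Positions 2 and 4 are switched off here, as would be natural
  # for a dataset made of nuclear families only.
  c(p0 = TRUE, p0connect = FALSE, p.found = TRUE, p.connect = FALSE)
} else {
  # Positions 1 and 2: p0, p.aff
  c(p0 = TRUE, p.aff = TRUE)
}

# Plain logical vector of the expected length
optim.probs.indic <- unname(indic)
stopifnot(is.logical(optim.probs.indic),
          length(optim.probs.indic) == if (famdep) 4 else 2)
```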
Value
The function returns a list of 4 elements:
param |
the Maximum Likelihood Estimator (MLE) of the
measurement distribution parameters if |
probs |
the MLE of probability parameters if |
When measurements are available on all subjects, the probability parameters p0 and p0connect
degenerate to 0, and p.found, p.child and p.aff
degenerate to 1, in the output.
weight |
an array of dimension |
ll |
the maximum log-likelihood value (log-ML) if |
References
TAYEB, A., LABBE, A., BUREAU, A. and MERETTE, C. (2011) Solving Genetic Heterogeneity in Extended
Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 26(3): 539-560. DOI: 10.1007/s00180-010-0224-2.
LABBE, A., BUREAU, A. and MERETTE, C. (2009) Integration of Genetic Familial Dependence Structure in Latent Class Models. The International Journal of Biostatistics, 5(1): Article 6.
Examples
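# load the package that provides lca.model and the example datasets
library(LCAextend)
# data: column 1 of ped.ordi is used as the family identifier
data(ped.ordi)
fam <- ped.ordi[, 1]
# initial values: probs and param
data(param.ordi)
data(probs)
# the function applied only to the first two families of ped.ordi
lca.model(ped.ordi[fam %in% 1:2, ], probs, param.ordi, optim.noconst.ordi,
          fit = TRUE, optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), tol = 0.001,
          x = NULL, var.list = NULL, famdep = TRUE, modify.init = NULL)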