lca.model {LCAextend}    R Documentation

Fits latent class models for phenotypic measurements in pedigrees with or without familial dependence using an Expectation-Maximization (EM) algorithm

Description

This is the main function for fitting latent class models. It first checks the pedigrees (it exits if an individual has only one parent in the pedigree, if there are no children in the pedigree, or if there are not enough individuals for parameter estimation) and the initial values (probabilities must be positive and sum to one). For models with familial dependence, a child's latent class depends on the classes of the parents via triplet-transition probabilities. For models without familial dependence, it performs classical Latent Class Analysis (LCA), in which all individuals are assumed independent and the pedigree structure is ignored. The EM algorithm stops when the difference in log-likelihood between successive iterations is smaller than tol, which is set by the user.
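The stopping rule described above can be sketched in plain R. This is only an illustration on a toy two-component normal mixture with unit variances, not the internal code of lca.model; the helper em_toy and its arguments mu, pi1 and max.iter are hypothetical names introduced for this sketch.

```r
# Toy EM for a two-component normal mixture with unit variances.
# Illustrative only: em_toy is a hypothetical helper, not part of LCAextend.
em_toy <- function(y, mu = c(-1, 1), pi1 = 0.5, tol = 0.001, max.iter = 1000) {
  loglik <- function(mu, pi1) {
    sum(log(pi1 * dnorm(y, mu[1]) + (1 - pi1) * dnorm(y, mu[2])))
  }
  ll.old <- loglik(mu, pi1)
  for (iter in seq_len(max.iter)) {
    # E step: posterior probability of component 1 for each observation
    d1 <- pi1 * dnorm(y, mu[1])
    d2 <- (1 - pi1) * dnorm(y, mu[2])
    w <- d1 / (d1 + d2)
    # M step: weighted updates of the mixing proportion and the means
    pi1 <- mean(w)
    mu <- c(sum(w * y) / sum(w), sum((1 - w) * y) / sum(1 - w))
    ll.new <- loglik(mu, pi1)
    # Stop when the log-likelihood gain falls below tol
    if (ll.new - ll.old < tol) break
    ll.old <- ll.new
  }
  list(mu = mu, pi1 = pi1, ll = ll.new, iterations = iter)
}

set.seed(1)
y <- c(rnorm(50, mean = -2), rnorm(50, mean = 2))
fit <- em_toy(y, tol = 0.001)
fit$iterations
```

A smaller tol generally means more iterations and a log-likelihood closer to the local maximum; the default of 0.001 trades some precision for speed.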

Usage

lca.model(ped, probs, param, optim.param, fit = TRUE, 
optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), tol = 0.001, 
x = NULL, var.list = NULL, famdep = TRUE, modify.init = NULL)

Arguments

ped

a matrix or data frame representing pedigrees and measurements: ped[,1] family ID, ped[,2] subject ID, ped[,3] dad ID, ped[,4] mom ID, ped[,5] sex, ped[,6] symptom status (2: symptomatic, 1: without symptoms, 0: missing), ped[,7:ncol(ped)] measurements, where each column corresponds to a phenotypic measurement. If the measurement distribution specified with optim.param is multinomial, then these columns must be of type integer or factor,

probs

a list of initial probability parameters (see below for more details). The function init.p.trans can be used to compute an initial value of the component p.trans of probs,

param

a list of initial measurement distribution parameters (see below for more details). The function init.ordi can be used to compute an initial value of param in the case of discrete or ordinal data (product multinomial distribution) and init.norm in the case of continuous data (multivariate normal distribution),

optim.param

a variable indicating how measurement distribution parameter optimization is performed (see below for more details),

fit

a logical variable: if TRUE, the EM algorithm is performed; if FALSE, only the weights and the log-likelihood are computed with the initial parameter values, without log-likelihood maximization,

optim.probs.indic

a vector of logical values indicating which probability parameters to estimate,

tol

a small number governing the stopping rule of the EM algorithm. Default is 0.001,

x

a matrix of covariates (optional), default is NULL,

var.list

a list of integers indicating the columns of x containing the covariates to use for a given phenotypic measurement, default is NULL,

famdep

a logical variable indicating whether the familial dependence model is used. Default is TRUE. In models without familial dependence, individuals are treated as independent and the pedigree structure is ignored. In models with familial dependence, a child's class depends on the parents' classes via a triplet-transition probability,

modify.init

a function to modify initial values of the EM algorithm, or NULL, default is NULL.
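As an illustration of the ped layout described above, the following sketch builds a toy nuclear family in base R and reproduces the "both parents or none" check mentioned in the Description. The column names, the use of 0 as the parent ID of founders, the sex coding and the use of NA for absent measurements are assumptions of this sketch, not conventions confirmed by this page.

```r
# One nuclear family: two founders and two children.
# Assumptions of this sketch: founders have parent IDs 0, sex is coded 1/2,
# and absent measurements are coded NA.
ped.toy <- data.frame(
  famID  = c(1, 1, 1, 1),
  id     = c(1, 2, 3, 4),
  dad    = c(0, 0, 1, 1),
  mom    = c(0, 0, 2, 2),
  sex    = c(1, 2, 1, 2),
  status = c(2, 1, 2, 0),      # 2: symptomatic, 1: without symptoms, 0: missing
  y1     = c(3L, NA, 2L, NA),  # integer columns, as required for multinomial data
  y2     = c(1L, NA, 4L, NA)
)

# Every individual should have either both parents or no parent in the pedigree.
n.parents <- (ped.toy$dad != 0) + (ped.toy$mom != 0)
one.parent <- n.parents == 1
any(one.parent)
```

If any(one.parent) were TRUE, the pedigree would need to be completed (for example by adding the missing parent with status 0) before fitting.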

Details

The symptom status vector (column 6 of ped) takes value 1 for subjects that have been examined and show no symptoms (i.e. completely unaffected subjects). When applying the LCA to measurements available on all subjects, the status vector must take the value of 2 for every individual with measurements.

probs is a list of initial probability parameters:

For models with familial dependence:

p

a probability vector, each p[c] is the probability that a symptomatic founder is in class c for c>=1,

p0

the probability that a founder without symptoms is in class 0,

p.trans

an array of dimension K times K+1 times K+1, where K is the number of latent classes of the model, and is such that p.trans[c_i,c_1,c_2] is the conditional probability that a symptomatic individual i is in class c_i given that his parents are in classes c_1 and c_2,

p0connect

a vector of length K, where p0connect[c] is the probability that a connector without symptoms is in class 0, given that one of his parents is in class c>=1 and the other in class 0,

p.found

the probability that a founder is symptomatic,

p.child

the probability that a child is symptomatic,

For models without familial dependence, all individuals are independent:

p

a probability vector, each p[c] is the probability that a symptomatic individual is in class c for c>=1,

p0

the probability that an individual without symptoms is in class 0,

p.aff

the probability that an individual is symptomatic,
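The two forms of the probs list described above can be assembled by hand in base R. The sketch below uses uniform starting values and K = 3 purely for illustration; the numeric values are arbitrary, and treating p.trans[, c_1, c_2] as a distribution over the child's class that sums to one is an assumption based on its description as a conditional probability. In practice init.p.trans is the documented way to obtain p.trans.

```r
K <- 3  # number of latent classes (illustrative choice)

# Without familial dependence: p, p0 and p.aff
probs.indep <- list(
  p     = rep(1 / K, K),  # class probabilities for symptomatic individuals
  p0    = 0.5,            # P(class 0) for an individual without symptoms
  p.aff = 0.3             # P(an individual is symptomatic)
)

# With familial dependence: uniform starting values (arbitrary, for illustration)
p.trans <- array(1 / K, dim = c(K, K + 1, K + 1))
probs.famdep <- list(
  p         = rep(1 / K, K),
  p0        = 0.5,
  p.trans   = p.trans,
  p0connect = rep(0.5, K),
  p.found   = 0.3,
  p.child   = 0.3
)

# Mirror the checks on initial values: positivity and summation to one
stopifnot(all(probs.indep$p > 0), isTRUE(all.equal(sum(probs.indep$p), 1)))
trans.sums <- apply(p.trans, c(2, 3), sum)  # one sum per pair of parental classes
stopifnot(isTRUE(all.equal(as.vector(trans.sums), rep(1, (K + 1)^2))))
```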

param is a list of measurement distribution parameters: the coefficients alpha (cumulative logistic coefficients, see alpha.compute) in the case of discrete or ordinal data, and the means mu and variance-covariance matrices sigma in the case of continuous data.

optim.param is a variable indicating how the measurement distribution parameters are estimated in the M step. Two possibilities, optim.noconst.ordi and optim.const.ordi, are available in the case of discrete or ordinal measurements. Four possibilities are available in the case of continuous measurements: optim.indep.norm (measurements are independent, diagonal variance-covariance matrix), optim.diff.norm (general variance-covariance matrix but equal for all classes), optim.equal.norm (variance-covariance matrices are different for each class but equal variance and equal covariance for a class) and optim.gene.norm (general variance-covariance matrices for all classes). One of the allowed values of optim.param must be entered without quotes.

optim.probs.indic is a vector of logical values of length 4 for models with familial dependence and 2 for models without familial dependence.

For models with familial dependence:

optim.probs.indic[1]

indicates whether p0 will be estimated or not,

optim.probs.indic[2]

indicates whether p0connect will be estimated or not,

optim.probs.indic[3]

indicates whether p.found will be estimated or not,

optim.probs.indic[4]

indicates whether p.connect will be estimated or not.

For models without familial dependence:

optim.probs.indic[1]

indicates whether p0 will be estimated or not,

optim.probs.indic[2]

indicates whether p.aff will be estimated or not.

All defaults are TRUE. If the dataset contains only nuclear families, there is no information to estimate p0connect and p.connect, and these parameters will not be estimated, irrespective of the indicator value.
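For illustration, indicator vectors for the two kinds of model can be written as follows. The element names are added here only for readability and follow the parameter names used in the text above; whether lca.model accepts a named vector is not stated on this page, so the sketch strips the names before use.

```r
# With familial dependence: order is p0, p0connect, p.found, p.connect.
# Here p0 is held fixed at its initial value and the others are estimated.
indic.famdep <- c(p0 = FALSE, p0connect = TRUE, p.found = TRUE, p.connect = TRUE)

# Without familial dependence: order is p0, p.aff.
indic.indep <- c(p0 = TRUE, p.aff = FALSE)

# Drop the names to obtain plain logical vectors of length 4 and 2
optim.probs.indic.famdep <- unname(indic.famdep)
optim.probs.indic.indep  <- unname(indic.indep)
```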

Value

The function returns a list of 4 elements:

param

the Maximum Likelihood Estimator (MLE) of the measurement distribution parameters if fit=TRUE or the input param if fit=FALSE,

probs

the MLE of probability parameters if fit=TRUE or the input probs if fit=FALSE,

When measurements are available on all subjects, the probability parameters p0 and p0connect degenerate to 0 and p.found, p.child and p.aff to 1 in the output.

weight

an array of dimension n (the number of individuals) times 2 times K+1 (K being the number of latent classes in the selected model and the K+1th class being the unaffected class) giving the individual posterior probabilities. weight[i,s,c] is the posterior probability that individual i belongs to class c when his symptom status is s, where s takes two values: 1 for symptomatic and 2 for without symptoms. In particular, all weight[,2,] are 0 for symptomatic individuals and all weight[,1,] are 0 for individuals without symptoms. For individuals with missing (unknown) symptom status, both weight[,1,] and weight[,2,] may be greater than 0.

ll

the maximum log-likelihood value (log-ML) if fit=TRUE or the log-likelihood computed with the input values of param and probs if fit=FALSE,
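To show how the weight array might be used, the sketch below builds a small mock array with the documented dimensions (it does not come from an actual fit) and assigns each individual to the class with the largest posterior probability. Summing the two status slices is exact when the status is known, since one slice is then 0; for unknown status it is an assumption of this sketch about how the two slices combine.

```r
n <- 3  # individuals
K <- 2  # latent classes; class K + 1 is the unaffected class

# Mock posterior weights with the documented shape n x 2 x (K + 1)
weight <- array(0, dim = c(n, 2, K + 1))
weight[1, 1, ] <- c(0.7, 0.3, 0)       # individual 1: symptomatic
weight[2, 2, ] <- c(0.1, 0.1, 0.8)     # individual 2: without symptoms
weight[3, 1, ] <- c(0.2, 0.1, 0)       # individual 3: unknown status,
weight[3, 2, ] <- c(0.05, 0.05, 0.6)   # both slices are non-zero

# Combine the two status slices, then pick the most probable class
post <- weight[, 1, ] + weight[, 2, ]  # n x (K + 1) matrix
best.class <- apply(post, 1, which.max)
best.class  # 1 3 3
```

A value of K + 1 (here 3) in best.class means the individual is most likely in the unaffected class.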

References

TAYEB, A., LABBE, A., BUREAU, A. and MERETTE, C. (2011) Solving Genetic Heterogeneity in Extended Families by Identifying Sub-types of Complex Diseases. Computational Statistics, 26(3): 539-560. DOI: 10.1007/s00180-010-0224-2.

LABBE, A., BUREAU, A. and MERETTE, C. (2009) Integration of Genetic Familial Dependence Structure in Latent Class Models. The International Journal of Biostatistics, 5(1): Article 6.

Examples

library(LCAextend)

# data: pedigrees with ordinal measurements
data(ped.ordi)
fam <- ped.ordi[, 1]
# initial values of probs and param
data(param.ordi)
data(probs)
# apply the function to the first two families of ped.ordi only
lca.model(ped.ordi[fam %in% 1:2, ], probs, param.ordi, optim.noconst.ordi,
          fit = TRUE, optim.probs.indic = c(TRUE, TRUE, TRUE, TRUE), tol = 0.001,
          x = NULL, var.list = NULL, famdep = TRUE, modify.init = NULL)

[Package LCAextend version 1.3 Index]