penmodelEM {FamEvent} | R Documentation |
EM algorithm for estimating the penetrance model with missing genotypes
Description
Fits a penetrance model for family data with missing genotypes via the EM algorithm and provides model parameter estimates.
Usage
penmodelEM(formula, cluster = "famID", gvar = "mgene", parms, cuts = NULL, data,
design = "pop", base.dist = "Weibull", agemin = NULL, robust = FALSE, method = "data",
mode = "dominant", q = 0.02)
Arguments
formula |
A formula expression as for other regression models. The response should be a survival object as returned by the |
cluster |
Name of cluster variable. Default is |
gvar |
Name of genetic variable. Default is |
parms |
Vector of initial values for the parameters in the model including baseline parameters and regression coefficients. |
cuts |
Vector of cuts that define the intervals where the hazard function is constant. The |
data |
Data frame generated from |
design |
Study design of the family data. Possible choices are: |
base.dist |
Choice of baseline hazard distributions to fit. Possible choices are: |
agemin |
Minimum age of disease onset or minimum age. Default is |
robust |
Logical; if |
method |
Choice of methods for calculating the carrier probabilities for individuals with missing mutation status. Possible choices are If |
mode |
Choice of modes of inheritance for calculating carrier probabilies for individuals with missing mutation status. Possible choices are |
q |
Frequency of the disease causing allele used for calculating carrier pobabilities. The value should be between 0 and 1. If |
Details
The expectation and maximization (EM) algorithm is applied for making inference about the missing genotypes. In the expectation step, for individuals with unknown carrier status, we first compute their carrier probabilities given their family's observed phenotype and genotype information based on current estimates of parameters θ as follows,
wfi = P(Xfi = 1 | Yfi, Xfo),
where Xfi represents the mutation carrier status and Yfi represents the phenotype in terms of age at onset tfi and disease status δfi for individual i, i = 1, ..., nf, in family f, f = 1, ..., n,
and Xfo represents the observed genotypes in family f
.
Then, we obtain the conditional expectation of the log-likelihood function (\ell
) of the complete data given the observed data as a weighted log-likelihood, which has the form
Eθ [ l (θ | Y, Xo) ] = ∑f ∑i l fi (θ | Xfi = 1) * wfi + lfi (θ | Xfi = 0) * ( 1 - wfi ).
In the maximization step, the updated parameter estimates are obtained by maximizing the weighted log likelihood computed in the E-step. These expectation and maximization steps iterate until convergence to obtain the maximum likelihood estimates. See more details in Choi and Briollais (2011) or Choi et al. (2014).
Note that the baseline parameters include lambda
and rho
, which represent the scale and shape parameters, respectively, and eta
, additional parameter to specify for "logBurr"
distribution. For the "lognormal"
baseline distribution, lambda
and rho
represent the location and scale parameters for the normally distributed logarithm, where lambda
can take any real values and rho
> 0. For the other baselinse distributions, lambda
> 0, rho
> 0, and eta
> 0. When a piecewise constant distribution is specified for the baseline hazards, base.dist="piecewise"
, baseparm
should specify the initial interval-constant values, one more than the cut points specified bycuts
.
Transformed baseline parameters are used for estimation; log transformation is applied to both scale and shape parameters (\lambda, \rho
) for "Weibull"
, "loglogistic"
, "Gompertz"
and "gamma"
baselines, to (\lambda, \rho, \eta
) for "logBurr"
and to the piecewise constant parameters for a piecewise
baseline hazard. For "lognormal"
baseline distribution, the log transformation is applied only to \rho
, not to \lambda
, which represents the location parameter for the normally distributed logarithm.
Calculations of penetrance estimates and their standard errors and 95% confidence intervals at given ages can be obtained by penetrance
function via Monte-Carlo simulations of the estimated penetrance model.
Value
Returns an object of class 'penmodel'
, including the following elements:
estimates |
Parameter estimates of transformed baseline parameters and regression coefficients. |
varcov |
Variance-covariance matrix of parameter estimates obtained from the inverse of Hessian matrix. |
varcov.robust |
Robust ‘sandwich’ variance-covariance matrix of parameter estimates when |
se |
Standard errors of parameter estimates obtained from the inverse of Hessian matrix. |
se.robust |
Robust ‘sandwich’ standard errors of parameter estimates when |
logLik |
Loglikelihood value for the fitted penetrance model. |
AIC |
Akaike information criterion (AIC) value of the model; AIC = 2*k - 2*logLik, where k is the number of parameters used in the model. |
Author(s)
Yun-Hee Choi
References
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07
Choi, Y.-H. and Briollais, L. (2011) An EM composite likelihood approach for multistage sampling of family data with missing genetic covariates, Statistica Sinica 21, 231-253.
Choi, Y.-H., Briollais, L., Green, J., Parfrey, P., and Kopciuk, K. (2014) Estimating successive cancer risks in Lynch Syndrome families using a progressive three-state model, Statistics in Medicine 33, 618-638.
See Also
simfam
, penmodel
, print.penmodel
, summary.penmodel
,
print.summary.penmodel
, plot.penmodel
, carrierprob
Examples
# Family data simulated with 20% of members missing their genetic information.
set.seed(4321)
fam <- simfam(N.fam = 100, design = "pop+", base.dist = "Weibull", base.parms = c(0.01, 3),
vbeta = c(1, 2), agemin = 20, allelefreq = 0.02, mrate = 0.2)
# EM algorithm for fitting family data with missing genotypes
fit <- penmodelEM(Surv(time, status) ~ gender + mgene, cluster = "famID", gvar = "mgene",
parms = c(0.01, 3, 1, 2), data = fam, design="pop+", robust = TRUE,
base.dist = "Weibull", method = "mendelian", mode = "dominant", q = 0.02)
# Summary of the model parameter estimates from the model fit by penmodelEM
summary(fit)
# Plot the lifetime penetrance curves from model fit for gender and
# mutation status groups along with their nonparametric penetrance curves
# based on observed data excluding probands.
plot(fit)