simfam2 {FamEvent}    R Documentation
Generate familial time-to-event data with kinship and/or IBD matrices.
Description
Generate familial time-to-event data from a correlated frailty model with kinship and/or IBD matrices, given pedigree data.
Usage
simfam2(inputdata = NULL, IBD = NULL, design = "pop", variation = "none", depend = NULL,
base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"),
vbeta = c(1, 1), agemin = 20, hr = NULL)
Arguments
inputdata
Data frame containing the pedigree data, including the variables named in var_names.
IBD
IBD matrix, with one row and one column per individual in inputdata.
design
Family-based study design used in the simulations. Possible choices are: "pop", "pop+", "cli", "cli+", "twostage" and "noasc" (see Details). Default is "pop".
variation
Source of residual familial correlation. Possible choices are: "kinship", "IBD", c("kinship", "IBD") and "none" (see Details). Default is "none".
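depend
Inverse of the variance for the frailty distribution. A single value should be specified when variation = "kinship" or variation = "IBD", and two values, c(1/sig1, 1/sig2), when variation = c("kinship", "IBD").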
base.dist
Choice of baseline hazard distribution. Default is "Weibull".
base.parms
Vector of parameter values for the specified baseline hazard function.
var_names
Names of variables to be used in generating time-to-event data. Specified variables should be part of inputdata.
vbeta
Vector of regression coefficients for the variables specified by var_names.
hr
Proportion of high-risk families, which include at least two affected members, to be sampled in the two-stage sampling. This value should be specified when design = "twostage".
agemin
Minimum age of disease onset or minimum age. Default is 20 years of age.
Details
The ages at onset are generated from the correlated frailties and covariates using the correlated shared frailty model with kinship and/or IBD matrices:

$$h(t \mid X, Z) = h_0(t - t_0)\, Z \exp(X\beta),$$

where $h_0(t)$ is the baseline hazard function, $t_0$ is the minimum age of disease onset, and $Z$ is a vector of frailties following a multivariate log-normal distribution, with $\log Z$ having mean $\mathbf{0}$ and variance $\sigma_1 \, 2K + \sigma_2 D$. Here $K$ is the kinship matrix, $D$ is the IBD matrix, and $\sigma_1$ and $\sigma_2$ are the variance components associated with each matrix; their values are specified by depend = c(1/sig1, 1/sig2). $X$ is a vector of variables whose names are specified by var_names, and $\beta$ is the vector of corresponding coefficients, whose values are specified by vbeta.
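The model above can be sampled by inverse transform: set the conditional survival function $S(t \mid X, Z)$ equal to a Uniform(0, 1) draw and solve for $t$. The base-R sketch below does this for a Weibull baseline. It assumes the parameterization $H_0(u) = (\lambda u)^\rho$ with base.parms = c(lambda, rho), which may not match FamEvent's internal one, and sim_onset() is a hypothetical helper, not a FamEvent function.

```r
# Sketch only: inverse-transform sampling of ages at onset under
#   h(t | X, Z) = h0(t - t0) * Z * exp(X %*% beta)
# ASSUMPTION: Weibull cumulative baseline hazard H0(u) = (lambda * u)^rho.
# sim_onset() is hypothetical and not part of FamEvent.
sim_onset <- function(X, beta, logZ, lambda = 0.016, rho = 3, t0 = 20) {
  eta <- drop(X %*% beta) + logZ   # log of Z * exp(X beta) for each person
  u <- runif(nrow(X))               # S(t | X, Z) is set equal to u
  H <- -log(u) / exp(eta)           # required value of H0(t - t0)
  t0 + H^(1 / rho) / lambda         # solve (lambda * (t - t0))^rho = H
}

# Usage: two covariates with coefficients c(1, 1) and no frailty (logZ = 0)
set.seed(1)
X <- cbind(gender = c(0, 1, 0, 1), mgene = c(0, 0, 1, 1))
onset <- sim_onset(X, beta = c(1, 1), logZ = rep(0, 4))
onset
```

Passing logZ = rep(0, nrow(X)) gives every frailty the value 1, which is the independent proportional hazards case.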
The variance structure of the frailties shared within families is chosen with variation = "kinship" or variation = "IBD" to use one matrix, or with variation = c("kinship", "IBD") to use both.
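The covariance of the log-frailties implied by each choice of variation can be sketched in base R as follows. The helpers frailty_cov() and draw_frailty() are hypothetical; the sketch assumes that depend holds the inverse variances c(1/sig1, 1/sig2), in the order kinship then IBD, and that the kinship term enters as 2K. FamEvent's internal construction may differ.

```r
# Sketch only: log-frailty covariance Sigma for one family and a draw of Z.
# ASSUMPTIONS: depend = c(1/sig1, 1/sig2) (kinship first, IBD second) and
# the kinship matrix K enters as 2 * K. Helper names are hypothetical.
frailty_cov <- function(variation, depend, K = NULL, D = NULL) {
  sig <- 1 / depend                                # variance components
  if (setequal(variation, c("kinship", "IBD"))) {
    sig[1] * 2 * K + sig[2] * D                    # both sources
  } else if (identical(variation, "kinship")) {
    sig[1] * 2 * K
  } else if (identical(variation, "IBD")) {
    sig[1] * D
  } else {
    stop("unsupported value of 'variation'")       # "none" needs no frailty
  }
}

draw_frailty <- function(Sigma) {
  R <- chol(Sigma)                                 # upper triangular, Sigma = t(R) %*% R
  exp(drop(t(R) %*% rnorm(nrow(Sigma))))           # Z = exp(log Z), log Z ~ N(0, Sigma)
}

# Usage: two unrelated parents and their two children (kinship coefficients)
K <- matrix(c(0.5,  0,    0.25, 0.25,
              0,    0.5,  0.25, 0.25,
              0.25, 0.25, 0.5,  0.25,
              0.25, 0.25, 0.25, 0.5), nrow = 4, byrow = TRUE)
D <- diag(4)                                       # identity IBD matrix, as in the Examples
Sigma <- frailty_cov(c("kinship", "IBD"), depend = c(1, 1), K = K, D = D)
set.seed(1)
Z <- draw_frailty(Sigma)
```

The Cholesky factor is used because $R^\top \epsilon$ with $\epsilon \sim N(\mathbf{0}, I)$ has covariance $R^\top R = \Sigma$.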
When variation = "none", the ages at onset are generated independently from the proportional hazards model conditional on the covariates $X$.
The design argument defines the type of family-based design to be simulated. Two variants each of the population-based and clinic-based designs can be chosen: "pop" when the proband is affected; "pop+" when the proband is an affected mutation carrier; "cli" when the proband is affected and at least one parent and one sibling are affected; and "cli+" when the proband is an affected mutation carrier and at least one parent and one sibling are affected. The two-stage design, "twostage", is used to oversample high-risk families, where the proportion of high-risk families to include in the sample is specified by hr. High-risk families often include multiple (at least two) affected members. design = "noasc" is used when no ascertainment correction is applied.
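The four ascertainment rules for "pop", "pop+", "cli" and "cli+" can be written as one predicate on a family. This is a hypothetical sketch, not FamEvent code: the function meets_design() and the column names role, status and carrier are assumptions, and the "twostage" and "noasc" designs are not covered.

```r
# Sketch only: does a family satisfy the ascertainment rule of a design?
# HYPOTHETICAL columns: role ("proband", "parent", "sibling", ...),
# status (1 = affected) and carrier (1 = mutation carrier).
meets_design <- function(fam, design = c("pop", "pop+", "cli", "cli+")) {
  design <- match.arg(design)
  prb <- fam[fam$role == "proband", ]
  stopifnot(nrow(prb) == 1)                         # exactly one proband
  ok <- prb$status == 1                             # every design: proband affected
  if (design %in% c("pop+", "cli+")) {
    ok <- ok && prb$carrier == 1                    # "+" designs: proband is a carrier
  }
  if (design %in% c("cli", "cli+")) {
    ok <- ok &&
      any(fam$status[fam$role == "parent"] == 1) && # at least one affected parent
      any(fam$status[fam$role == "sibling"] == 1)   # at least one affected sibling
  }
  ok
}

# Usage: affected carrier proband, one affected parent, unaffected sibling
fam <- data.frame(role    = c("proband", "parent", "parent", "sibling"),
                  status  = c(1, 1, 0, 0),
                  carrier = c(1, 1, 0, 0))
sapply(c("pop", "pop+", "cli", "cli+"), function(d) meets_design(fam, d))
# pop and pop+ are TRUE; cli and cli+ are FALSE (no affected sibling)
```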
Value
Returns an object of class 'simfam', a data frame which contains inputdata and the following:
ageonset
Ages at disease onset in years.
time
Ages at disease onset for the affected, or ages at last follow-up for the unaffected.
status
Disease status: 1 for affected, 0 for unaffected (censored).
fsize
Family size, including parents, siblings and children of the proband and the siblings.
naff
Number of affected members in the family.
weight
Sampling weights.
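The relationship between ageonset, time, status and naff can be illustrated with a small base-R sketch. It assumes columns named famID, ageonset and currentage (age at last follow-up), and that an individual counts as affected when onset occurs by the current age; derive_outcomes() is hypothetical and not part of FamEvent.

```r
# Sketch only: deriving status, time and naff from onset and follow-up ages.
# ASSUMED column names: famID, ageonset, currentage. Hypothetical helper.
derive_outcomes <- function(df) {
  df$status <- as.integer(df$ageonset <= df$currentage)  # 1 = onset observed
  df$time   <- pmin(df$ageonset, df$currentage)          # onset age or censoring age
  df$naff   <- ave(df$status, df$famID, FUN = sum)       # affected members per family
  df
}

toy <- data.frame(famID = c(1, 1, 2), ageonset = c(30, 80, 45),
                  currentage = c(50, 60, 40))
derive_outcomes(toy)
```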
References
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07
Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated with Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.
Choi, Y.-H. and Briollais, L. (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.
See Also
summary.simfam2, plot.simfam, penplot
Examples
## Example: simulate family data from a population-based design using
# a Weibull distribution for the baseline hazard and inducing
# residual familial correlation through kinship and IBD matrices.
# inputdata and the IBD matrix must be provided;
# here data simulated by simfam() serve as inputdata.
data <- simfam(N.fam = 10, design = "noasc", variation = "none",
base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1))
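# identity IBD matrix, used here only for illustration
IBDmatrix <- diag(1, dim(data)[1])
# keep a subset of the simfam() columns as the pedigree input
data <- data[ , c(1:7, 11, 14)]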
fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop",
variation = c("kinship","IBD"), depend = c(1, 1),
base.dist = "Weibull", base.parms = c(0.016, 3),
var_names = c("gender", "mgene"), vbeta = c(1,1),
agemin=20)
head(fam2)
summary(fam2)