simfam_tvc {FamEvent} | R Documentation |
Generate familial time-to-event data with a time-varying covariate
Description
Generates familial time-to-event data with a time-varying covariate for a specified study design, genetic model, and source of residual familial correlation. The generated data frame also contains family structure (individual's ID, father ID, mother ID, relationship to proband, generation), gender, current age, and genotypes of the major or second genes.
Usage
simfam_tvc(N.fam, design = "pop", variation = "none", interaction = FALSE,
add.x = FALSE, x.dist = NULL, x.parms = NULL, depend = NULL,
add.tvc = FALSE, tvc.type = "PE", tvc.range = NULL, tvc.parms = 1,
base.dist = "Weibull", frailty.dist = NULL, base.parms = c(0.016, 3),
vbeta = c(1, 1),
allelefreq = 0.02, dominant.m = TRUE, dominant.s = TRUE,
mrate = 0, hr = 0, probandage = c(45, 2), agemin = 20, agemax = 100)
Arguments
N.fam |
Number of families to generate. |
design |
Family based study design used in the simulations. Possible choices are: |
variation |
Source of residual familial correlation. Possible choices are: |
interaction |
Logical; if |
add.x |
Logical; if |
x.dist |
Distribution of the covariate. Possible choices to generate the covariate are: |
x.parms |
Parameter values for the specified distribution of the covariate. |
depend |
Inverse of variance of the frailty distribution. Dependence within families decreases with depend value. Default is |
add.tvc |
Logical; if |
tvc.type |
Choice of time-varying covariate model. Possible choices are: |
tvc.range |
Range of ages at which the time-varying covariate occurs.
Default is |
tvc.parms |
Vector of parameter values used for the time-varying covariate model. Default value is 1. |
base.dist |
Choice of baseline hazard distribution. Possible choices are: |
frailty.dist |
Choice of frailty distribution. Possible choices are: |
base.parms |
Vector of parameter values for the specified baseline hazard function. |
vbeta |
Vector of regression coefficients for gender, majorgene, interaction between gender and majorgene (if |
allelefreq |
Population allele frequencies of major disease gene. Value should be between 0 and 1.
Vector of population allele frequencies of major and second disease genes should be provided when |
dominant.m |
Logical; if |
dominant.s |
Logical; if |
mrate |
Proportion of missing genotypes, value between 0 and 1. Default value is 0. |
hr |
Proportion of high risk families, which include at least two affected members, to be sampled from the two stage sampling. This value should be specified when |
probandage |
Vector of mean and standard deviation for the proband age. Default values are mean of 45 years and standard deviation of 2 years, |
agemin |
Minimum age of disease onset or minimum age. Default is 20 years of age. |
agemax |
Maximum age of disease onset or maximum age. Default is 100 years of age. |
Details
Time-varying covariate
When add.tvc = TRUE, the time at which the time-varying covariate (TVC) occurs, tvc.age, is generated from a uniform distribution with the range specified by tvc.range. A vector of minimum and maximum ages for the TVC should be specified in tvc.range. When tvc.range = NULL, agemin and agemax are used as the range. In addition, tvc.type should be either "PE" or "CO", and the parameter values for the specified TVC type should be provided in tvc.parms.
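The uniform draw described above can be sketched in base R. This is only an illustration: draw_tvc_age() is a hypothetical helper, not a FamEvent function, and the fallback to c(agemin, agemax) simply mirrors the behaviour stated for tvc.range = NULL.

```r
# Hypothetical helper (not part of FamEvent): draw TVC ages uniformly.
draw_tvc_age <- function(n, tvc.range = NULL, agemin = 20, agemax = 100) {
  # When no range is supplied, fall back to c(agemin, agemax).
  if (is.null(tvc.range)) tvc.range <- c(agemin, agemax)
  stopifnot(length(tvc.range) == 2, tvc.range[1] < tvc.range[2])
  runif(n, min = tvc.range[1], max = tvc.range[2])
}

set.seed(1)
ages_default <- draw_tvc_age(5)                        # uses agemin and agemax
ages_custom  <- draw_tvc_age(5, tvc.range = c(30, 60)) # uses the supplied range
```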
tvc.type = "PE" represents a permanent exposure model for the TVC, which assumes that the effect of the TVC stays constant after tvc.age. The tvc.parms for the PE model should be specified as a single value, which represents the log hazard ratio.
tvc.type = "CO" represents the Cox and Oakes model for the TVC, which assumes that the effect of the TVC decays exponentially over time in the form
β exp(-(t - t*) η) + η0,
where t* is the time at which the TVC occurs.
The tvc.parms for the CO model should be specified as a vector of three parameters, c(beta, eta, eta0).
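As a rough illustration of the two TVC models, the following base R sketch evaluates the TVC contribution to the log hazard at age t. The function tvc_effect() is hypothetical (it is not exported by FamEvent), and it assumes that the TVC contributes nothing before tvc.age.

```r
# Hypothetical helper: TVC contribution to the log hazard at age t.
tvc_effect <- function(t, tvc.age, tvc.type = c("PE", "CO"), tvc.parms = 1) {
  tvc.type <- match.arg(tvc.type)
  exposed <- t >= tvc.age  # assumption: no effect before the TVC occurs
  effect <- switch(tvc.type,
    PE = {
      # Permanent exposure: a single constant log hazard ratio.
      stopifnot(length(tvc.parms) == 1)
      rep(tvc.parms, length(t))
    },
    CO = {
      # Exponentially decaying effect: beta * exp(-(t - t*) * eta) + eta0.
      stopifnot(length(tvc.parms) == 3)
      beta <- tvc.parms[1]; eta <- tvc.parms[2]; eta0 <- tvc.parms[3]
      beta * exp(-(t - tvc.age) * eta) + eta0
    })
  ifelse(exposed, effect, 0)
}

pe <- tvc_effect(c(30, 40, 50), tvc.age = 40, tvc.type = "PE", tvc.parms = 1)
co <- tvc_effect(c(30, 40, 50), tvc.age = 40, tvc.type = "CO",
                 tvc.parms = c(1, 0.1, 0))
```

With these inputs the PE effect jumps to its constant value at age 40, while the CO effect starts at beta + eta0 at age 40 and decays towards eta0 afterwards.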
Family-based study design
The design argument defines the type of family-based design to be simulated. Two variants each of the population-based and clinic-based designs can be chosen: "pop" when the proband is affected, "pop+" when the proband is an affected mutation carrier, "cli" when the proband is affected and at least one parent and one sibling are affected, and "cli+" when the proband is an affected mutation carrier and at least one parent and one sibling are affected. The two-stage design, "twostage", is used to oversample high-risk families, where the proportion of high-risk families to include in the sample is specified by hr. High-risk families are those that include at least two affected members. design = "noasc" is to be used for no ascertainment correction.
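The four population-based and clinic-based criteria can be written as a small check on a family's data frame. This is a sketch under assumptions, not the package's ascertainment code: meets_design() and toy_fam are hypothetical, and siblings are taken to be generation 2 members other than the proband.

```r
# Hypothetical check of the ascertainment criteria for one family.
# Uses the columns proband, generation, status and mgene described under Value.
meets_design <- function(fam, design = c("pop", "pop+", "cli", "cli+")) {
  design <- match.arg(design)
  pb <- fam[fam$proband == 1, ]
  pb_affected <- pb$status == 1
  pb_carrier  <- pb$mgene == 1
  # Assumption: generation 1 holds the proband's parents and
  # generation 2 non-probands are the proband's siblings.
  parent_aff <- any(fam$generation == 1 & fam$status == 1)
  sib_aff    <- any(fam$generation == 2 & fam$proband == 0 & fam$status == 1)
  switch(design,
    "pop"  = pb_affected,
    "pop+" = pb_affected && pb_carrier,
    "cli"  = pb_affected && parent_aff && sib_aff,
    "cli+" = pb_affected && pb_carrier && parent_aff && sib_aff)
}

# Toy family: two parents, an affected carrier proband, one unaffected sibling.
toy_fam <- data.frame(
  proband    = c(0, 0, 1, 0),
  generation = c(1, 1, 2, 2),
  status     = c(1, 0, 1, 0),
  mgene      = c(1, 0, 1, 0)
)
res <- sapply(c("pop", "pop+", "cli", "cli+"),
              function(d) meets_design(toy_fam, d))
```

The toy family satisfies "pop" and "pop+" but not the clinic-based designs, because no sibling is affected.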
Penetrance model
The ages at onset are generated from the following penetrance models, depending on the choice of variation: "none", "frailty", "secondgene", or "kinship". When variation = "none", the ages at onset are independently generated from the proportional hazard model conditional on the gender and carrier status of the major gene mutation, X = c(xs, xg).
Ages at onset that are correlated within families are generated from the shared frailty model (variation = "frailty"), the correlated shared frailty model with kinship matrix (variation = "kinship"), or the two-gene model (variation = "secondgene"), where the residual familial correlation is induced by a frailty or a second gene shared within the family.
The proportional hazard model
h(t|X) = h0(t - t0) exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, and xs and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
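To make the proportional hazard model concrete, the sketch below simulates ages at onset by inverting the cumulative hazard. It assumes a Weibull baseline with cumulative hazard H0(t) = (lambda * t)^rho and base.parms = c(lambda, rho); this parameterization is an assumption for illustration, since the page does not state which one FamEvent uses. The helpers ph_cumhaz() and sim_onset() are hypothetical.

```r
# Cumulative hazard H(t | X) = H0(t - t0) * exp(bs * xs + bg * xg),
# with an ASSUMED Weibull form H0(t) = (lambda * t)^rho.
ph_cumhaz <- function(t, xs, xg, base.parms = c(0.016, 3),
                      vbeta = c(1, 1), agemin = 20) {
  lp <- vbeta[1] * xs + vbeta[2] * xg
  (base.parms[1] * pmax(t - agemin, 0))^base.parms[2] * exp(lp)
}

# Inverse-transform sampling: solve H(T | X) = -log(U) for T.
sim_onset <- function(u, xs, xg, base.parms = c(0.016, 3),
                      vbeta = c(1, 1), agemin = 20) {
  lp <- vbeta[1] * xs + vbeta[2] * xg
  agemin + (-log(u) / exp(lp))^(1 / base.parms[2]) / base.parms[1]
}

u <- c(0.2, 0.5, 0.9)
onset_carrier    <- sim_onset(u, xs = 1, xg = 1)  # male carriers
onset_noncarrier <- sim_onset(u, xs = 1, xg = 0)  # male non-carriers
```

In practice u would be drawn with runif(); fixed values are used here so that the effect of carrier status on the simulated ages is easy to compare.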
The shared frailty model
h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, Z represents a frailty shared within families and follows either a gamma or log-normal distribution, and xs and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
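A shared gamma frailty can be sketched as follows. The code assumes the usual convention of a gamma distribution with mean 1 and variance 1/depend, which matches the description of depend as the inverse of the frailty variance; the mean-1 constraint and the variable names are assumptions made for illustration.

```r
set.seed(2024)
depend   <- 2      # inverse of the frailty variance
n_fam    <- 2000   # number of families (illustrative)
fam_size <- 4      # members per family (illustrative)

# One frailty per family: gamma with mean 1 and variance 1 / depend.
z_fam <- rgamma(n_fam, shape = depend, rate = depend)

# Every member of a family carries the same frailty value,
# which multiplies that member's hazard.
Z <- rep(z_fam, each = fam_size)
```

Larger values of depend shrink the frailty variance towards zero, so families become more alike and within-family dependence weakens.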
The correlated shared frailty model with kinship matrix
h(t|X,Z) = h0(t - t0) Z exp(βs * xs + βg * xg),
where h0(t) is the baseline hazard function, t0 is a minimum age of disease onset, Z represents a vector of frailties following a multivariate log-normal distribution with mean 0 and variance 2*K*depend, where K represents the kinship matrix, and xs and xg indicate male (1) or female (0) and carrier (1) or non-carrier (0) of a main gene of interest, respectively.
The two-gene model
h(t|X) = h0(t - t0) exp(βs * xs + β1 * x1 + β2 * x2),
where x1, x2 indicate carriers (1) and non-carriers (0) of a major gene and of second gene mutation, respectively.
The current ages for each generation are simulated assuming normal distributions. However, the probands' ages are generated using a left-truncated normal distribution, as their ages cannot be less than the minimum age of onset. The average age difference between each generation and their parents is set to 20 years.
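A left-truncated normal draw for the proband ages can be sketched with the inverse-CDF method. The helper r_trunc_norm() is hypothetical and is not necessarily how FamEvent samples these ages; the mean and standard deviation follow the probandage default of c(45, 2).

```r
# Hypothetical helper: sample from a normal distribution truncated below at `lower`.
r_trunc_norm <- function(n, mean, sd, lower) {
  p_low <- pnorm(lower, mean, sd)            # probability mass below the cut-off
  qnorm(runif(n, min = p_low, max = 1), mean, sd)
}

set.seed(10)
# Default settings: truncation at agemin = 20 has almost no effect.
proband_age <- r_trunc_norm(1000, mean = 45, sd = 2, lower = 20)
# A cut-off close to the mean makes the truncation visible.
proband_age_cut <- r_trunc_norm(1000, mean = 45, sd = 2, lower = 44)
```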
Value
Returns an object of class 'simfam', a data frame which contains:
famID |
Family identification (ID) numbers. |
indID |
Individual ID numbers. |
gender |
Gender indicators: 1 for males, 0 for females. |
motherID |
Mother ID numbers. |
fatherID |
Father ID numbers. |
proband |
Proband indicators: 1 if the individual is the proband, 0 otherwise. |
generation |
Individual's generation: 1 = parents of probands, 2 = probands and siblings, 3 = children of probands and siblings. |
majorgene |
Genotypes of major gene: 1=AA, 2=Aa, 3=aa where A is disease gene. |
secondgene |
Genotypes of second gene: 1=BB, 2=Bb, 3=bb where B is disease gene. |
ageonset |
Ages at disease onset in years. |
currentage |
Current ages in years. |
time |
Ages at disease onset for the affected or ages of last follow-up for the unaffected. |
status |
Disease statuses: 1 for affected, 0 for unaffected (censored). |
mgene |
Major gene mutation indicators: 1 for mutated gene carriers, 0 for mutated gene noncarriers, or |
newx |
Additional covariate when |
tvc.age |
Age at which the time-varying covariate occurs when |
tvc.status |
TVC status: 1 if |
relation |
Family members' relationship with the proband: |
fsize |
Family size including parents, siblings and children of the proband and the siblings. |
naff |
Number of affected members in family. |
weight |
Sampling weights. |
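The relationship between ageonset, currentage, time and status can be checked on the six rows printed in the Examples section. The sketch below copies those values into a small data frame; the rule used for tvc.status (the TVC occurs before time) is inferred from those printed rows only and is an assumption, not a documented definition.

```r
# Values copied from the head(fam) output in the Examples section.
ex <- data.frame(
  ageonset   = c(61.80566, 61.56996, 39.42050, 90.17320, 51.49538, 75.97238),
  currentage = c(68.26812, 68.60174, 47.05410, 44.86501, 22.73075, 22.71399),
  tvc.age    = c(59.16387, 39.45786, 35.01941, 58.67013, 30.19254, 40.66258)
)

# Observed time: onset age if affected by the current age, else the current age.
ex$time   <- pmin(ex$ageonset, ex$currentage)
ex$status <- as.integer(ex$ageonset <= ex$currentage)

# Assumption inferred from the printed rows: TVC counts only if it precedes `time`.
ex$tvc.status <- as.integer(ex$tvc.age < ex$time)
```

The derived columns agree with the time, status and tvc.status values shown for those six individuals.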
Author(s)
Yun-Hee Choi
References
Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07
Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated with Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.
Choi, Y.-H. and Briollais, L. (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.
See Also
summary.simfam_tvc, plot.simfam_tvc
Examples
## Example: simulate family data with TVC based on CO model.
set.seed(4321)
fam <- simfam_tvc(N.fam = 10, design = "pop", variation = "frailty",
base.dist = "Weibull", frailty.dist = "gamma", depend = 1,
add.tvc = TRUE, tvc.type = "CO", tvc.range = c(30,60),
tvc.parms = c(1, 0.1, 0), allelefreq = 0.02,
base.parms = c(0.01, 3), vbeta = c(-1.13, 2.35))
## Not run:
> head(fam)
famID indID gender motherID fatherID proband generation majorgene secondgene ageonset
1 1 1 1 0 0 0 1 2 0 61.80566
2 1 2 0 0 0 0 1 3 0 61.56996
3 1 3 0 2 1 1 2 2 0 39.42050
4 1 4 1 0 0 0 0 3 0 90.17320
5 1 13 0 3 4 0 3 3 0 51.49538
6 1 14 0 3 4 0 3 3 0 75.97238
currentage time status mgene tvc.age tvc.status relation fsize naff weight
1 68.26812 61.80566 1 1 59.16387 1 4 29 3 1
2 68.60174 61.56996 1 0 39.45786 1 4 29 3 1
3 47.05410 39.42050 1 1 35.01941 1 1 29 3 1
4 44.86501 44.86501 0 0 58.67013 0 6 29 3 1
5 22.73075 22.73075 0 0 30.19254 0 3 29 3 1
6 22.71399 22.71399 0 0 40.66258 0 3 29 3 1
> summary(fam)
Study design: pop: population-based study with affected probands
Baseline distribution: Weibull
Frailty distribution: gamma
Number of families: 10
Average number of affected per family: 3.1
Average number of carriers per family: 3.4
Average family size: 16.3
Average age of onset for affected: 48.19
Average number of TVC event per family: 4
Sampling weights used: 1
## End(Not run)