simfam2 {FamEvent}    R Documentation

Generate familial time-to-event data with kinship or IBD matrices.

Description

Generate familial time-to-event data from a correlated frailty model with kinship and/or IBD matrices, given pedigree data.

Usage

simfam2(inputdata = NULL, IBD = NULL, design = "pop", variation = "none", depend = NULL, 
base.dist = "Weibull", base.parms = c(0.016, 3), var_names = c("gender", "mgene"), 
vbeta = c(1, 1), agemin = 20, hr = NULL)

Arguments

inputdata

A data frame containing the variables famID, indID, gender, motherID, fatherID, proband, generation and currentage, plus any other variables to be used in generating the time-to-event data.

IBD

IBD (identity-by-descent) matrix, with one row and one column per individual in inputdata.

design

Family-based study design used in the simulation. Possible choices are "pop", "pop+", "cli", "cli+", "twostage" and "noasc". "pop" is the population-based design, in which families are ascertained through affected probands; "pop+" is similar to "pop" but requires affected probands who are mutation carriers; "cli" is the clinic-based design, which includes affected probands with at least one affected parent and one affected sibling; "cli+" is similar to "cli" but requires mutation-carrier probands; "twostage" is the two-stage design, which randomly samples families from the population in the first stage and oversamples high-risk families, i.e. families with at least two affected members, in the second stage; and "noasc" applies no ascertainment correction, with families drawn by simple random sampling. Default is "pop".

variation

Source of residual familial correlation. Possible choices are "kinship" for correlated frailties within families generated by the kinship matrix, "IBD" for correlated frailties generated by the IBD matrix, c("kinship", "IBD") for frailties generated by both matrices, or "none" for no residual familial correlation. Default is "none".

depend

Variance components of the frailty distribution. Specify a single value when variation = "kinship" or variation = "IBD", or a vector of two values when variation = c("kinship", "IBD"), where the first element corresponds to the kinship matrix and the second to the IBD matrix. Default is NULL.

base.dist

Choice of baseline hazard distribution. Possible choices are "Weibull", "loglogistic", "Gompertz", "lognormal", "gamma" and "logBurr". Default is "Weibull".

base.parms

Vector of parameter values for the specified baseline hazard function. Specify base.parms = c(lambda, rho) for base.dist = "Weibull", "loglogistic", "Gompertz", "gamma" or "lognormal", and three parameters, base.parms = c(lambda, rho, eta), for base.dist = "logBurr". Default is base.parms = c(0.016, 3) for base.dist = "Weibull".

var_names

Names of variables to be used in generating time-to-event data. Specified variables should be part of inputdata.

vbeta

Vector of regression coefficients for the variables specified by var_names.

hr

Proportion of high-risk families, i.e. families with at least two affected members, to be included by the two-stage sampling. It should be specified when design = "twostage" and must lie between 0 and 1. Default value is 0.

agemin

Minimum age of disease onset (t0 in the model under Details). Default is 20 years of age.
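As an illustration of the required layout, the following base-R sketch builds a hypothetical inputdata for one nuclear family (father, mother, proband and one sibling). The codings are assumptions made for this sketch and should be checked against data produced by simfam(): founders have motherID = fatherID = 0, proband = 1 marks the proband, generation is 1 for the parents and 2 for the proband and siblings, and gender and mgene are 0/1 indicators. The last line checks that every required variable, and every name in var_names, is present.

```r
# Hypothetical nuclear family; codings are assumptions for illustration:
# founders have motherID = fatherID = 0, proband = 1 marks the proband,
# generation 1 = parents, 2 = proband and siblings, gender/mgene are 0/1.
inputdata <- data.frame(
  famID      = c(1, 1, 1, 1),
  indID      = c(1, 2, 3, 4),
  gender     = c(1, 0, 0, 1),
  motherID   = c(0, 0, 2, 2),
  fatherID   = c(0, 0, 1, 1),
  proband    = c(0, 0, 1, 0),
  generation = c(1, 1, 2, 2),
  currentage = c(70, 68, 45, 42),
  mgene      = c(1, 0, 1, 0)    # extra variable, named in var_names below
)

required  <- c("famID", "indID", "gender", "motherID", "fatherID",
               "proband", "generation", "currentage")
var_names <- c("gender", "mgene")

# Any required or var_names column missing from inputdata?
setdiff(c(required, var_names), names(inputdata))   # character(0): nothing missing
```

An empty result means no required column is missing; any name returned would have to be added to inputdata before calling simfam2.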

Details

The ages at onset are generated from the correlated frailties and covariates using the following model:

The correlated shared frailty model with kinship and/or IBD matrices

h(t|X,Z) = h0(t - t0) Z exp( X*vbeta ),

where h0(t) is the baseline hazard function, t0 is the minimum age of disease onset (agemin), and Z is a vector of frailties whose logarithm, log(Z), follows a multivariate normal distribution with mean 0 and variance-covariance matrix 2*sig1*K + sig2*D. Here K is the kinship matrix, D is the IBD matrix, and sig1 and sig2 are the variance components associated with each matrix, specified by depend = c(sig1, sig2). X is a vector of the variables named in var_names, and vbeta is the vector of corresponding regression coefficients.
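The frailty distribution above can be sketched in base R as follows: build the matrix 2*sig1*K + sig2*D and draw log(Z) from a multivariate normal distribution with that covariance. The family, its kinship matrix K and the IBD matrix D are hypothetical and written out by hand (the two siblings are assumed to share both alleles IBD at the locus), and the helper draw_log_frailty is also hypothetical; this is not necessarily how simfam2 generates frailties internally. An eigen-decomposition is used instead of chol() so that the sampler also works when the covariance is only positive semi-definite, as this IBD matrix on its own is.

```r
# Hypothetical nuclear family: 1 = father, 2 = mother, 3 and 4 = full siblings.
# Kinship coefficients: 0.5 on the diagonal (no inbreeding), 0.25 for
# parent-offspring and full-sib pairs, 0 for the unrelated parents.
K <- matrix(c(0.50, 0.00, 0.25, 0.25,
              0.00, 0.50, 0.25, 0.25,
              0.25, 0.25, 0.50, 0.25,
              0.25, 0.25, 0.25, 0.50), nrow = 4, byrow = TRUE)

# Illustrative IBD matrix at one locus: each parent shares one allele IBD
# with each child (0.5); the siblings are assumed to share both alleles (1).
D <- matrix(c(1.0, 0.0, 0.5, 0.5,
              0.0, 1.0, 0.5, 0.5,
              0.5, 0.5, 1.0, 1.0,
              0.5, 0.5, 1.0, 1.0), nrow = 4, byrow = TRUE)

depend <- c(1, 1)   # c(sig1, sig2), as in the Examples section

# Covariance of log(Z) when variation = c("kinship", "IBD")
Sigma <- 2 * depend[1] * K + depend[2] * D

# Hypothetical helper: draw log(Z) ~ MVN(0, Sigma) using A %*% t(A) = Sigma,
# with A built from the eigen-decomposition (works for singular Sigma too)
draw_log_frailty <- function(Sigma) {
  e <- eigen(Sigma, symmetric = TRUE)
  A <- e$vectors %*% diag(sqrt(pmax(e$values, 0)), nrow(Sigma))
  drop(A %*% rnorm(nrow(Sigma)))
}

set.seed(1)
Z <- exp(draw_log_frailty(Sigma))   # one log-normal frailty per member
Z
```

Setting depend[1] or depend[2] to 0 in this sketch mimics variation = "IBD" or variation = "kinship" alone.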

The variance structure of the frailties shared within families is selected with variation = "kinship", variation = "IBD", or variation = c("kinship", "IBD") to use both matrices.

When variation = "none", the ages at onset are generated independently from the proportional hazards model conditional on the covariates X.
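Because the model is a proportional hazards model, an age at onset can be drawn by inverse-transform sampling: with U uniform on (0, 1), solve H0(T - t0) * Z * exp(X*vbeta) = -log(U) for T, where H0 is the cumulative baseline hazard. The base-R sketch below does this for the Weibull baseline, assuming the parameterization H0(t) = (lambda*t)^rho with the default base.parms = c(0.016, 3) and agemin = 20; the parameterization, the helper sim_onset and the covariate values are assumptions for illustration. Leaving Z = 1 corresponds to variation = "none"; a vector of frailties, one per individual, can be passed instead.

```r
# Hypothetical helper: inverse-transform sampling of age at onset from
# h(t | X, Z) = h0(t - t0) * Z * exp(X %*% vbeta), assuming a Weibull
# baseline with cumulative hazard H0(t) = (lambda * t)^rho.
sim_onset <- function(X, vbeta, Z = 1, lambda = 0.016, rho = 3, t0 = 20) {
  U   <- runif(nrow(X))
  eta <- drop(X %*% vbeta)                 # linear predictor X * vbeta
  # Solve (lambda * s)^rho * Z * exp(eta) = -log(U) for s = T - t0
  s   <- (-log(U) / (Z * exp(eta)))^(1 / rho) / lambda
  t0 + s
}

# Hypothetical covariates: columns gender and mgene, as in var_names
X <- cbind(gender = c(0, 1, 0, 1), mgene = c(0, 0, 1, 1))

set.seed(2)
sim_onset(X, vbeta = c(1, 1))   # variation = "none": Z = 1 for everyone
```

With vbeta = c(1, 1), a carrier (mgene = 1) has exp(1), roughly 2.7 times, the hazard of a non-carrier at every age, so carriers tend to receive earlier simulated onsets.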

The design argument defines the type of family-based design to be simulated. Two variants each of the population-based and clinic-based designs can be chosen: "pop" when the proband is affected, "pop+" when the proband is an affected mutation carrier, "cli" when the proband is affected and at least one parent and one sibling are affected, and "cli+" when the proband is an affected mutation carrier and at least one parent and one sibling are affected. The two-stage design, "twostage", oversamples high-risk families, which typically include multiple (at least two) affected members; the proportion of high-risk families in the sample is specified by hr. Use design = "noasc" for no ascertainment correction.
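To make the proband-based criteria concrete, the base-R sketch below flags which families satisfy the "pop" or "pop+" condition in a data frame that already has the columns famID, proband, status and mgene. It assumes proband = 1 marks the proband and mgene = 1 marks a mutation carrier; the helper ascertained_pop and the toy data are hypothetical, and the sketch only illustrates the selection rule, not how simfam2 performs ascertainment.

```r
# Hypothetical helper: TRUE for each family whose proband meets the
# "pop" (affected) or "pop+" (affected mutation carrier) criterion.
# Assumes proband = 1 marks the proband and mgene = 1 marks a carrier.
ascertained_pop <- function(df, design = c("pop", "pop+")) {
  design <- match.arg(design)
  pb <- df[df$proband == 1, ]             # one proband row per family
  ok <- pb$status == 1                    # proband affected
  if (design == "pop+") ok <- ok & pb$mgene == 1
  tapply(ok, pb$famID, any)               # one logical per famID
}

# Toy data: three families, the first row of each is the proband
toy <- data.frame(famID   = c(1, 1, 2, 2, 3, 3),
                  proband = c(1, 0, 1, 0, 1, 0),
                  status  = c(1, 0, 1, 1, 0, 1),
                  mgene   = c(1, 0, 0, 1, 1, 1))

ascertained_pop(toy, "pop")    # families 1 and 2 qualify
ascertained_pop(toy, "pop+")   # only family 1 qualifies
```

The clinic-based criteria would additionally require looking up each proband's parents and siblings through motherID and fatherID, which this sketch does not attempt.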

Value

Returns an object of class 'simfam', a data frame which contains inputdata and the following:

ageonset

Ages at disease onset in years.

time

Ages at disease onset for the affected or ages of last follow-up for the unaffected.

status

Disease statuses: 1 for affected, 0 for unaffected (censored).

fsize

Family size including parents, siblings and children of the proband and the siblings.

naff

Number of affected members in family.

weight

Sampling weights.
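The relationship between ageonset, currentage, time, status and naff can be illustrated with a small base-R sketch on hypothetical values: an individual is affected (status = 1) when onset occurs no later than the current age, time is the earlier of the two ages, and naff counts affected members within each family. This mirrors the definitions above but is not taken from the package code, which may treat edge cases (for example missing ages) differently.

```r
# Hypothetical toy data: two families with simulated onset ages
fam <- data.frame(famID      = c(1, 1, 1, 2, 2),
                  currentage = c(65, 60, 35, 50, 48),
                  ageonset   = c(52, 81, 90, 47, 70))

# Affected if disease onset occurs before or at the current age
fam$status <- as.integer(fam$ageonset <= fam$currentage)
# Observed time: onset age if affected, otherwise age at last follow-up
fam$time   <- pmin(fam$ageonset, fam$currentage)
# Number of affected members per family, repeated on every row
fam$naff   <- ave(fam$status, fam$famID, FUN = sum)
fam
```

Here individuals 1 and 4 are affected, and every other individual is censored at the current age.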

References

Choi, Y.-H., Briollais, L., He, W. and Kopciuk, K. (2021) FamEvent: An R Package for Generating and Modeling Time-to-Event Data in Family Designs, Journal of Statistical Software 97 (7), 1-30. doi:10.18637/jss.v097.i07

Choi, Y.-H., Kopciuk, K. and Briollais, L. (2008) Estimating Disease Risk Associated with Mutated Genes in Family-Based Designs, Human Heredity 66, 238-251.

Choi, Y.-H. and Briollais, L. (2011) An EM Composite Likelihood Approach for Multistage Sampling of Family Data with Missing Genetic Covariates, Statistica Sinica 21, 231-253.

See Also

summary.simfam2, plot.simfam, penplot

Examples


## Example: simulate family data from a population-based design using
#  a Weibull distribution for the baseline hazard and inducing 
#  residual familial correlation through kinship and IBD matrices.

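# inputdata and the IBD matrix must be supplied by the user;
# here inputdata is simulated with simfam() for illustration.
library(FamEvent)

data <- simfam(N.fam = 10, design = "noasc", variation = "none",
         base.dist = "Weibull", base.parms = c(0.016, 3), vbeta = c(1, 1))

# Placeholder IBD matrix: identity, one row and column per individual
IBDmatrix <- diag(1, nrow(data))

# Keep the pedigree variables plus currentage and mgene,
# the columns simfam2 needs for this example
data <- data[ , c(1:7, 11, 14)]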

fam2 <- simfam2(inputdata = data, IBD = IBDmatrix, design = "pop", 
        variation = c("kinship","IBD"), depend = c(1, 1), 
        base.dist = "Weibull", base.parms = c(0.016, 3),
        var_names = c("gender", "mgene"), vbeta = c(1,1),
        agemin=20) 

head(fam2)

summary(fam2)


[Package FamEvent version 3.1 Index]