
gen_structured_model {ggmix}

R Documentation

Simulation Scenario from Bhatnagar et al. (2018+) ggmix paper

Description

Function that generates data for the different simulation studies presented in the accompanying paper. This function requires the popkin and bnpsd packages to be installed.

Usage

gen_structured_model(
  n,
  p_design,
  p_kinship,
  k,
  s,
  Fst,
  b0,
  nPC = 10,
  eta,
  sigma2,
  geography = c("ind", "1d", "circ"),
  percent_causal,
  percent_overlap,
  train_tune_test = c(0.6, 0.2, 0.2)
)

Arguments

`n`	number of observations to simulate
`p_design`	number of variables in the design matrix, i.e., the SNPs used as predictors in the regression model
`p_kinship`	number of variables in X_kinship, i.e., the matrix used to calculate the kinship matrix
`k`	number of intermediate subpopulations.
`s`	the desired bias coefficient, which specifies sigma indirectly. Required if sigma is missing.
`Fst`	the desired final F_ST of the admixed individuals. Required if sigma is missing.
`b0`	the true intercept parameter
`nPC`	number of principal components to include in the design matrix used for regression adjustment for population structure via principal components. This matrix is used as the input in a standard lasso regression routine, where there are no random effects.
`eta`	the true eta parameter, which must satisfy `0 < eta < 1`
`sigma2`	the true sigma2 parameter
`geography`	the type of geography used to simulate the kinship matrix. "ind" denotes independent subpopulations in which every individual is unadmixed, "1d" a 1D geography, and "circ" a circular geography. Default: "ind". See the functions in the `bnpsd` package for details on how these data are generated.
`percent_causal`	fraction of the `p_design` variables that are causal; must satisfy `0 <= percent_causal <= 1`. The true regression coefficients of the causal variables are generated from a standard normal distribution.
`percent_overlap`	the percentage of causal SNPs that are also included in the calculation of the kinship matrix. The example below passes this value as the character string "100".
`train_tune_test`	the proportions of the sample used for training, tuning parameter selection, and testing, respectively. Default is a 60/20/20 split.
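To make `percent_causal` and `train_tune_test` concrete, here is a minimal sketch of how these two arguments could translate into a sparse coefficient vector and a random three-way split of the individuals. The helpers `make_beta()` and `split_indices()`, the `SNP1, SNP2, ...` naming, and the rounding rule (remainder goes to training) are hypothetical and are not the ggmix implementation.

```r
# Hypothetical helpers illustrating two arguments of gen_structured_model();
# they are NOT the ggmix implementation, and the rounding choices are assumptions.

# Sparse true coefficients: a fraction `percent_causal` of the p_design SNPs
# receive N(0, 1) effects; all remaining coefficients are exactly zero.
make_beta <- function(p_design, percent_causal) {
  stopifnot(percent_causal >= 0, percent_causal <= 1)
  n_causal <- round(p_design * percent_causal)
  beta <- numeric(p_design)
  causal <- sample(p_design, n_causal)
  beta[causal] <- rnorm(n_causal)
  names(beta) <- paste0("SNP", seq_len(p_design))  # hypothetical SNP names
  beta
}

# Random train/tune/test split of n individuals by the given proportions;
# any rounding remainder is assigned to the training set (assumption).
split_indices <- function(n, train_tune_test = c(0.6, 0.2, 0.2)) {
  stopifnot(length(train_tune_test) == 3, abs(sum(train_tune_test) - 1) < 1e-8)
  sizes <- round(n * train_tune_test)
  sizes[1] <- n - sum(sizes[-1])
  labels <- factor(rep(c("train", "tune", "test"), times = sizes),
                   levels = c("train", "tune", "test"))
  split(sample(n), labels)
}

set.seed(123)
beta <- make_beta(p_design = 50, percent_causal = 0.10)
sum(beta != 0)   # 5 causal SNPs
idx <- split_indices(n = 100, train_tune_test = c(0.8, 0.1, 0.1))
lengths(idx)     # train 80, tune 10, test 10
```

With the settings used in the Examples section (`n = 100`, `p_design = 50`, `percent_causal = 0.10`, `train_tune_test = c(0.8, 0.1, 0.1)`), this sketch yields 5 nonzero coefficients and an 80/10/10 split.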

Details

The kinship is estimated using the popkin function from the popkin package. gen_structured_model then multiplies this kinship matrix by 2 to give the expected covariance matrix, which is subsequently used in the linear mixed models.
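The role of the doubled kinship matrix can be illustrated with a small simulation. This is a sketch under an assumed model, y = b0 + X beta + u + e with u ~ N(0, eta * sigma2 * Phi), e ~ N(0, (1 - eta) * sigma2 * I) and Phi = 2 * kinship; the toy kinship matrix and genotypes are hypothetical stand-ins for what popkin and bnpsd would produce, and the exact generating steps inside gen_structured_model may differ.

```r
# Sketch under an ASSUMED ggmix-style linear mixed model:
#   y = b0 + X beta + u + e,  u ~ N(0, eta * sigma2 * Phi),
#   e ~ N(0, (1 - eta) * sigma2 * I),  Phi = 2 * kinship.
set.seed(42)
n <- 30
p <- 10

# Toy kinship (hypothetical): self-kinship 0.5, constant 0.05 between pairs.
kin <- matrix(0.05, n, n)
diag(kin) <- 0.5
Phi <- 2 * kin  # expected covariance matrix used by the mixed model

eta <- 0.1; sigma2 <- 1; b0 <- 0
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), n, p)  # toy 0/1/2 genotypes
beta <- c(rnorm(2), rep(0, p - 2))                      # two causal SNPs

# Total covariance of y given X; draw correlated noise via a Cholesky factor.
V <- sigma2 * (eta * Phi + (1 - eta) * diag(n))
R <- chol(V)  # upper triangular, V = t(R) %*% R
y <- b0 + drop(X %*% beta) + drop(t(R) %*% rnorm(n))
```

Because `eta` weights the structured part and `1 - eta` the independent part, `sigma2` is the total variance of each response before the fixed effects are added.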

Value

A list with the following elements

ytrain: simulated response vector for training set
ytune: simulated response vector for tuning parameter selection set
ytest: simulated response vector for test set
xtrain: simulated design matrix for training set
xtune: simulated design matrix for tuning parameter selection set
xtest: simulated design matrix for testing set
xtrain_lasso: simulated design matrix for training set for lasso model. This is the same as xtrain, but also includes the nPC principal components
xtune_lasso: simulated design matrix for tuning parameter selection set for lasso model. This is the same as xtune, but also includes the nPC principal components
xtest_lasso: simulated design matrix for testing set for lasso model. This is the same as xtest, but also includes the nPC principal components
causal: character vector of the names of the causal SNPs
beta: the vector of true regression coefficients
kin_train: 2 times the estimated kinship for the training set individuals
kin_tune_train: The covariance matrix between the tuning set and the training set individuals
kin_test_train: The covariance matrix between the test set and training set individuals
Xkinship: the matrix of SNPs used to estimate the kinship matrix
not_causal: character vector of the names of the non-causal SNPs
PC: the principal components for population structure adjustment
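The `*_lasso` design matrices above are described as the ordinary design matrices with `nPC` principal components appended. A minimal sketch of that idea, assuming the PCs are computed with `stats::prcomp` on the kinship SNP matrix (which matrix and which PCA routine ggmix actually uses is not stated here), with small hypothetical dimensions:

```r
# Hypothetical sketch: append the top nPC principal components of the
# kinship SNP matrix to the design matrix, giving a fixed-effects-only
# input for a standard lasso (no random effects).
set.seed(7)
n <- 40; p_design <- 15; p_kinship <- 60; nPC <- 5

X <- matrix(rbinom(n * p_design, size = 2, prob = 0.3), n, p_design,
            dimnames = list(NULL, paste0("SNP", seq_len(p_design))))
Xkinship <- matrix(rbinom(n * p_kinship, size = 2, prob = 0.3), n, p_kinship)

pca <- prcomp(Xkinship, center = TRUE, scale. = FALSE)
PC <- pca$x[, seq_len(nPC), drop = FALSE]  # columns named PC1..PC5
x_lasso <- cbind(X, PC)
dim(x_lasso)  # 40 20
```

The PC columns act as unpenalized-style covariates only if the lasso routine is told not to penalize them; that choice is left to the fitting step and is not shown here.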

Examples

admixed <- gen_structured_model(n = 100,
                                p_design = 50,
                                p_kinship = 5e2,
                                geography = "1d",
                                percent_causal = 0.10,
                                percent_overlap = "100",
                                k = 5, s = 0.5, Fst = 0.1,
                                b0 = 0, nPC = 10,
                                eta = 0.1, sigma2 = 1,
                                train_tune_test = c(0.8, 0.1, 0.1))
names(admixed)

[Package ggmix version 0.0.2]