gen_structured_model {ggmix}    R Documentation
Simulation Scenario from Bhatnagar et al. (2018+) ggmix paper
Description
Function that generates data for the different simulation studies
presented in the accompanying paper. This function requires the
popkin and bnpsd packages to be installed.
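Before calling gen_structured_model(), it can help to confirm that both packages are available. The helper below, missing_packages(), is hypothetical (it is not part of ggmix); it is a minimal sketch built on base R's requireNamespace(), which returns FALSE rather than raising an error when a package is absent.

```r
# Hypothetical helper (not part of ggmix): report which of the packages
# needed by gen_structured_model() are not installed.
missing_packages <- function(pkgs = c("popkin", "bnpsd")) {
  # requireNamespace() returns FALSE, without an error, for absent packages
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- missing_packages()
if (length(needed) > 0) {
  message("Install before simulating: ", paste(needed, collapse = ", "))
}
```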
Usage
gen_structured_model(
n,
p_design,
p_kinship,
k,
s,
Fst,
b0,
nPC = 10,
eta,
sigma2,
geography = c("ind", "1d", "circ"),
percent_causal,
percent_overlap,
train_tune_test = c(0.6, 0.2, 0.2)
)
Arguments
n
number of observations to simulate
p_design
number of variables in X_test, i.e., the design matrix
p_kinship
number of variables in X_kinship, i.e., the matrix used to calculate the kinship
k
number of intermediate subpopulations
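s
the desired bias coefficient, which specifies sigma (the spread parameter used by bnpsd) indirectly. bnpsd requires it when sigma is not given, and this function does not take sigma as an argument.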
Fst
the desired final F_ST of the admixed individuals. Like s, bnpsd requires it when sigma is not given.
b0
the true intercept parameter
nPC
number of principal components to include in the design matrix used for regression adjustment for population structure via principal components. This matrix is used as the input to a standard lasso regression routine, in which there are no random effects.
eta
the true eta parameter, which has to lie between 0 and 1
sigma2
the true sigma2 parameter
geography |
the type of geography for simulating the kinship matrix.
"ind" is independent populations, where every individual is
unadmixed, "1d" is a 1D geography, and "circ" is a circular geography.
Default: "ind". See the functions in the bnpsd package for details on how
these data are generated.
percent_causal
proportion of the p_design variables that are causal (e.g., 0.10 for 10%)
percent_overlap
this represents the percentage of causal SNPs that will also be included in the calculation of the kinship matrix
train_tune_test
the proportions of the sample size used for training, tuning parameter selection, and testing, respectively. The default is a 60/20/20 split.
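To make train_tune_test concrete, the sketch below shows one way the three proportions could be turned into disjoint sets of individuals. split_indices() is a hypothetical helper written for illustration only; ggmix's internal splitting code may round or order the groups differently.

```r
# Hypothetical helper (not ggmix's internal code): turn train_tune_test
# proportions into disjoint, randomly assigned index sets.
split_indices <- function(n, train_tune_test = c(0.6, 0.2, 0.2)) {
  stopifnot(length(train_tune_test) == 3,
            isTRUE(all.equal(sum(train_tune_test), 1)))
  n_train <- round(n * train_tune_test[1])
  n_tune  <- min(round(n * train_tune_test[2]), n - n_train)
  used    <- n_train + n_tune
  idx <- sample.int(n)                           # random permutation of 1..n
  list(
    train = idx[seq_len(n_train)],
    tune  = idx[n_train + seq_len(n_tune)],
    test  = idx[used + seq_len(n - used)]        # whatever remains
  )
}

set.seed(1)
parts <- split_indices(100, c(0.8, 0.1, 0.1))
lengths(parts)  # train 80, tune 10, test 10
```

Rounding the first two groups and giving the test set the remainder guarantees that every individual lands in exactly one group.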
Details
The kinship is estimated using the popkin function from the popkin
package. This function multiplies that kinship matrix by 2 to give the
expected covariance matrix, which is subsequently used in the linear
mixed models.
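The covariance construction above, together with eta, sigma2, b0 and percent_causal, can be illustrated with a small base R simulation. This is a sketch under stated assumptions, not the package's code: the toy kinship matrix stands in for popkin output, the 0/1/2 genotypes stand in for the bnpsd admixture simulation, and the response is assumed to follow the variance decomposition y = b0 + X beta + u + e, with u ~ N(0, eta * sigma2 * Phi), e ~ N(0, (1 - eta) * sigma2 * I), and Phi equal to 2 times the kinship.

```r
set.seed(2024)
n <- 60; p_design <- 20
b0 <- 0; eta <- 0.1; sigma2 <- 1; percent_causal <- 0.10

# Toy kinship (assumption, standing in for popkin output): two subpopulations,
# self-kinship 0.5 and kinship 0.05 between members of the same subpopulation.
subpop <- rep(1:2, each = n / 2)
kinship <- 0.05 * outer(subpop, subpop, "==")
diag(kinship) <- 0.5
Phi <- 2 * kinship  # expected covariance matrix used in the mixed model

# Hypothetical 0/1/2 genotypes and sparse true coefficients:
# round(p_design * percent_causal) causal columns with N(0, 1) effects (assumed).
X <- matrix(rbinom(n * p_design, size = 2, prob = 0.3), n, p_design,
            dimnames = list(NULL, paste0("SNP", seq_len(p_design))))
ncausal <- round(p_design * percent_causal)
causal <- sample(colnames(X), ncausal)
beta <- setNames(numeric(p_design), colnames(X))
beta[causal] <- rnorm(ncausal)

# Random effect u ~ N(0, eta * sigma2 * Phi) via a Cholesky factor,
# and independent error e ~ N(0, (1 - eta) * sigma2 * I).
U <- chol(eta * sigma2 * Phi)      # upper triangular: t(U) %*% U is the covariance
u <- drop(crossprod(U, rnorm(n)))  # t(U) %*% z has the target covariance
e <- rnorm(n, sd = sqrt((1 - eta) * sigma2))
y <- drop(b0 + X %*% beta + u + e)
```

With this decomposition, eta is the share of the total variance sigma2 that comes from the kinship-structured random effect, which is why it must lie between 0 and 1.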
Value
A list with the following elements
- ytrain
simulated response vector for training set
- ytune
simulated response vector for tuning parameter selection set
- ytest
simulated response vector for test set
- xtrain
simulated design matrix for training set
- xtune
simulated design matrix for tuning parameter selection set
- xtest
simulated design matrix for testing set
- xtrain_lasso
simulated design matrix for training set for lasso model. This is the same as xtrain, but also includes the nPC principal components
- xtune_lasso
simulated design matrix for tuning parameter selection set for lasso model. This is the same as xtune, but also includes the nPC principal components
- xtest_lasso
simulated design matrix for testing set for lasso model. This is the same as xtest, but also includes the nPC principal components
- causal
character vector of the names of the causal SNPs
- beta
the vector of true regression coefficients
- kin_train
2 times the estimated kinship for the training set individuals
- kin_tune_train
The covariance matrix between the tuning set and the training set individuals
- kin_test_train
The covariance matrix between the test set and training set individuals
- Xkinship
the matrix of SNPs used to estimate the kinship matrix
- not_causal
character vector of the names of the non-causal SNPs
- PC
the principal components for population structure adjustment
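Based on the descriptions above, the kin_* elements can be read as sub-blocks of one covariance matrix defined on all individuals, and the *_lasso matrices as the corresponding design matrices with principal component columns appended. The sketch below is illustrative only: it assumes a simple centered-genotype relationship matrix in place of 2 times the popkin kinship, uses hypothetical 0/1/2 genotypes, and computes the nPC principal components with prcomp() on all individuals for simplicity; ggmix may compute its PCs differently.

```r
set.seed(7)
n <- 50; p_kinship <- 200; p_design <- 10; nPC <- 3

# Hypothetical stand-ins for the simulated genotypes (0/1/2 allele counts).
Xkinship <- matrix(rbinom(n * p_kinship, 2, 0.4), n, p_kinship)
X <- matrix(rbinom(n * p_design, 2, 0.4), n, p_design,
            dimnames = list(NULL, paste0("SNP", seq_len(p_design))))

# Toy covariance for all n individuals (assumption: a centered-genotype
# relationship matrix standing in for 2 * popkin kinship).
Xc <- sweep(Xkinship, 2, colMeans(Xkinship))
K <- tcrossprod(Xc) / p_kinship

# Index sets for a 60/20/20 split.
idx <- sample.int(n)
train <- idx[1:30]; tune <- idx[31:40]; test <- idx[41:50]

kin_train      <- K[train, train]  # training x training
kin_tune_train <- K[tune,  train]  # tuning x training
kin_test_train <- K[test,  train]  # test x training

# First nPC principal component scores of the kinship SNPs, appended as columns.
PC <- prcomp(Xkinship)$x[, seq_len(nPC), drop = FALSE]
xtrain_lasso <- cbind(X[train, ], PC[train, ])
xtune_lasso  <- cbind(X[tune, ],  PC[tune, ])
xtest_lasso  <- cbind(X[test, ],  PC[test, ])
```

Keeping the training individuals as the columns of kin_tune_train and kin_test_train matches their description as covariances "between" a held-out set and the training set, which is the shape needed to predict held-out random effects from training data.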
See Also
Examples
library(ggmix)  # the popkin and bnpsd packages must also be installed
admixed <- gen_structured_model(n = 100,
p_design = 50,
p_kinship = 5e2,
geography = "1d",
percent_causal = 0.10,
percent_overlap = "100",
k = 5, s = 0.5, Fst = 0.1,
b0 = 0, nPC = 10,
eta = 0.1, sigma2 = 1,
train_tune_test = c(0.8, 0.1, 0.1))
names(admixed)