DGP {AllelicSeries} | R Documentation
Data Generating Process
Description
Generate a data set consisting of:
"anno": A SNP-length annotation vector.
"covar": A subject by 6 covariate matrix.
"geno": A subject by SNP genotype matrix.
"pheno": A subject-length phenotype vector.
Usage
DGP(
  anno = NULL,
  beta = c(0, 1, 2),
  binary = FALSE,
  geno = NULL,
  include_residual = TRUE,
  indicator = FALSE,
  maf_range = c(0.005, 0.01),
  method = "none",
  n = 100,
  p_dmv = 0.4,
  p_ptv = 0.1,
  prop_causal = 1,
  random_signs = FALSE,
  random_var = 0,
  snps = 100,
  weights = c(1, 2, 3)
)
Arguments
anno: Annotation vector, required if providing genotypes. Its length should match the number of columns in geno.
beta: If method = "none", a (3 x 1) coefficient vector for benign missense variants (BMVs), deleterious missense variants (DMVs), and protein truncating variants (PTVs), respectively. If method != "none", a scalar effect size.
binary: Generate a binary phenotype? Default: FALSE.
geno: Genotype matrix, if providing genotypes.
include_residual: Include the residual? If FALSE, the phenotype is returned as its expected value. Intended for testing.
indicator: Convert raw counts to indicators? Default: FALSE.
maf_range: Range of minor allele frequencies: c(MIN, MAX).
method: Genotype aggregation method. Default: "none".
n: Sample size. Default: 100.
p_dmv: Frequency of deleterious missense variants. The default of 40% is based on the frequency of DMVs among rare coding variants in the UK Biobank.
p_ptv: Frequency of protein truncating variants. The default of 10% is based on the frequency of PTVs among rare coding variants in the UK Biobank.
prop_causal: Proportion of variants that are causal. Default: 1.0.
random_signs: Randomize the signs of the effects? FALSE for a burden-type genetic architecture, TRUE for a SKAT-type architecture.
random_var: Frailty variance in the case of random signs. Default: 0.
snps: Number of SNPs in the gene. Default: 100.
weights: Aggregation weights. Default: c(1, 2, 3).
Value
List containing the annotations (anno), covariates (covar), genotypes (geno), and phenotypes (pheno).
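As a quick check of the returned structure, the sketch below generates a small data set and inspects the size of each component against the requested n and snps. It assumes the AllelicSeries package is installed and that the list elements carry the names given in the Description (anno, covar, geno, pheno).

```r
library(AllelicSeries)

# Small data set: 50 subjects, 20 SNPs.
data <- DGP(n = 50, snps = 20)

# Component names, as listed in the Description.
names(data)

# Sizes implied by the Description.
length(data$anno)   # one annotation per SNP
dim(data$covar)     # subjects by 6 covariates
dim(data$geno)      # subjects by SNPs
length(data$pheno)  # one phenotype per subject
```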
Examples
# Generate data.
data <- DGP(n = 100)
# View components.
table(data$anno)
head(data$covar)
head(data$geno[, 1:5])
hist(data$pheno)
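The anno and geno arguments allow phenotypes to be regenerated on a fixed set of genotypes. The following is a minimal sketch, assuming the annotation vector and genotype matrix returned by one call can be passed back unchanged to another; the object names base, bin, and mu are illustrative only.

```r
library(AllelicSeries)

# Generate genotypes and annotations once.
base <- DGP(n = 100, snps = 50)

# Reuse them to simulate a binary phenotype.
bin <- DGP(anno = base$anno, geno = base$geno, binary = TRUE)
table(bin$pheno)

# Reuse them again without the residual, so the phenotype
# is returned as its expected value (intended for testing).
mu <- DGP(anno = base$anno, geno = base$geno, include_residual = FALSE)
summary(mu$pheno)
```

Holding the genotypes fixed in this way separates variation due to the phenotype model from variation due to genotype sampling.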