simulateSNPs {scrime}    R Documentation
Simulation of SNP data
Description
Simulates SNP data in which a specified proportion of cases and controls
is explained by a specified set of SNP interactions. Can also be used
to simulate a data set with a multi-categorical response, i.e.,
a data set in which the cases are divided into several classes (e.g.,
different diseases or subtypes of a disease).
Usage
simulateSNPs(n.obs, n.snp, vec.ia, prop.explain = 1,
list.ia.val = NULL, vec.ia.num = NULL, vec.cat = NULL,
maf = c(0.1, 0.4), prob.val = rep(1/3, 3), list.equal = NULL,
prob.equal = 0.8, rm.redundancy = TRUE, shuffle = FALSE,
shuffle.obs = FALSE, rand = NA)
Arguments
n.obs |
either an integer specifying the total number of
observations, or a vector of length 2 specifying the number
of cases and the number of controls. If vec.cat is specified,
then the partitioning of the cases into the different
classes can be controlled by vec.ia.num. If n.obs is
an integer, then 1/c of the observations will be controls and
the remaining observations will be cases, where c is the total number
of groups (including the controls).
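For an integer n.obs, the split into cases and controls can be sketched as below. The helper split_obs() is hypothetical, and two details are assumptions not stated in this help page: c is taken to be length(unique(vec.cat)) + 1 when vec.cat is given (and 2 otherwise), and a non-integral 1/c share is rounded down.

```r
# Hypothetical helper, not part of scrime: split an integer n.obs into
# cases and controls as described for the n.obs argument.
split_obs <- function(n.obs, vec.cat = NULL) {
  # Assumption: one group per distinct case class, plus the controls.
  n.groups <- if (is.null(vec.cat)) 2 else length(unique(vec.cat)) + 1
  # Assumption: 1/c of the observations (rounded down) are controls.
  n.controls <- floor(n.obs / n.groups)
  c(cases = n.obs - n.controls, controls = n.controls)
}

split_obs(1000)                     # 500 cases, 500 controls
split_obs(1200, vec.cat = c(1, 2))  # 800 cases, 400 controls
```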
|
n.snp |
integer specifying the number of SNPs.
|
vec.ia |
a vector of integers specifying the orders of the interactions
that explain the cases. For example, c(3, 1, 2, 3) means that a three-way,
a one-way (i.e., a single SNP), a two-way, and a three-way interaction explain the cases.
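With shuffle = FALSE, the explanatory SNPs occupy the first sum(vec.ia) columns, grouped in the order of vec.ia; the list.ia.val example, where vec.ia = c(3, 2) involves SNP1 to SNP3 and SNP4 to SNP5, follows this layout. A minimal base-R sketch of that mapping (ia_columns() is a hypothetical helper):

```r
# Hypothetical helper: column indices of the SNPs in each explanatory
# interaction, assuming the unshuffled layout (shuffle = FALSE).
ia_columns <- function(vec.ia) {
  # Label each of the sum(vec.ia) leading columns with its interaction.
  ia.id <- rep(seq_along(vec.ia), times = vec.ia)
  unname(split(seq_len(sum(vec.ia)), ia.id))
}

ia_columns(c(3, 1, 2, 3))  # columns 1:3, 4, 5:6 and 7:9
```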
|
prop.explain |
either a single numeric value or a vector of length(vec.ia)
specifying, for each interaction of interest, the proportion of cases
among all observations having this interaction. Must be larger than 0.5.
For example, prop.explain = 1 means that only cases have the interactions of interest
specified by vec.ia (and list.ia.val). Likewise, vec.ia = c(3, 2)
and prop.explain = c(1, 0.8) mean that only cases have the three-way interaction of
interest, while 80% of the observations having the two-way interaction of interest
are cases and 20% are controls.
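Since prop.explain is the proportion of cases among all observations carrying an interaction, it fixes how many controls also carry it: if k cases are explained and the proportion is p, then p = k / (k + m), so m = k (1 - p) / p controls have the interaction. A sketch of this arithmetic; the helper name is hypothetical, and rounding with round() is an assumption, since this page does not say how scrime handles non-integral counts.

```r
# Hypothetical helper: number of controls that carry an interaction,
# given the number of cases it explains and its prop.explain value.
n_controls_with_ia <- function(n.cases, prop.explain) {
  stopifnot(all(prop.explain > 0.5), all(prop.explain <= 1))
  # Solve prop.explain = n.cases / (n.cases + n.controls) for n.controls.
  # Assumption: non-integral counts are rounded to the nearest integer.
  round(n.cases * (1 - prop.explain) / prop.explain)
}

n_controls_with_ia(c(200, 200), c(1, 0.8))  # 0 and 50
```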
|
list.ia.val |
a list of length(vec.ia) specifying the exact interactions.
The i-th object in this list must be a vector of length vec.ia[i] consisting
of the values 0 (homozygous reference), 1 (heterozygous variant), or 2 (homozygous variant).
For example, vec.ia = c(3, 2) and list.ia.val = list(c(2, 0, 1), c(0, 2))
mean that
((SNP1 == 2) & (SNP2 == 0) & (SNP3 == 1)) and ((SNP4 == 0) & (SNP5 == 2))
are the explanatory interactions, provided that additionally prob.equal = 1 (see also
list.equal). If NULL, the genotypes are randomly drawn
using the probabilities given by prob.val.
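Each explanatory interaction is a conjunction of genotype conditions. The base-R sketch below evaluates such a conjunction on a genotype matrix and returns which observations carry it; has_ia() and its arguments are hypothetical, and its equal argument mirrors the 1/0 coding of list.equal.

```r
# Hypothetical helper (not part of scrime): TRUE for each row (observation)
# of the genotype matrix x that carries the interaction formed by the SNPs
# in columns snps with genotypes vals.
has_ia <- function(x, snps, vals, equal = rep(1, length(vals))) {
  ok <- rep(TRUE, nrow(x))
  for (j in seq_along(snps)) {
    hit <- x[, snps[j]] == vals[j]
    # equal[j] == 1 requires genotype vals[j]; 0 requires any other genotype.
    cond <- if (equal[j] == 1) hit else !hit
    ok <- ok & cond
  }
  ok
}

# Three observations, five SNPs coded 0/1/2.
x <- rbind(c(2, 0, 1, 0, 2),
           c(2, 0, 1, 1, 1),
           c(0, 0, 1, 0, 2))
has_ia(x, 1:3, c(2, 0, 1))  # (SNP1 == 2) & (SNP2 == 0) & (SNP3 == 1)
has_ia(x, 4:5, c(0, 2))     # (SNP4 == 0) & (SNP5 == 2)
```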
|
vec.ia.num |
a vector of length(vec.ia) specifying the number of
cases (not observations) explained by the interactions in vec.ia.
If NULL, all the cases are divided into length(vec.ia) groups of
about the same size. sum(vec.ia.num) must be less than
or equal to the total number of cases. Each entry of vec.ia.num must currently be >= 10.
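For vec.ia.num = NULL the cases are split into groups of about the same size. How scrime distributes a remainder is not stated here, so the hypothetical default_ia_num() below simply gives the first n.cases %% k groups one extra case; the hypothetical check_ia_num() encodes the two documented constraints on a user-supplied vec.ia.num.

```r
# Hypothetical helper: split n.cases into k groups of about the same size.
# Assumption: the remainder goes to the first groups.
default_ia_num <- function(n.cases, k) {
  base <- n.cases %/% k
  base + (seq_len(k) <= n.cases %% k)
}

# Hypothetical helper: the documented constraints on vec.ia.num.
check_ia_num <- function(vec.ia.num, n.cases) {
  sum(vec.ia.num) <= n.cases && all(vec.ia.num >= 10)
}

default_ia_num(500, 3)          # 167 167 166
check_ia_num(c(200, 250), 500)  # TRUE, as in the third example
```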
|
vec.cat |
a vector of the same length as vec.ia specifying the subclasses
of the cases that are explained by the corresponding interactions in vec.ia. If NULL,
no subclasses will be considered. This feature has not yet been fully tested, so be careful
when specifying vec.cat.
|
maf |
either a single numeric value, or a vector of length 2 or n.snp specifying
the minor allele frequencies. If a single value, all SNPs will have the same
minor allele frequency. If a vector of length n.snp, each SNP will have the minor
allele frequency specified in the corresponding entry of maf. If of length 2, then
maf is interpreted as the range of the minor allele frequencies, and for each SNP,
a minor allele frequency will be randomly drawn from a uniform distribution on
the range given by maf. Note: If a SNP belongs to an explanatory interaction,
then the minor allele frequency specified by maf applies only to the observations
not explained by this interaction.
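The three accepted forms of maf can be expanded to one value per SNP as sketched below (resolve_maf() is hypothetical). Two points are assumptions not stated on this page: a length-2 maf is treated as a range even when n.snp == 2, and genotypes are drawn under Hardy-Weinberg equilibrium as the number of minor alleles in two independent draws.

```r
# Hypothetical helper: expand the maf argument to one MAF per SNP.
resolve_maf <- function(maf, n.snp) {
  if (length(maf) == 1) return(rep(maf, n.snp))
  # Assumption: a length-2 vector is read as a range even if n.snp == 2.
  if (length(maf) == 2) return(runif(n.snp, min = maf[1], max = maf[2]))
  if (length(maf) == n.snp) return(maf)
  stop("maf must have length 1, 2, or n.snp")
}

# Assumption (Hardy-Weinberg equilibrium): genotype = number of minor
# alleles, i.e. 0, 1 or 2, drawn as Binomial(2, maf).
draw_genotypes <- function(n.obs, maf) rbinom(n.obs, size = 2, prob = maf)

set.seed(1)
mafs <- resolve_maf(c(0.1, 0.4), n.snp = 50)
geno <- draw_genotypes(1000, mafs[1])
```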
|
prob.val |
a vector consisting of the probabilities for drawing a 0, 1, or 2
if list.ia.val = NULL, i.e., if the genotypes of the SNPs explaining the case-control
status should be randomly drawn. Ignored if list.ia.val is specified. By default,
each genotype has the same probability of being drawn.
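When list.ia.val = NULL, the genotypes of the explanatory SNPs are drawn with the probabilities in prob.val. A sketch of generating a list with the same structure as list.ia.val (draw_ia_val() is a hypothetical helper, not scrime's internal code):

```r
# Hypothetical helper: draw a list.ia.val-like list, using prob.val as the
# probabilities of the genotypes 0, 1 and 2.
draw_ia_val <- function(vec.ia, prob.val = rep(1/3, 3)) {
  vals <- sample(0:2, sum(vec.ia), replace = TRUE, prob = prob.val)
  # Regroup the draws into one vector of length vec.ia[i] per interaction.
  unname(split(vals, rep(seq_along(vec.ia), times = vec.ia)))
}

set.seed(42)
lv <- draw_ia_val(c(3, 2))
```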
|
list.equal |
a list with the same structure as list.ia.val containing only ones and
zeros, where a 1 specifies equality to the corresponding value in
list.ia.val, and a 0 specifies non-equality. Thus, the entries of list.equal
specify whether the corresponding SNP should have a particular genotype (when the entry is 1)
or should not have this genotype (when the entry is 0). If NULL, this list
will be generated automatically using prob.equal. If, e.g., vec.ia = c(3, 2),
list.ia.val = list(c(2, 0, 1), c(0, 2)),
and list.equal = list(c(1, 0, 1), c(1, 0)), then the explanatory interactions are
given by ((SNP1 == 2) & (SNP2 != 0) & (SNP3 == 1)) and ((SNP4 == 0) & (SNP5 != 2)).
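The pairing of list.ia.val and list.equal into == / != conditions can be written out as a sketch. ia_label() is hypothetical and only reproduces the notation used in this help page; the labels stored by scrime in the ia component may be formatted differently.

```r
# Hypothetical helper: write an interaction in the notation used above,
# e.g. "(SNP1 == 2) & (SNP2 != 0) & (SNP3 == 1)".
ia_label <- function(snps, vals, equal) {
  ops <- ifelse(equal == 1, "==", "!=")
  paste0("(SNP", snps, " ", ops, " ", vals, ")", collapse = " & ")
}

list.ia.val <- list(c(2, 0, 1), c(0, 2))
list.equal  <- list(c(1, 0, 1), c(1, 0))
snps <- list(1:3, 4:5)  # unshuffled column layout
labels <- mapply(ia_label, snps, list.ia.val, list.equal)
```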
|
prob.equal |
a numeric value specifying the probability that a 1 is drawn when generating
list.equal. prob.equal is thus the probability of an equality (==) rather than an inequality (!=).
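Generating list.equal from prob.equal amounts to one Bernoulli draw per explanatory SNP. A sketch with the hypothetical helper draw_list_equal(); scrime's own generator may differ in detail.

```r
# Hypothetical helper: generate a list.equal-like list; each entry is
# 1 (equality) with probability prob.equal and 0 (inequality) otherwise.
draw_list_equal <- function(vec.ia, prob.equal = 0.8) {
  eq <- rbinom(sum(vec.ia), size = 1, prob = prob.equal)
  unname(split(eq, rep(seq_along(vec.ia), times = vec.ia)))
}

set.seed(7)
le <- draw_list_equal(c(3, 2))
```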
|
rm.redundancy |
should redundant SNPs be removed from the explanatory interactions?
It is possible to specify an explanatory i-way interaction even though an interaction
between (i-1) of the variables contained in the i-way
interaction already explains all the cases (and controls) that the i-way interaction
should explain. In this case, the redundant SNP is removed if rm.redundancy = TRUE.
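Redundancy can be illustrated by checking, for each SNP of an interaction, whether dropping it changes the set of observations carrying the interaction. This is a hypothetical sketch of the idea, not scrime's algorithm, and for brevity it handles only equality conditions.

```r
# Hypothetical helper: indices of the SNPs in an interaction whose removal
# leaves the set of observations carrying it unchanged in matrix x.
redundant_snps <- function(x, snps, vals) {
  # TRUE for rows where every SNP in keep has its specified genotype.
  carries <- function(keep) {
    hits <- x[, snps[keep], drop = FALSE] ==
      matrix(vals[keep], nrow(x), length(keep), byrow = TRUE)
    rowSums(hits) == length(keep)
  }
  full <- carries(seq_along(snps))
  drop.ok <- vapply(seq_along(snps), function(j) {
    identical(carries(setdiff(seq_along(snps), j)), full)
  }, logical(1))
  which(drop.ok)
}

# SNP2 is redundant here: every row with SNP1 == 2 also has SNP2 == 0.
x <- cbind(c(2, 2, 0, 1, 1, 0),
           c(0, 0, 1, 2, 0, 1))
redundant_snps(x, snps = 1:2, vals = c(2, 0))  # 2
```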
|
shuffle |
logical. By default, the first sum(vec.ia) columns of the generated
data set contain the explanatory SNPs, in the same order as the interactions in vec.ia.
If TRUE, this order is shuffled.
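If shuffling is a random permutation of the columns (an assumption; this page does not say how it is done), the new positions of the explanatory SNPs can be recovered with match(). A base-R sketch on a small hypothetical genotype matrix:

```r
set.seed(3)
n.snp <- 10
vec.ia <- c(3, 2)
# Hypothetical 4 x n.snp genotype matrix; columns 1:5 are explanatory.
data <- matrix(sample(0:2, 4 * n.snp, replace = TRUE), nrow = 4)

perm <- sample(n.snp)  # column k of the result is old column perm[k]
shuffled <- data[, perm]
# Column that now holds each original explanatory SNP 1..sum(vec.ia).
new.pos <- match(seq_len(sum(vec.ia)), perm)
all(shuffled[, new.pos] == data[, seq_len(sum(vec.ia))])  # TRUE
```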
|
shuffle.obs |
should the observations be shuffled?
|
rand |
integer. Sets the random number generator to a reproducible state.
|
Value
An object of class simulatedSNPs
composed of
data |
a matrix with n.obs rows (sum(n.obs) rows if n.obs has length 2) and n.snp columns containing
the SNP data.
|
cl |
a vector of length n.obs (or sum(n.obs)) comprising the case-control status of the
observations.
|
tab.explain |
a table naming the explanatory interactions and the numbers
of cases and controls explained by them.
|
ia |
character vector naming the interactions.
|
maf |
vector of length n.snp containing the minor allele frequencies.
|
Note
Currently, the genotypes of all SNPs are simulated independently from each other
(except for the SNPs that belong to the same explanatory interaction).
Author(s)
Holger Schwender holger.schwender@udo.edu
See Also
simulateSNPglm, simulateSNPcatResponse
Examples
## Not run:
library(scrime)
# Simulate a data set containing 2000 observations (1000 cases
# and 1000 controls) and 50 SNPs, where one three-way and two
# two-way interactions are chosen randomly to be explanatory
# for the case-control status.
sim1 <- simulateSNPs(2000, 50, c(3, 2, 2))
sim1
# Simulate data of 1200 cases and 800 controls for 50 SNPs,
# where 90% of the observations showing a randomly chosen
# three-way interaction are cases, and 95% of the observations
# showing a randomly chosen two-way interaction are cases.
sim2 <- simulateSNPs(c(1200, 800), 50, c(3, 2),
  prop.explain = c(0.9, 0.95))
sim2
# Simulate a data set consisting of 1000 observations and 50 SNPs,
# where the minor allele frequency of each SNP is 0.25, and
# the interactions
# ((SNP1 == 2) & (SNP2 != 0) & (SNP3 == 1)) and
# ((SNP4 == 0) & (SNP5 != 2))
# are explanatory for 200 and 250 of the 500 cases, respectively,
# and for none of the 500 controls.
list1 <- list(c(2, 0, 1), c(0, 2))
list2 <- list(c(1, 0, 1), c(1, 0))
sim3 <- simulateSNPs(1000, 50, c(3, 2), list.ia.val = list1,
  list.equal = list2, vec.ia.num = c(200, 250), maf = 0.25)
## End(Not run)
Package scrime version 1.3.5