simulateSNPcatResponse {scrime}    R Documentation
Simulation of SNP Data with Categorical Response
Description
Simulates SNP data. Interactions of some of the simulated SNPs are then used to specify a categorical response by level-wise or multinomial logistic regression.
Usage
simulateSNPcatResponse(n.obs = 1000, n.snp = 50, list.ia = NULL,
list.snp = NULL, withRef = FALSE, beta0 = -0.5, beta = 1.5,
maf = 0.25, sample.y = TRUE, rand = NA)
## S3 method for class 'simSNPcatResponse'
print(x, justify = c("left", "right"), spaces = 2, ...)
Arguments
n.obs | number of observations that should be generated.
n.snp | number of SNPs that should be generated.
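list.ia | a list consisting of numeric vectors (if one interaction should be explanatory for a level of the response) or lists of numeric vectors (if there should be more than one explanatory interaction) specifying the genotypes of the SNPs that compose the interactions, where a positive value g stands for SNP == g and a negative value -g for SNP != g. If, e.g., one of the vectors is given by c(-1, 1) and the corresponding vector in list.snp by c(6, 7), then this interaction is (SNP6 != 1) & (SNP7 == 1). For more details, see Details. Must be specified if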
list.snp | a list consisting of numeric vectors (if one interaction should be explanatory for a level of the response) or lists of numeric vectors (if there should be more than one explanatory interaction) specifying the SNPs that compose the interactions.
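withRef | should there be an additional reference group (i.e., a control group) denoted by a zero? If TRUE, the response is generated by a multinomial logistic regression with this reference group; if FALSE, level-wise logistic regressions are used (see Details).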
beta0 | a numeric value or vector of
beta | either a non-negative numeric value or a list of non-negative numeric values specifying the parameters in the logistic regression models. If a numeric value, all parameters (except for the intercept) in all logistic regression models will be equal to this value. If a list, then this list must have the same length as
maf | either a numeric value, or a vector of length 2 or
sample.y | should the values of the response be randomly drawn using the probabilities determined by the logistic regression models? If FALSE, each value of the response is set to the level with the largest probability.
rand | a numeric value for setting the random number generator in a reproducible state.
x | the output of simulateSNPcatResponse, i.e., an object of class simSNPcatResponse.
justify | a character string specifying whether the column of the summarizing table that names the explanatory interactions should be left-justified ("left") or right-justified ("right").
spaces | integer specifying the distance from the left end of the column mentioned in justify.
... | ignored.
Details
simulateSNPcatResponse first simulates a matrix consisting of n.obs observations and n.snp SNPs, where the minor allele frequencies of these SNPs are given by maf.
Note that all SNPs are currently simulated independently of each other, i.e., they are unlinked. Moreover, an observation is currently not allowed to exhibit genotypes/interactions that are explanatory for more than one level of the response. If, e.g., the response has three categories, then an observation can exhibit one (or more) of the genotypes explanatory for the first level, or one (or more) of the genotypes explanatory for the second level, or one (or more) of the genotypes explanatory for the third level, or none of these genotypes.
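The SNP-simulation step can be sketched in base R as follows. This is a hypothetical helper, simSNPsketch, not the code of scrime. It assumes that genotypes are coded 1, 2, 3 (the codes appearing in the interactions on this page), with the code minus one counting minor alleles, and that each SNP is in Hardy-Weinberg equilibrium; neither assumption is stated above. A maf of length 2 is treated as the range of a uniform distribution, as in the third example, and other lengths are not handled.

```r
# Hypothetical sketch, not scrime's implementation.
# Assumptions: genotype codes 1, 2, 3 (= 1 + number of minor alleles) and
# Hardy-Weinberg equilibrium at every SNP.
simSNPsketch <- function(n.obs = 1000, n.snp = 50, maf = 0.25) {
  if (length(maf) == 1) {
    maf <- rep(maf, n.snp)                           # same MAF for every SNP
  } else if (length(maf) == 2) {
    maf <- runif(n.snp, min = maf[1], max = maf[2])  # one MAF per SNP
  } else {
    stop("this sketch only handles maf of length 1 or 2")
  }
  # SNPs are drawn independently (unlinked). matrix() fills column by column,
  # so column j uses maf[j]; Binomial(2, maf) + 1 gives the codes 1, 2, 3.
  x <- matrix(rbinom(n.obs * n.snp, size = 2, prob = rep(maf, each = n.obs)) + 1,
              nrow = n.obs, ncol = n.snp,
              dimnames = list(NULL, paste0("SNP", seq_len(n.snp))))
  list(x = x, maf = maf)
}
```

For instance, simSNPsketch(n.obs = 800, n.snp = 25, maf = c(0.1, 0.4)) mimics the data part of the third example; drawing each column independently mirrors the statement that the SNPs are unlinked.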
Afterwards, the response is generated by employing the specifications of list.ia, list.snp, beta0, and beta.
By default, i.e., if both list.ia and list.snp are NULL, list.ia is set to list(c(-1, 1), c(1, 1, 1), list(c(-1, 1), c(1, 1, 1))) and list.snp is set to list(c(6, 7), c(3, 9, 10), list(c(2, 5), c(1, 4, 8))), such that the interaction (SNP6 != 1) & (SNP7 == 1) is assumed to be explanatory for the first level of the three-category response, the interaction (SNP3 == 1) & (SNP9 == 1) & (SNP10 == 1) is assumed to be explanatory for the second level, and the interactions (SNP2 != 1) & (SNP5 == 1) and (SNP1 == 1) & (SNP4 == 1) & (SNP8 == 1) are assumed to be explanatory for the third level.
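The sign convention behind list.ia and list.snp can be made concrete with the hypothetical helpers evalInteraction and countInteractions below (not part of scrime). They assume that a positive entry g of list.ia means SNP == g and a negative entry -g means SNP != g, which is what the default example shows, and that an entry of list.ia is either a single vector (one interaction) or a list of vectors (several interactions for the same level).

```r
# Hypothetical helpers, not part of scrime.
# ia:  signed genotype codes (g means "SNP == g", -g means "SNP != g")
# snp: column indices of the SNPs in x, same length as ia
evalInteraction <- function(x, ia, snp) {
  stopifnot(length(ia) == length(snp))
  hits <- rep(TRUE, nrow(x))
  for (k in seq_along(ia)) {
    g <- x[, snp[k]]
    cond <- if (ia[k] > 0) g == ia[k] else g != -ia[k]
    hits <- hits & cond          # all conditions must hold (logical AND)
  }
  hits
}

# Number of explanatory interactions of one level shown by each observation.
countInteractions <- function(x, ia, snp) {
  if (!is.list(ia)) {            # a single interaction given as a vector
    ia <- list(ia)
    snp <- list(snp)
  }
  m <- vapply(seq_along(ia),
              function(i) evalInteraction(x, ia[[i]], snp[[i]]),
              logical(nrow(x)))
  rowSums(matrix(m, nrow = nrow(x)))  # matrix() also covers nrow(x) == 1
}
```

Applied to a simulated SNP matrix x, countInteractions(x, list(c(-1, 1), c(1, 1, 1)), list(c(2, 5), c(1, 4, 8))) would count, per observation, how many of the two default interactions for the third level are present.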
If withRef = FALSE, then for each of the levels, the probability p of having this level given that an observation exhibits one, two, ... of the interactions intended to be explanatory for that level is determined using the corresponding logistic regression model. Afterwards, the response value of an observation showing one, two, ... of the interactions explanatory for a particular level is randomly drawn using the probability p for this level and (1 - p)/(n.cat - 1) as the probability for each of the other n.cat - 1 levels, where n.cat denotes the number of levels of the response. If an observation exhibits none of the explanatory interactions, its response value is randomly drawn using the probabilities exp(beta0)/(1 + exp(beta0)).
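The level-wise drawing for withRef = FALSE can be sketched for a single observation as follows. The helpers levelwiseProbs and drawLevel are hypothetical, not scrime's implementation. The sketch assumes that the linear predictor of the logistic model is beta0 + beta * n.ia, where n.ia is the number of explanatory interactions the observation shows for level k (i.e., one common parameter beta), and it covers only observations with n.ia >= 1. For sample.y = FALSE it picks the most probable level, following the comment in the second example.

```r
# Hypothetical sketch, not scrime's implementation.
# Assumption: linear predictor beta0 + beta * n.ia for level k.
levelwiseProbs <- function(k, n.ia, n.cat, beta0 = -0.5, beta = 1.5) {
  stopifnot(n.ia >= 1, n.cat >= 2, k >= 1, k <= n.cat)
  p <- plogis(beta0 + beta * n.ia)            # exp(eta) / (1 + exp(eta))
  probs <- rep((1 - p) / (n.cat - 1), n.cat)  # remaining mass shared equally
  probs[k] <- p
  probs
}

drawLevel <- function(probs, sample.y = TRUE) {
  if (sample.y) {
    sample(length(probs), size = 1, prob = probs)  # random draw of a level
  } else {
    which.max(probs)                               # most probable level
  }
}
```

In this sketch, with beta0 = -0.5 and beta = 1.5, an observation showing one interaction for level 2 of a three-level response gets probability plogis(1), about 0.73, for level 2 and about 0.13 for each of the two other levels.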
If withRef = TRUE, a multinomial logistic regression is used to specify the class labels. In this case, the probabilities p_j, j = 1, ..., n.cat, are given by p_j = exp(q_j) * p_0, where q_j are the probabilities on the logit scale (i.e., on the scale of the linear predictors), and p_0^{-1} = 1 + exp(q_1) + ... + exp(q_{n.cat}) is the reciprocal of the probability for the control/reference group. The latter follows from requiring p_0 + p_1 + ... + p_{n.cat} = 1.
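The multinomial formulas for withRef = TRUE can be checked numerically with the hypothetical helper multinomProbs below. The linear predictors q are made-up values for illustration, and drawing the label with sample() is an assumption about how class labels could be generated, not a description of scrime's code.

```r
# Hypothetical helper, not part of scrime.
# q: linear predictors q_1, ..., q_n.cat; level 0 is the reference group.
multinomProbs <- function(q) {
  p0 <- 1 / (1 + sum(exp(q)))  # p_0 = 1 / (1 + exp(q_1) + ... + exp(q_n.cat))
  c(p0, exp(q) * p0)           # p_0, p_1, ..., p_n.cat
}

q <- c(-0.5, 1, 1.5)           # illustrative values only
pr <- multinomProbs(q)
all.equal(sum(pr), 1)          # TRUE: the probabilities sum to one
sample(0:length(q), size = 1, prob = pr)  # draw a label in 0, 1, ..., n.cat
```

Because p_j/p_0 = exp(q_j), each q_j is the log-odds of level j relative to the reference group 0.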
Value
An object of class simSNPcatResponse consisting of
x | a matrix with n.obs rows and n.snp columns containing the simulated SNP data.
y | a vector of length n.obs containing the simulated categorical response.
models | a character vector naming the level-wise logistic regression models.
maf | a vector of length n.snp containing the minor allele frequencies of the SNPs.
tab.explain | a data frame summarizing the results of the simulation.
Author(s)
Holger Schwender, holger.schwender@udo.edu
See Also
Examples
## Not run:
# The simulated data set described in Details.
sim1 <- simulateSNPcatResponse()
sim1
# Setting each value of the response to the level with the
# largest probability instead of drawing it randomly.
sim2 <- simulateSNPcatResponse(sample.y = FALSE)
sim2
# If ((SNP4 != 2) & (SNP3 == 1)), (SNP5 == 3), and
# ((SNP12 != 1) & (SNP9 == 3)) should be the three interactions
# (or variables) that are explanatory for the three levels
# of the response, list.ia and list.snp are specified as follows.
list.ia <- list(c(-2, 1), 3, c(-1, 3))
list.snp <- list(c(4, 3), 5, c(12, 9))
# The categorical response and a data set consisting of
# 800 observations and 25 SNPs, where the minor allele
# frequency of each SNP is randomly drawn from a
# uniform distribution with minimum 0.1 and maximum 0.4,
# is then generated by
sim3 <- simulateSNPcatResponse(n.obs = 800, n.snp = 25,
list.ia = list.ia, list.snp = list.snp, maf = c(0.1, 0.4))
sim3
## End(Not run)