createData {DHARMa} | R Documentation |
Simulate test data
Description
This function creates a synthetic dataset with various problems, such as overdispersion and zero-inflation.
Usage
createData(sampleSize = 100, intercept = 0, fixedEffects = 1,
quadraticFixedEffects = NULL, numGroups = 10, randomEffectVariance = 1,
overdispersion = 0, family = poisson(), scale = 1, cor = 0,
roundPoissonVariance = NULL, pZeroInflation = 0, binomialTrials = 1,
temporalAutocorrelation = 0, spatialAutocorrelation = 0,
factorResponse = FALSE, replicates = 1, hasNA = FALSE)
Arguments
sampleSize |
sample size of the dataset |
intercept |
intercept (linear scale) |
fixedEffects |
vector of fixed effects (linear scale) |
quadraticFixedEffects |
vector of quadratic fixed effects (linear scale) |
numGroups |
number of groups for the random effect |
randomEffectVariance |
variance of the random effect (intercept) |
overdispersion |
if numeric, the value is used as the standard deviation of a normally distributed random variate that is added to the linear predictor. Alternatively, a random function that takes the linear predictor as input can be provided. |
family |
error distribution family of the response, given as a family object, e.g. poisson() or binomial() |
scale |
scale if the distribution has a scale (e.g. sd for the Gaussian) |
cor |
correlation between predictors |
roundPoissonVariance |
if set, adds uniform noise to the Poisson response. The aim is to create heteroscedasticity |
pZeroInflation |
probability to set any data point to zero |
binomialTrials |
number of trials for the binomial. Only used if family is binomial() |
temporalAutocorrelation |
strength of temporal autocorrelation |
spatialAutocorrelation |
strength of spatial autocorrelation |
factorResponse |
should the response be transformed to a factor (intended for 0/1 data) |
replicates |
number of datasets to create |
hasNA |
should an NA be added to the environmental predictor (for test purposes) |
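As a rough illustration of the overdispersion and pZeroInflation arguments described above, the following base-R sketch adds a normal variate to a linear predictor and sets data points to zero with a fixed probability. It is not DHARMa's internal code: the uniform predictor `x`, the log link, and all variable names are assumptions made for this example.

```r
# Illustrative base-R sketch; NOT the implementation used by createData().
set.seed(123)
n <- 500
x <- runif(n, -1, 1)               # hypothetical predictor (assumed uniform)
linearPredictor <- 0 + 1 * x       # intercept = 0, fixedEffects = 1

# overdispersion = 1: add a normal variate with sd 1 to the linear predictor
noisyPredictor <- linearPredictor + rnorm(n, sd = 1)

# Poisson responses with a log link (assumed here)
yPlain <- rpois(n, lambda = exp(linearPredictor))
yOverdispersed <- rpois(n, lambda = exp(noisyPredictor))

# pZeroInflation = 0.6: set each data point to zero with probability 0.6
isZeroed <- rbinom(n, size = 1, prob = 0.6) == 1
yZeroInflated <- ifelse(isZeroed, 0L, yPlain)

# Crude summaries: variance-to-mean ratio and share of zeros
dispersionRatio <- c(plain = var(yPlain) / mean(yPlain),
                     overdispersed = var(yOverdispersed) / mean(yOverdispersed))
zeroShare <- c(plain = mean(yPlain == 0),
               zeroInflated = mean(yZeroInflated == 0))
print(dispersionRatio)
print(zeroShare)
```

The added noise should raise the variance-to-mean ratio, and the zeroing step should raise the share of zeros; both are the kinds of problems the simulated data are meant to contain.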
Examples
# Poisson data with a quadratic effect, no overdispersion and no random effect
testData = createData(sampleSize = 500, intercept = 2, fixedEffects = c(1),
overdispersion = 0, family = poisson(), quadraticFixedEffects = c(-3),
randomEffectVariance = 0)
par(mfrow = c(1,2))
plot(testData$Environment1, testData$observedResponse)
hist(testData$observedResponse)
# with zero-inflation
testData = createData(sampleSize = 500, intercept = 2, fixedEffects = c(1),
overdispersion = 0, family = poisson(), quadraticFixedEffects = c(-3),
randomEffectVariance = 0, pZeroInflation = 0.6)
par(mfrow = c(1,2))
plot(testData$Environment1, testData$observedResponse)
hist(testData$observedResponse)
# binomial with multiple trials
testData = createData(sampleSize = 40, intercept = 2, fixedEffects = c(1),
overdispersion = 0, family = binomial(), quadraticFixedEffects = c(-3),
randomEffectVariance = 0, binomialTrials = 20)
plot(observedResponse1 / (observedResponse1 + observedResponse0) ~ Environment1,
data = testData, ylab = "Proportion 1")
# spatial / temporal correlation
testData = createData(sampleSize = 100, family = poisson(), spatialAutocorrelation = 3,
temporalAutocorrelation = 3)
plot(log(observedResponse) ~ time, data = testData)
plot(log(observedResponse) ~ x, data = testData)
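For the binomial case with multiple trials, a minimal base-R sketch of how success and failure counts relate to a proportion may be helpful. This does not reproduce createData(): the uniform predictor, the logit link, and the names `successes` and `failures` are assumptions for illustration only.

```r
# Illustrative base-R sketch; NOT the implementation used by createData().
set.seed(42)
n <- 40
trials <- 20                                  # plays the role of binomialTrials
x <- runif(n, -1, 1)                          # hypothetical predictor
linearPredictor <- 2 + 1 * x - 3 * x^2        # intercept, linear and quadratic effect
p <- plogis(linearPredictor)                  # inverse logit link (assumed)

successes <- rbinom(n, size = trials, prob = p)
failures <- trials - successes

# Proportion of successes: always within [0, 1]
proportion <- successes / (successes + failures)

# The ratio successes / failures is the odds, not a proportion,
# and becomes Inf whenever failures == 0
odds <- successes / failures

plot(proportion ~ x, ylab = "Proportion of successes")
```

Dividing by the total number of trials keeps the plotted values bounded, whereas the odds can grow without limit.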