createData {DHARMa}    R Documentation

Simulate test data

Description

This function creates a synthetic dataset with various problems, such as overdispersion or zero-inflation.

Usage

createData(sampleSize = 100, intercept = 0, fixedEffects = 1,
  quadraticFixedEffects = NULL, numGroups = 10, randomEffectVariance = 1,
  overdispersion = 0, family = poisson(), scale = 1, cor = 0,
  roundPoissonVariance = NULL, pZeroInflation = 0, binomialTrials = 1,
  temporalAutocorrelation = 0, spatialAutocorrelation = 0,
  factorResponse = F, replicates = 1, hasNA = F)

Arguments

sampleSize

sample size of the dataset

intercept

intercept (linear scale)

fixedEffects

vector of fixed effects (linear scale)

quadraticFixedEffects

vector of quadratic fixed effects (linear scale)

numGroups

number of groups for the random effect

randomEffectVariance

variance of the random effect (intercept)

overdispersion

if this is a numeric value, it is used as the sd of a random normal variate that is added to the linear predictor. Alternatively, a random function that takes the linear predictor as input can be provided.

family

error distribution family of the response, e.g. poisson() or binomial()

scale

scale parameter, if the distribution has one (e.g. the sd for the Gaussian)

cor

correlation between predictors

roundPoissonVariance

if set, this adds uniform noise to the Poisson response. The aim of this is to create heteroscedasticity

pZeroInflation

probability that any given data point is set to zero

binomialTrials

Number of trials for the binomial. Only active if family == binomial

temporalAutocorrelation

strength of temporal autocorrelation

spatialAutocorrelation

strength of spatial autocorrelation

factorResponse

should the response be transformed to a factor (intended to be used for 0/1 data)

replicates

number of datasets to create

hasNA

should an NA be added to the environmental predictor (for test purposes)
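
The effect of the overdispersion and pZeroInflation arguments can be illustrated in base R. The following is a conceptual sketch only, not DHARMa's internal code: the variable names, the uniform predictor on [-1, 1], and the chosen parameter values are assumptions made for illustration. It shows how normal noise added to the linear predictor inflates the variance of a Poisson response, and how randomly setting observations to zero raises the zero fraction.

```r
# Conceptual sketch only; names and values are illustrative assumptions,
# not taken from the DHARMa source.
set.seed(123)
n <- 5000

# One predictor drawn uniformly on [-1, 1] (an assumption of this sketch)
env <- runif(n, min = -1, max = 1)

# Linear predictor on the link scale: intercept = 2, one fixed effect = 1
eta <- 2 + 1 * env

# Numeric overdispersion: add normal noise with sd = 1 to the linear predictor
eta_od <- eta + rnorm(n, mean = 0, sd = 1)

# Poisson responses (log link) without and with the extra noise
y_plain <- rpois(n, lambda = exp(eta))
y_od    <- rpois(n, lambda = exp(eta_od))

# Zero-inflation: each observation is set to zero with probability 0.6
p_zero <- 0.6
keep   <- rbinom(n, size = 1, prob = 1 - p_zero)
y_zi   <- y_plain * keep

# Compare the spread and the zero fraction of the simulated responses
print(c(var_plain = var(y_plain), var_od = var(y_od)))
print(c(zeros_plain = mean(y_plain == 0), zeros_zi = mean(y_zi == 0)))
```

The noisy response should show a clearly larger variance than the plain Poisson response, and the zero-inflated response should contain roughly the requested share of zeros plus the zeros the Poisson draw produces on its own.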

Examples

testData = createData(sampleSize = 500, intercept = 2, fixedEffects = c(1), 
  overdispersion = 0, family = poisson(), quadraticFixedEffects = c(-3), 
  randomEffectVariance = 0)

par(mfrow = c(1,2))
plot(testData$Environment1, testData$observedResponse)
hist(testData$observedResponse)

# with zero-inflation

testData = createData(sampleSize = 500, intercept = 2, fixedEffects = c(1), 
  overdispersion = 0, family = poisson(), quadraticFixedEffects = c(-3), 
  randomEffectVariance = 0, pZeroInflation = 0.6)

par(mfrow = c(1,2))
plot(testData$Environment1, testData$observedResponse)
hist(testData$observedResponse)

# binomial with multiple trials

testData = createData(sampleSize = 40, intercept = 2, fixedEffects = c(1), 
                      overdispersion = 0, family = binomial(), quadraticFixedEffects = c(-3), 
                      randomEffectVariance = 0, binomialTrials = 20)

# proportion of successes = successes / (successes + failures)
plot(observedResponse1 / (observedResponse1 + observedResponse0) ~ Environment1, 
     data = testData, ylab = "Proportion 1")


# spatial / temporal correlation

testData = createData(sampleSize = 100, family = poisson(), spatialAutocorrelation = 3, 
                      temporalAutocorrelation = 3)

plot(log(observedResponse) ~ time, data = testData)
plot(log(observedResponse) ~ x, data = testData)

[Package DHARMa version 0.4.6 Index]