data_sim {gratia}    R Documentation

Simulate example data for fitting GAMs

Description

A tidy reimplementation of the functions implemented in mgcv::gamSim(), simulating data that can be used to fit GAMs. A new feature is that the sampling distribution can be applied to all the example types.

Usage

data_sim(
  model = "eg1",
  n = 400,
  scale = NULL,
  theta = 3,
  power = 1.5,
  dist = c("normal", "poisson", "binary", "negbin", "tweedie", "gamma", "ocat",
    "ordered categorical"),
  n_cat = 4,
  cuts = c(-1, 0, 5),
  seed = NULL,
  gfam_families = c("binary", "tweedie", "normal")
)

Arguments

model

character; either "egX" where X is an integer 1:7, or the name of a model. See Details for possible options.

n

numeric; the number of observations to simulate.

scale

numeric; the level of noise to use.

theta

numeric; the dispersion parameter \theta to use. The default is entirely arbitrary, chosen only to provide simulated data that exhibits extra dispersion beyond that assumed under a Poisson.

power

numeric; the Tweedie power parameter.

dist

character; a sampling distribution for the response variable. "ordered categorical" is a synonym of "ocat".

n_cat

integer; the number of categories for a categorical response. Currently only used for dist %in% c("ocat", "ordered categorical").

cuts

numeric; vector of cut points on the latent variable, excluding the end points -Inf and Inf. The number of cut points must be one fewer than the number of categories: length(cuts) == n_cat - 1.

seed

numeric; the seed for the random number generator. Passed to base::set.seed().

gfam_families

character; a vector of distributions to use in generating data with grouped families for use with family = gfam(). The allowed distributions are as per dist.

Details

data_sim() can simulate data from several underlying models of known true functions. The model is chosen via argument model, either as "eg1" to "eg7" or by the name of a model.

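The random component providing noise or sampling variation can follow one of the following distributions, specified via argument dist: "normal", "poisson", "binary", "negbin", "tweedie", "gamma", or "ocat" (synonym "ordered categorical").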

Other arguments provide the parameters for the distribution.

References

Gu, C., Wahba, G. (1993). Smoothing Spline ANOVA with Component-Wise Bayesian "Confidence Intervals". J. Comput. Graph. Stat. 2, 97–117.

Luo, Z., Wahba, G. (1997). Hybrid adaptive splines. J. Am. Stat. Assoc. 92, 107–116.

Examples


data_sim("eg1", n = 100, seed = 1)
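
# A hedged sketch of switching the sampling distribution via `dist`.
# The object names and the values of `n`, `theta`, and `seed` are arbitrary
# choices for illustration; the package is referenced explicitly with
# gratia:: so the lines run without attaching it first.

# a Poisson (count) response
df_pois <- gratia::data_sim("eg1", n = 100, dist = "poisson", seed = 2)

# a negative binomial response; `theta` controls the extra dispersion
df_nb <- gratia::data_sim("eg1", n = 100, dist = "negbin", theta = 3, seed = 2)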

# an ordered categorical response
data_sim("eg1", n = 100, dist = "ocat", n_cat = 4, cuts = c(-1, 0, 5))
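
# A base R sketch of how cut points partition a latent scale into ordered
# categories. It only illustrates the length(cuts) == n_cat - 1 relationship;
# it is not data_sim()'s internal code, and `latent` is a made-up vector of
# latent values chosen to avoid the cut points themselves.
cuts <- c(-1, 0, 5)
n_cat <- length(cuts) + 1      # three cut points imply four categories
latent <- c(-2.3, -0.4, 1.7, 6.2)
category <- findInterval(latent, cuts) + 1
category
# 1 2 3 4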


[Package gratia version 0.9.2 Index]