plot_simtheory {SimCorrMix}R Documentation

Plot Simulated Data and Target Distribution Data by Name or Function for Continuous or Count Variables

Description

This plots simulated continuous or count (regular or zero-inflated, Poisson or Negative Binomial) data and overlays data (if overlay = TRUE) generated from the target distribution. The target is specified by name (plus up to 4 parameters) or PDF function fx (plus support bounds). Due to the integration involved in finding the CDF from the PDF supplied by fx, only continuous fx may be supplied. Both are plotted as histograms (using geom_histogram). If a continuous target distribution is specified (cont_var = TRUE), the simulated data y is scaled and then transformed (i.e. y = sigma * scale(y) + mu) so that it has the same mean (mu) and variance (sigma^2) as the target distribution. It works for valid or invalid power method PDF's. It returns a ggplot2-package object so the user can save it or modify it as necessary. The graph parameters (i.e. title, sim_color, target_color, legend.position, legend.justification, legend.text.size, title.text.size, axis.text.size, and axis.title.size) are inputs to the ggplot2-package functions so information about valid inputs can be obtained from that package's documentation.

Usage

plot_simtheory(sim_y, title = "Simulated Data Values", ylower = NULL,
  yupper = NULL, sim_color = "dark blue", overlay = TRUE,
  cont_var = TRUE, target_color = "dark green", binwidth = NULL,
  nbins = 100, Dist = c("Benini", "Beta", "Beta-Normal",
  "Birnbaum-Saunders", "Chisq", "Dagum", "Exponential", "Exp-Geometric",
  "Exp-Logarithmic", "Exp-Poisson", "F", "Fisk", "Frechet", "Gamma", "Gaussian",
  "Gompertz", "Gumbel", "Kumaraswamy", "Laplace", "Lindley", "Logistic",
  "Loggamma", "Lognormal", "Lomax", "Makeham", "Maxwell", "Nakagami",
  "Paralogistic", "Pareto", "Perks", "Rayleigh", "Rice", "Singh-Maddala",
  "Skewnormal", "t", "Topp-Leone", "Triangular", "Uniform", "Weibull",
  "Poisson", "Negative_Binomial"), params = NULL, fx = NULL, lower = NULL,
  upper = NULL, seed = 1234, sub = 1000, legend.position = c(0.975,
  0.9), legend.justification = c(1, 1), legend.text.size = 10,
  title.text.size = 15, axis.text.size = 10, axis.title.size = 13)

Arguments

sim_y

a vector of simulated data

title

the title for the graph (default = "Simulated Data Values")

ylower

the lower y value to use in the plot (default = NULL, uses minimum simulated y value) on the y-axis

yupper

the upper y value (default = NULL, uses maximum simulated y value) on the y-axis

sim_color

the histogram fill color for the simulated variable (default = "dark blue")

overlay

if TRUE (default), the target distribution is also plotted given either a distribution name (and parameters) or PDF function fx (with support bounds = lower, upper)

cont_var

TRUE (default) for continuous variables, FALSE for count variables

target_color

the histogram fill color for the target distribution (default = "dark green")

binwidth

the width of bins to use when creating the histograms (default = NULL)

nbins

the number of bins to use when creating the histograms (default = 100); overridden by binwidth

Dist

name of the distribution. The possible values are: "Benini", "Beta", "Beta-Normal", "Birnbaum-Saunders", "Chisq", "Exponential", "Exp-Geometric", "Exp-Logarithmic", "Exp-Poisson", "F", "Fisk", "Frechet", "Gamma", "Gaussian", "Gompertz", "Gumbel", "Kumaraswamy", "Laplace", "Lindley", "Logistic",
"Loggamma", "Lognormal", "Lomax", "Makeham", "Maxwell", "Nakagami", "Paralogistic", "Pareto", "Perks", "Rayleigh", "Rice", "Singh-Maddala",
"Skewnormal", "t", "Topp-Leone", "Triangular", "Uniform", "Weibull", "Poisson", and "Negative_Binomial". Please refer to the documentation for each package (either stats-package, VGAM-package, or triangle) for information on appropriate parameter inputs.

params

a vector of parameters (up to 4) for the desired distribution (keep NULL if fx supplied instead); for Poisson variables, must be lambda (mean) and the probability of a structural zero (use 0 for regular Poisson variables); for Negative Binomial variables, must be size, mean and the probability of a structural zero (use 0 for regular NB variables)

fx

a PDF input as a function of x only, i.e. fx = function(x) 0.5 * (x - 1)^2; must return a scalar (keep NULL if Dist supplied instead)

lower

the lower support bound for a supplied fx, else keep NULL (note: if an error is thrown from uniroot, try a slightly higher lower bound; i.e., 0.0001 instead of 0)

upper

the upper support bound for a supplied fx, else keep NULL (note: if an error is thrown from uniroot, try a lower upper bound; i.e., 100000 instead of Inf)

seed

the seed value for random number generation (default = 1234)

sub

the number of subdivisions to use in the integration to calculate the CDF from fx; if no result, try increasing sub (requires longer computation time; default = 1000)

legend.position

the position of the legend

legend.justification

the justification of the legend

legend.text.size

the size of the legend labels

title.text.size

the size of the plot title

axis.text.size

the size of the axes text (tick labels)

axis.title.size

the size of the axes titles

Value

A ggplot2-package object.

References

Carnell R (2017). triangle: Provides the Standard Distribution Functions for the Triangle Distribution. R package version 0.11. https://CRAN.R-project.org/package=triangle.

Fialkowski AC (2018). SimMultiCorrData: Simulation of Correlated Data with Multiple Variable Types. R package version 0.2.2. https://CRAN.R-project.org/package=SimMultiCorrData.

Headrick TC, Sheng Y, & Hodis FA (2007). Numerical Computing and Graphics for the Power Method Transformation Using Mathematica. Journal of Statistical Software, 19(3):1-17.
doi: 10.18637/jss.v019.i03.

Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.

Yee TW (2018). VGAM: Vector Generalized Linear and Additive Models. R package version 1.0-5. https://CRAN.R-project.org/package=VGAM.

See Also

calc_theory, ggplot, geom_histogram

Examples

# Using normal mixture variable from contmixvar1 example
Nmix <- contmixvar1(n = 1000, "Polynomial", means = 0, vars = 1,
  mix_pis = c(0.4, 0.6), mix_mus = c(-2, 2), mix_sigmas = c(1, 1),
  mix_skews = c(0, 0), mix_skurts = c(0, 0), mix_fifths = c(0, 0),
  mix_sixths = c(0, 0))
plot_simtheory(Nmix$Y_mix[, 1], title = "Mixture of Normal Distributions",
  fx = function(x) 0.4 * dnorm(x, -2, 1) + 0.6 * dnorm(x, 2, 1),
  lower = -5, upper = 5)
## Not run: 
# Mixture of Beta(6, 3), Beta(4, 1.5), and Beta(10, 20)
Stcum1 <- calc_theory("Beta", c(6, 3))
Stcum2 <- calc_theory("Beta", c(4, 1.5))
Stcum3 <- calc_theory("Beta", c(10, 20))
mix_pis <- c(0.5, 0.2, 0.3)
mix_mus <- c(Stcum1[1], Stcum2[1], Stcum3[1])
mix_sigmas <- c(Stcum1[2], Stcum2[2], Stcum3[2])
mix_skews <- c(Stcum1[3], Stcum2[3], Stcum3[3])
mix_skurts <- c(Stcum1[4], Stcum2[4], Stcum3[4])
mix_fifths <- c(Stcum1[5], Stcum2[5], Stcum3[5])
mix_sixths <- c(Stcum1[6], Stcum2[6], Stcum3[6])
mix_Six <- list(seq(0.01, 10, 0.01), c(0.01, 0.02, 0.03),
  seq(0.01, 10, 0.01))
Bstcum <- calc_mixmoments(mix_pis, mix_mus, mix_sigmas, mix_skews,
  mix_skurts, mix_fifths, mix_sixths)
Bmix <- contmixvar1(n = 10000, "Polynomial", Bstcum[1], Bstcum[2]^2,
  mix_pis, mix_mus, mix_sigmas, mix_skews, mix_skurts, mix_fifths,
  mix_sixths, mix_Six)
plot_simtheory(Bmix$Y_mix[, 1], title = "Mixture of Beta Distributions",
  fx = function(x) mix_pis[1] * dbeta(x, 6, 3) + mix_pis[2] *
    dbeta(x, 4, 1.5) + mix_pis[3] * dbeta(x, 10, 20), lower = 0, upper = 1)

## End(Not run)


[Package SimCorrMix version 0.1.1 Index]