plot_sim_theory {SimMultiCorrData}    R Documentation

Plot Simulated Data and Target Distribution Data by Name or Function for Continuous or Count Variables

Description

This function plots simulated continuous or count data and, if overlay = TRUE, overlays data generated from the target distribution. The target is specified either by name (plus up to 4 parameters) or by a pdf function fx (plus support bounds). Because evaluating the cdf from fx requires integration, only a continuous fx may be supplied. Both data sets are plotted as histograms. If a continuous target distribution is specified (cont_var = TRUE), the simulated data y is standardized and then rescaled (i.e., y = sigma * scale(y) + mu) so that it has the same mean (mu) and variance (sigma^2) as the target distribution. If the variable is Negative Binomial, the parameters must be size and success probability (not mu). The function returns a ggplot2-package object so the user can modify it as necessary. The graph parameters (e.g., title, power_color, target_color) are ggplot2-package parameters. The function works for valid or invalid power method pdfs.
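The rescaling described above can be illustrated with base R alone. This is a minimal sketch, not the package's internal code: the data vector y and the target moments mu and sigma are assumptions chosen for illustration (sigma is the standard deviation of a Logistic distribution with location 0 and scale 1).

```r
# Sketch of the rescaling y = sigma * scale(y) + mu, using base R only.
set.seed(1234)
y <- rexp(1000)            # hypothetical simulated data (any numeric vector works)

mu <- 0                    # assumed target mean
sigma <- pi / sqrt(3)      # assumed target sd: Logistic(location = 0, scale = 1)

# scale() centers y and divides by its sample sd; as.numeric() drops the
# matrix attributes that scale() returns
y_rescaled <- sigma * as.numeric(scale(y)) + mu

# The rescaled data now has the target mean and standard deviation
c(mean = mean(y_rescaled), sd = sd(y_rescaled))
```

Because scale() uses the sample mean and sample standard deviation, the match is exact up to floating-point error, which puts the simulated and target histograms on a common scale.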

Usage

plot_sim_theory(sim_y, title = "Simulated Data Values", ylower = NULL,
  yupper = NULL, power_color = "dark blue", overlay = TRUE,
  cont_var = TRUE, target_color = "dark green", nbins = 100,
  Dist = c("Benini", "Beta", "Beta-Normal", "Birnbaum-Saunders", "Chisq",
  "Dagum", "Exponential", "Exp-Geometric", "Exp-Logarithmic", "Exp-Poisson",
  "F", "Fisk", "Frechet", "Gamma", "Gaussian", "Gompertz", "Gumbel",
  "Kumaraswamy", "Laplace", "Lindley", "Logistic", "Loggamma", "Lognormal",
  "Lomax", "Makeham", "Maxwell", "Nakagami", "Paralogistic", "Pareto", "Perks",
  "Rayleigh", "Rice", "Singh-Maddala", "Skewnormal", "t", "Topp-Leone",
  "Triangular", "Uniform", "Weibull", "Poisson", "Negative_Binomial"),
  params = NULL, fx = NULL, lower = NULL, upper = NULL, seed = 1234,
  sub = 1000, legend.position = c(0.975, 0.9), legend.justification = c(1,
  1), legend.text.size = 10, title.text.size = 15, axis.text.size = 10,
  axis.title.size = 13)

Arguments

sim_y

a vector of simulated data

title

the title for the graph (default = "Simulated Data Values")

ylower

the lower y value to use in the plot (default = NULL, uses minimum simulated y value)

yupper

the upper y value to use in the plot (default = NULL, uses maximum simulated y value)

power_color

the histogram fill color for the simulated variable (default = "dark blue")

overlay

if TRUE (default), the target distribution is also plotted given either a distribution name (and parameters) or pdf function fx (with support bounds = lower, upper)

cont_var

TRUE (default) for continuous variables, FALSE for count variables

target_color

the histogram fill color for the target distribution (default = "dark green")

nbins

the number of bins to use when creating the histograms (default = 100)

Dist

name of the distribution. The possible values are: "Benini", "Beta", "Beta-Normal", "Birnbaum-Saunders", "Chisq", "Dagum", "Exponential", "Exp-Geometric", "Exp-Logarithmic", "Exp-Poisson", "F", "Fisk", "Frechet", "Gamma", "Gaussian", "Gompertz", "Gumbel", "Kumaraswamy", "Laplace", "Lindley", "Logistic", "Loggamma", "Lognormal", "Lomax", "Makeham", "Maxwell", "Nakagami", "Paralogistic", "Pareto", "Perks", "Rayleigh", "Rice", "Singh-Maddala", "Skewnormal", "t", "Topp-Leone", "Triangular", "Uniform", "Weibull", "Poisson", and "Negative_Binomial". Please refer to the documentation for each package (either stats-package, VGAM-package, or triangle) for information on appropriate parameter inputs.

params

a vector of parameters (up to 4) for the desired distribution (keep NULL if fx supplied instead)

fx

a pdf input as a function of x only, e.g., fx <- function(x) 0.5*(x-1)^2; must return a scalar (keep NULL if Dist supplied instead)

lower

the lower support bound for a supplied fx, else keep NULL (note: if an error is thrown from uniroot, try a slightly higher lower bound; e.g., 0.0001 instead of 0)

upper

the upper support bound for a supplied fx, else keep NULL (note: if an error is thrown from uniroot, try a lower upper bound; e.g., 100000 instead of Inf)

seed

the seed value for random number generation (default = 1234)

sub

the number of subdivisions to use in the integration to calculate the cdf from fx; if no result, try increasing sub (requires longer computation time; default = 1000)

legend.position

the position of the legend (default = c(0.975, 0.9))

legend.justification

the justification of the legend (default = c(1, 1))

legend.text.size

the size of the legend labels (default = 10)

title.text.size

the size of the plot title (default = 15)

axis.text.size

the size of the axes text, i.e., the tick labels (default = 10)

axis.title.size

the size of the axes titles (default = 13)
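Before passing fx with lower and upper, it can help to confirm that the function integrates to 1 over the stated support and that the cdf built from it can be inverted. The following is a minimal sketch using stats::integrate and stats::uniroot. The density is a hypothetical example (an Epanechnikov density on [-1, 1]), and the helper cdf_fx is an illustrative name, not part of the package. How the package performs these steps internally is not shown here.

```r
# Hypothetical pdf: Epanechnikov density on [-1, 1] (not from the package docs)
fx <- function(x) 0.75 * (1 - x^2)
lower <- -1
upper <- 1
sub <- 1000   # number of subdivisions allowed in the numerical integration

# A valid pdf should integrate to (approximately) 1 over its support
total <- integrate(fx, lower, upper, subdivisions = sub)$value

# cdf obtained by integrating fx from the lower bound up to q
cdf_fx <- function(q) integrate(fx, lower, q, subdivisions = sub)$value

# Inverting the cdf with uniroot gives a quantile; here, the median
med <- uniroot(function(q) cdf_fx(q) - 0.5, lower = lower, upper = upper)$root

c(total = total, median = med)
```

If integrate or uniroot fails at an infinite or boundary value, moving the bound slightly inward, as the notes for lower and upper suggest, is a reasonable first thing to try.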

Value

A ggplot2-package object.
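Because the return value is a ggplot2 object, additional layers and theme settings can be added with the + operator. The sketch below uses a stand-in histogram built directly with ggplot2 from hypothetical data. It is not the actual output of plot_sim_theory, and the labels and theme choices are illustrative assumptions.

```r
library(ggplot2)

# Stand-in for the returned object: a plain ggplot2 histogram built from
# hypothetical data, NOT the actual output of plot_sim_theory()
set.seed(1234)
p <- ggplot(data.frame(y = rlogis(1000)), aes(x = y)) +
  geom_histogram(bins = 100, fill = "darkblue")

# Further ggplot2 layers and theme settings can be added with `+`
p2 <- p +
  labs(x = "y", y = "Count") +
  theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))
```

The same pattern should apply to an object returned by plot_sim_theory: assign the result to a name, then add ggplot2 components to it.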

References

Please see the references for plot_cdf.

Wickham H. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009.

See Also

calc_theory, ggplot2-package, geom_histogram

Examples

## Not run: 
# Logistic distribution (location = 0, scale = 1),
# simulated with mean = 0 and variance = 1
seed <- 1234

# Find standardized cumulants
stcum <- calc_theory(Dist = "Logistic", params = c(0, 1))

# Simulate without the sixth cumulant correction
# (invalid power method pdf)
Logvar1 <- nonnormvar1(method = "Polynomial", means = 0, vars = 1,
                       skews = stcum[3], skurts = stcum[4],
                       fifths = stcum[5], sixths = stcum[6],
                       n = 10000, seed = seed)

# Plot simulated variable (invalid) and data from theoretical distribution
plot_sim_theory(sim_y = Logvar1$continuous_variable,
                title = "Invalid Logistic Simulated Data Values",
                overlay = TRUE, Dist = "Logistic", params = c(0, 1),
                seed = seed)

# Simulate with the sixth cumulant correction
# (valid power method pdf)
Logvar2 <- nonnormvar1(method = "Polynomial", means = 0, vars = 1,
                       skews = stcum[3], skurts = stcum[4],
                       fifths = stcum[5], sixths = stcum[6],
                       Six = seq(1.5, 2, 0.05), n = 10000, seed = seed)

# Plot simulated variable (valid) and data from theoretical distribution
plot_sim_theory(sim_y = Logvar2$continuous_variable,
                title = "Valid Logistic Simulated Data Values",
                overlay = TRUE, Dist = "Logistic", params = c(0, 1),
                seed = seed)

# Simulate 2 Negative Binomial distributions and correlation 0.3
# using Method 1
NBvars <- rcorrvar(k_nb = 2, size = c(10, 15), prob = c(0.4, 0.3),
                   rho = matrix(c(1, 0.3, 0.3, 1), 2, 2), seed = seed)

# Plot pdfs of 1st simulated variable and theoretical distribution
plot_sim_theory(sim_y = NBvars$Neg_Bin_variable[, 1], overlay = TRUE,
                cont_var = FALSE, Dist = "Negative_Binomial",
                params = c(10, 0.4))


## End(Not run)
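The Negative Binomial example above passes size and success probability in params, as this function requires. If a distribution is known by its mean mu instead, the base R sketch below shows the conversion between the two parameterizations of stats::dnbinom. The variable names are illustrative.

```r
# Negative Binomial with size = 10 and success probability = 0.4
size <- 10
prob <- 0.4

# Mean implied by (size, prob)
mu <- size * (1 - prob) / prob

# Both parameterizations of dnbinom give the same probabilities
x <- 0:50
d_prob <- dnbinom(x, size = size, prob = prob)
d_mu <- dnbinom(x, size = size, mu = mu)
all.equal(d_prob, d_mu)

# Converting from (size, mu) back to the success probability
prob_from_mu <- size / (size + mu)
```

Here prob_from_mu recovers the original 0.4, so params = c(size, prob_from_mu) would be the form to pass when only size and mu are known.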


[Package SimMultiCorrData version 0.2.2 Index]