R: Simulate joint mean and dispersion effects models fits

jmdem.sim {jmdem}

R Documentation

Simulate joint mean and dispersion effects models fits

Description

Simulate iterative jmdem fits under user-defined model settings.

Usage

jmdem.sim(mformula = "y ~ 1 + x", dformula = "~ 1 + z", data = NULL, 
          beta.true, lambda.true, mfamily = gaussian, 
          dfamily = Gamma, dev.type = c("deviance", "pearson"), 
          x.str = list(type = "numeric", random.func = "runif", param = list()), 
          z.str = list(type = "numeric", random.func = "runif", param = list()), 
          n = NULL, simnum = NULL, trace = FALSE, asymp.test = FALSE, 
          weights = NULL, moffset = NULL, doffset = NULL, 
          mustart = NULL, phistart = NULL, betastart = NULL, 
          lambdastart = NULL, hessian = TRUE, na.action, 
          grad.func = TRUE, fit.method = "jmdem.fit", 
          method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", "Brent"), 
          df.adj = FALSE, disp.adj = FALSE, full.loglik = FALSE, 
          mcontrasts = NULL, dcontrasts = NULL, beta.first = TRUE, 
          prefit = TRUE, control = list(...), 
          minv.method = c("solve", "chol2inv", "ginv"), ...)
          
simdata.jmdem.sim(mformula = "y ~ 1 + x", dformula = "~ 1 + z", beta.true, lambda.true, 
                   x.str = list(type = "numeric", random.func = "runif", param = list()), 
                   z.str = list(type = "numeric", random.func = "runif", param = list()), 
                   mfamily = gaussian, dfamily = Gamma, weights = NULL, n, simnum = 1, 
                   moffset = NULL, doffset = NULL)

getdata.jmdem.sim(object)

Arguments

`mformula`	the user-defined true mean submodel, expressed in the form of an object of class "`formula`". The regressors and their interactions are specified here, but not their true parameter values.
`dformula`	the user-defined true dispersion submodel. See `mformula`.
`data`	an optional data frame or list of several data frames. If no data are provided, `jmdem.sim` will generate its own data for simulation by `simdata.jmdem.sim`.
`beta.true`	a vector of the true parameter values of the mean submodel. The number of elements in `beta.true` must equal the number of parameters to be estimated in `mformula`, including the intercept if the model has one.
`lambda.true`	a vector of the true parameter values of the dispersion submodel. The number of elements in `lambda.true` must equal the number of parameters to be estimated in `dformula`, including the intercept if the model has one.
`mfamily`	a description of the error distribution and link function to be used in the mean submodel. This can be a character string naming a family function, a family function or the result of a call to a family function. (See `family` for details of family functions.)
`dfamily`	a description of the error distribution and link function to be used in the dispersion submodel. (Also see `family` for details of family functions.)
`dev.type`	a specification of the type of residuals to be used as the response of the dispersion submodel. The ML estimates of the jmdem are the optima of either the quasi-likelihood function for deviance residuals, or the pseudo-likelihood function for Pearson residuals.
`x.str`	a list specifying how the design matrix of the mean submodel is generated: the `type` of each regressor (numeric, character, logical etc.), an R function (`random.func`) that generates the values of the regressors, and the corresponding parameters (`param`) to be passed on to `random.func`. Note that all parameters that belong to the same `random.func` must be put in a `list(...)`. See Details.
`z.str`	a list specifying how the design matrix of the dispersion submodel is generated: the `type` of each regressor (numeric, character, logical etc.), an R function (`random.func`) that generates the values of the regressors, and the corresponding parameters (`param`) to be passed on to `random.func`. Note that all parameters that belong to the same `random.func` must be put in a `list(...)`. See Details.
`n`	a numeric value specifying the sample size in each simulation.
`simnum`	a numeric value specifying the number of simulations.
`trace`	a logical value specifying whether the estimated coefficients should be printed to the screen after each simulation.
`asymp.test`	a logical value specifying whether Rao's score and Wald tests should be conducted for each simulation.
`...`	for `control`: arguments to be used to form the default control argument if it is not supplied directly. For `jmdem.sim`: further arguments passed to or from other methods.

The following arguments are used for JMDEM fitting. See jmdem for details.

`weights`	an optional vector of 'prior weights' to be used in the fitting process. Should be `NULL` or a numeric vector.
`moffset`	an a priori known component to be included in the linear predictor of the mean submodel during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See `model.offset`.
`doffset`	an a priori known component to be included in the linear predictor of the dispersion submodel during fitting. See `model.offset`.
`mustart`	a vector of starting values of individual means.
`phistart`	a vector of starting values of individual dispersion.
`betastart`	a vector of starting values for the regression parameters of the mean submodel.
`lambdastart`	a vector of starting values for the regression parameters of the dispersion submodel.
`hessian`	the method used to compute the information matrix. The Hessian matrix is calculated for `TRUE`, the Fisher information matrix for `FALSE`.
`na.action`	a function which indicates what should happen when the data contain `NA`s. The default is set by the `na.action` setting of `options`, and is `na.fail` if that is unset. The 'factory-fresh' default is `na.omit`. Another possible value is `NULL`, no action. Value `na.exclude` can be useful.
`grad.func`	for `TRUE`, the gradient function is included in the optimisation for the "`BFGS`", "`CG`" and "`L-BFGS-B`" methods. If it is `NULL`, a finite-difference approximation will be used. For the "`SANN`" method it specifies a function to generate a new candidate point. If it is `NULL` a default Gaussian Markov kernel is used.
`fit.method`	the method to be used in fitting the model. The default method "`jmdem.fit`" uses the general-purpose optimisation (`optim`): the alternative "model.frame" returns the model frame and does no fitting. User-supplied fitting functions can be supplied either as a function or a character string naming a function, with a function which takes the same arguments as `jmdem.fit`. If specified as a character string it is looked up from within the stats namespace.
`method`	the method to be used for the optimisation. See `optim` for details.
`df.adj`	for `TRUE`, the likelihood function is multiplied by the degrees-of-freedom adjustment factor `(n-p)/n` before the optimisation, where `n` is the number of observations and `p` is the number of parameters to be estimated in `jmdem`.
`disp.adj`	for `TRUE`, the estimated dispersion parameter is multiplied by an adjustment factor for the dispersion weight during the optimisation. For details, please see McCullagh and Nelder (1989, Ch. 10, p. 362).
`full.loglik`	for `TRUE`, the full likelihood function is used for the optimisation instead of the quasi- or pseudo-likelihood function.
`mcontrasts`	an optional list for the mean effect contrasts. See the `contrasts.arg` of `model.matrix.default`.
`dcontrasts`	an optional list for the dispersion effect contrasts. See the `contrasts.arg` of `model.matrix.default`.
`beta.first`	for `TRUE`, the mean effects are estimated first (assuming constant sample dispersion) at the initial stage. For `FALSE`, the dispersion effects are estimated first (assuming a constant zero mean for the whole sample).
`prefit`	a specification whether `jmdem` uses `glm` to prefit the starting values of the mean and dispersion parameters. For `FALSE`, the initial parameter values of all the regressors are set to zero, and the sample mean and sample dispersion are used as the starting values of the corresponding submodel intercepts instead. If the submodels have no intercept, all parameters are also set to zero. The sample mean and sample dispersion are then used as `mustart` and `phistart` in the internal computation (they are not officially recorded in `mustart` and `phistart` in the output object). The default value is `TRUE`.
`control`	a list of parameters for controlling the fitting process. For `jmdem.fit` this is passed to `jmdem.control`.
`minv.method`	the method used to invert matrices during the estimation process. "`solve`" gives the solutions of a system of equations, "`chol2inv`" gives the inverse from Choleski or QR decomposition and "`ginv`" gives the generalised inverse of a matrix. If no method is specified, or if several are given in a vector such as `c("solve", "chol2inv", "ginv")`, the methods are tried in the order given in the vector until an inverse is found.
`object`	one or several objects of class `jmdem.sim`, typically the result of a call to `jmdem.sim`.

Details

`jmdem.sim` simulates the fitting of datasets in which the regressors of the mean and dispersion submodels are generated according to the specifications given in `x.str` and `z.str`. The response variable is then generated according to the distribution specified in `mfamily`, with the linear predictor of the mean given by `mformula` and the linear predictor of the dispersion given by `dformula`.
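As a minimal sketch of the data-generating process described above, the following base R code draws one dataset by hand for a Gaussian mean submodel with identity link and a dispersion submodel with log link. It does not call jmdem and is not the package's internal code. The object names, the parameter values and the log link for the dispersion are assumptions chosen for illustration only.

```r
set.seed(1)
n <- 50

# Assumed true parameters (illustrative values, not package defaults)
beta.true   <- c(1.5, 4)      # mean submodel: intercept, slope of x
lambda.true <- c(0.5, -0.2)   # dispersion submodel: intercept, slope of z

# Regressors drawn with runif, the default random.func in x.str and z.str
x <- runif(n)
z <- runif(n)

# Linear predictors of the two submodels
eta.mean <- beta.true[1] + beta.true[2] * x
eta.disp <- lambda.true[1] + lambda.true[2] * z

# Identity link for the Gaussian mean, log link assumed for the dispersion
mu  <- eta.mean
phi <- exp(eta.disp)

# For a Gaussian response the dispersion is the variance
y <- rnorm(n, mean = mu, sd = sqrt(phi))

sim.data <- data.frame(y = y, x = x, z = z)
head(sim.data)
```

Each simulation run would repeat such a draw with fresh random numbers and then fit the model to the resulting dataset.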

The specifications in `x.str` and `z.str` are rather flexible if more than one independent variable is included in either submodel. For instance, if one of the two independent variables of the mean submodel is numeric and generated from the normal distribution with mean 0 and standard deviation 1, and the other is a 4-level factor {0, 1, 2, 3} generated from the uniform distribution, then they can be specified in vectors using `c(...)`, such as: `x.str = list(type = c("numeric", "factor"), random.func = c("rnorm", "runif"), param = c(list(mean = 0, sd = 1), list(min = 0, max = 3)))`.
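To illustrate the idea of pairing each regressor with a generator function and its parameters, here is a small base R sketch. It is not how `jmdem.sim` processes `x.str` internally: the per-variable layout of `spec`, the helper `gen_var` and the rounding step used to turn uniform draws into factor levels are all assumptions made for this example.

```r
set.seed(2)
n <- 20

# Hypothetical per-variable specification: one entry per regressor
spec <- list(
  list(type = "numeric", random.func = "rnorm", param = list(mean = 0, sd = 1)),
  list(type = "factor",  random.func = "runif", param = list(min = 0, max = 3))
)

# Hypothetical helper: call the generator named in random.func with n and param
gen_var <- function(s, n) {
  v <- do.call(s$random.func, c(list(n), s$param))
  # Assumption: continuous draws are rounded to integers before becoming a factor
  if (s$type == "factor") factor(round(v), levels = 0:3) else v
}

x <- lapply(spec, gen_var, n = n)
names(x) <- c("x1", "x2")
x <- as.data.frame(x)
str(x)
```

Keeping the parameters of each generator in their own `list(...)` is what allows `do.call` to pass them on by name.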

Note that the higher the number of simulations specified in `simnum`, the more stable the aggregated simulation results. The larger the sample size in each simulation, the less the estimates fluctuate across simulations.
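The effect of the sample size and of the number of simulations can be seen in a small Monte Carlo sketch. To stay self-contained it uses an ordinary linear model fitted with `lm` as a stand-in for a jmdem fit. The helper `sim_slopes` and all numeric settings are assumptions for illustration.

```r
set.seed(3)

# Hypothetical helper: slope estimates from simnum simulated samples of size n
sim_slopes <- function(n, simnum, beta = c(1.5, 4)) {
  replicate(simnum, {
    x <- rnorm(n)
    y <- beta[1] + beta[2] * x + rnorm(n)
    unname(coef(lm(y ~ x))[2])
  })
}

small <- sim_slopes(n = 50,  simnum = 200)
large <- sim_slopes(n = 500, simnum = 200)

# Spread of the estimates across simulations for the two sample sizes
c(sd.n50 = sd(small), sd.n500 = sd(large))

# Standard error of the aggregated (mean) estimate for 10 and for 200 simulations
c(se.10 = sd(small[1:10]) / sqrt(10), se.200 = sd(small) / sqrt(200))
```

The first comparison reflects the sample size within each simulation, the second the number of simulations that are aggregated.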

With `simdata.jmdem.sim`, users gain more control over the simulation by first generating a number of datasets under their own settings, without running `jmdem.sim` at the same time. This also gives users the flexibility to edit the datasets according to their own requirements before passing them to `jmdem.sim`.

Users can also extract the datasets used in `jmdem.sim` with `getdata.jmdem.sim`. This function is useful if the datasets were generated within `jmdem.sim`, in which case users have no access to them before the simulations are run.

`getdata.jmdem.sim` and `simdata.jmdem.sim` can also be useful if users would like to conduct several simulations with different jmdem settings on the same data.
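A possible workflow for extracting the simulated datasets is sketched below. It assumes that the jmdem package is installed and reuses the model settings from the Examples section. The object names `sim1` and `sim.datasets` are illustrative, and the structure of the returned data is only inspected, not assumed.

```r
# The sketch runs only if the jmdem package is available
has.jmdem <- requireNamespace("jmdem", quietly = TRUE)

if (has.jmdem) {
  library(jmdem)

  # Let jmdem.sim generate its own datasets (settings taken from the Examples)
  sim1 <- jmdem.sim(mformula = y ~ x, dformula = ~ z, beta.first = TRUE,
                    mfamily = gaussian, dfamily = Gamma(link = "log"),
                    x.str = list(type = "numeric", random.func = "rnorm",
                                 param = list(mean = 0, sd = 2)),
                    z.str = list(type = "factor", random.func = "runif",
                                 param = list(min = 0, max = 2)),
                    beta.true = c(1.5, 4), lambda.true = c(2.5, 3, -0.2),
                    grad.func = TRUE, method = "BFGS", n = 50, simnum = 10)

  # Extract the datasets that were generated inside jmdem.sim
  sim.datasets <- getdata.jmdem.sim(sim1)

  # Inspect the top-level structure of the extracted data
  str(sim.datasets, max.level = 1)
}
```

The extracted datasets could then be supplied through the `data` argument of a further `jmdem.sim` call, for example with a different `dev.type`, so that both runs are based on the same data.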

Value

An object of class `jmdem.sim` consists of a list of `jmdem` fits with full model information. That is, each element of the `jmdem.sim` object contains the same attributes as a `jmdem` object. See the Value section of `jmdem` for details.

Author(s)

Karl Wu Ka Yui (karlwuky@suss.edu.sg)

Examples

## Run 10 JMDEM simulations with samples of size 50. The response
## variable is Gaussian with mean beta_0 + beta_1 * x and variance
## sigma^2, where log(sigma^2) is given by lambda_0 plus the effects
## of z. The observations of the predictor x are random numbers
## generated from the normal distribution with mean 0 and standard
## deviation 2. The observations of z form a factor with three levels
## between 0 and 2, generated from the uniform distribution. The true
## values of the mean submodel's intercept and slope are 1.5 and 4.
## The true values of the dispersion submodel's intercept and of the
## two remaining coefficients of the factor z are 2.5, 3 and -0.2.
sim <- jmdem.sim(mformula = y ~ x, dformula = ~ z, beta.first = TRUE, 
                 mfamily = gaussian, dfamily = Gamma(link = "log"), 
                 x.str = list(type = "numeric", random.func = "rnorm", 
                              param = list(mean = 0, sd = 2)),
                 z.str = list(type = "factor", random.func = "runif", 
                              param = list(min = 0, max = 2)),
                 beta.true = c(1.5, 4), lambda.true = c(2.5, 3, -0.2), 
                 grad.func = TRUE, method = "BFGS", n = 50,
                 simnum = 10)

[Package jmdem version 1.0.1 Index]