MixtureModel {mpower}    R Documentation

Correlated predictors generator

Description

This function creates a generative model for correlated, mixed-scale predictors.

Usage

MixtureModel(
  method = "estimation",
  data = NULL,
  G = NULL,
  m = 100,
  nudge = 1e-09,
  sbg_args = list(nsamp = 1000),
  cvine_marginals = list(),
  cvine_dtypes = list(),
  resamp_prob = NULL
)

Arguments

method

A string, one of the three options "resampling", "estimation", or "cvine". Default is "estimation". See Details.

data

A data frame or matrix, required for the "resampling" and "estimation" methods.

G

A guesstimate pairwise correlation matrix for "cvine" method. See Details.

m

A positive number indicating uncertainty in the guesstimate G; larger values mean more uncertainty. Default is 100.

nudge

A number, default 1e-09, added to the diagonal of the covariance matrix for numerical stability.

sbg_args

A list of named arguments, except Y, for function 'sbgcop.mcmc()'.

cvine_marginals

A named list describing the univariate distribution of each predictor. See Details.

cvine_dtypes

A named list describing the data type of each variable.

resamp_prob

A vector of sampling probabilities, one for each observation in data. Must sum to 1.
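
As a minimal base-R sketch of what a valid resamp_prob vector looks like, the snippet below normalizes raw weights so they sum to 1 and uses them to resample rows with replacement. The toy data frame and the weights are hypothetical, and the call to sample() only illustrates weighted resampling in general; it is not necessarily how mpower implements the "resampling" method internally.

```r
set.seed(1)

# Hypothetical toy data standing in for the `data` argument
toy <- data.frame(x1 = rnorm(5), x2 = rbinom(5, 1, 0.5))

# Hypothetical raw weights, one per observation (row) of `toy`
w <- c(2, 1, 1, 1, 5)

# Normalize so the probabilities sum to 1, as resamp_prob requires
resamp_prob <- w / sum(w)

# Illustration only: draw 10 rows with replacement using these probabilities
idx <- sample(nrow(toy), size = 10, replace = TRUE, prob = resamp_prob)
resampled <- toy[idx, ]
```

A vector built this way has one entry per row of the data and could then be supplied as resamp_prob together with method = "resampling".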

Value

A MixtureModel object.

Details

There are three methods to generate data:

1. Resampling: if we have enough data of the predictors, we can resample to get realistic joint distributions and dependence among them.

2. Estimation: if we have a small sample from, for example, a pilot study, we can sample from a semi-parametric copula model (Hoff 2007) after learning the dependence and univariate marginals of the predictors.

3. C-vine: if no pilot data exists, we can still set a rough guesstimate of the dependence and univariate marginals. The C-vine algorithm (Joe 2006) generates a positive semi-definite correlation matrix given the guesstimate G. The guesstimate G is a symmetric p x p matrix whose ij-th entry is between -1 and 1 and is the guesstimate of the correlation between the i-th and j-th predictors. G doesn't need to be a valid correlation matrix. The method works well when the values in G are not extreme (e.g., 0.999 or -0.999). Built-in functions for univariate marginals include: 'qbeta', 'qbinom', 'qcauchy', 'qchisq', 'qexp', 'qf', 'qgamma', 'qgeom', 'qhyper', 'qlogis', 'qlnorm', 'qmultinom', 'qnbinom', 'qnorm', 'qpois', 'qt', 'qunif', 'qweibull'.
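
To make the requirements on G concrete, here is a small base-R sketch that builds a hypothetical 3 x 3 guesstimate and checks the properties described above. The specific correlations (0.7, -0.7, 0.7) are made up for illustration. They were chosen so that G is symmetric with entries in [-1, 1] but is not positive semi-definite, which is the kind of input that is not a valid correlation matrix yet is still acceptable as a guesstimate.

```r
# Hypothetical guesstimate of pairwise correlations among p = 3 predictors
p <- 3
G <- diag(1, p)
G[1, 2] <- G[2, 1] <- 0.7
G[1, 3] <- G[3, 1] <- -0.7
G[2, 3] <- G[3, 2] <- 0.7

# Requirements stated for G: symmetric, entries between -1 and 1
is_symmetric <- isSymmetric(G)
in_range <- all(abs(G) <= 1)

# G need not be a valid correlation matrix: here the smallest
# eigenvalue is negative, so G is not positive semi-definite
min_eig <- min(eigen(G, symmetric = TRUE, only.values = TRUE)$values)
is_psd <- min_eig >= 0
```

A matrix like this would be supplied as the G argument with method = "cvine", with m expressing how much the generated correlation matrix is allowed to deviate from the guesstimate.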

References

Hoff P (2007). "Extending the rank likelihood for semiparametric copula estimation." Ann. Appl. Stat., 1(1), 265-283.

Joe H (2006). "Generating random correlation matrices based on partial correlations." Journal of Multivariate Analysis, 97, 2177-2189.

Examples

data("nhanes1518")
# Name the data argument explicitly: the first formal argument is `method`
xmod <- mpower::MixtureModel(data = nhanes1518, method = "resampling")


[Package mpower version 0.1.0 Index]