MixtureModel {mpower} | R Documentation |
Correlated predictors generator
Description
This function creates a generative model for the correlated, mixed-scale predictors.
Usage
MixtureModel(
method = "estimation",
data = NULL,
G = NULL,
m = 100,
nudge = 1e-09,
sbg_args = list(nsamp = 1000),
cvine_marginals = list(),
cvine_dtypes = list(),
resamp_prob = NULL
)
Arguments
method |
A string, one of the three options "resampling", "estimation", or "cvine". Default is "estimation". See Details. |
data |
A dataframe or matrix, required for the "resampling" and "estimation" methods. |
G |
A guesstimate pairwise correlation matrix for "cvine" method. See Details. |
m |
A positive number indicating uncertainty in the guesstimate G, larger means more uncertainty. Default is 100. |
nudge |
A number, default 1e-09, added to the diagonal of the covariance matrix for numerical stability. |
sbg_args |
A list of named arguments, except Y, for function 'sbgcop.mcmc()'. |
cvine_marginals |
A named list describing the univariate distribution of each predictor. See Details. |
cvine_dtypes |
A named list describing the data type of each variable. |
resamp_prob |
A vector of sampling probabilities, one for each observation in data. Must sum to 1. |
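As a rough illustration of the constraint on resamp_prob, the base-R sketch below normalizes a set of raw weights so they sum to 1 and then draws row indices with those probabilities. The weights in `raw_w` are hypothetical, and the call to `sample()` only mimics weighted resampling in general; it is not taken from the mpower source.

```r
# Hypothetical raw weights, one per observation (row) of `data`.
raw_w <- c(2, 1, 1, 4)

# Normalize so the probabilities sum to 1, as resamp_prob requires.
resamp_prob <- raw_w / sum(raw_w)

# Base-R illustration of weighted resampling of row indices
# (an assumption about the general idea, not mpower's internal code).
set.seed(1)
idx <- sample(seq_along(resamp_prob), size = 10, replace = TRUE,
              prob = resamp_prob)
```

A vector built this way could then be supplied as `resamp_prob = resamp_prob` alongside `data` when `method = "resampling"`.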
Value
A MixtureModel object.
Details
There are three methods to generate data:
1. Resampling: if we have enough data on the predictors, we can resample the observations to reproduce realistic joint distributions and dependence among them.
2. Estimation: if we have a small sample from, for example, a pilot study, we can sample from a semi-parametric copula model (Hoff 2007) after learning the dependence and univariate marginals of the predictors.
3. C-vine: if no pilot data exists, we can still set a rough guesstimate of the dependence and univariate marginals. The C-vine algorithm (Joe 2006) generates a positive semi-definite correlation matrix given the guesstimate G. The guesstimate G is a symmetric p x p matrix whose (i, j)-th entry is between -1 and 1 and is the guesstimated correlation between the i-th and j-th predictors. G doesn't need to be a valid correlation matrix. The method works well when values in G are not extreme (e.g., 0.999 or -0.999). Built-in functions for univariate marginals include: 'qbeta', 'qbinom', 'qcauchy', 'qchisq', 'qexp', 'qf', 'qgamma', 'qgeom', 'qhyper', 'qlogis', 'qlnorm', 'qmultinom', 'qnbinom', 'qnorm', 'qpois', 'qt', 'qunif', 'qweibull'.
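To make the requirements on G concrete, here is a minimal base-R sketch that builds a guesstimate matrix for p = 4 predictors and checks the two stated conditions: symmetry and entries between -1 and 1. The specific correlation values are made up for illustration. They are chosen so that G is not positive semi-definite, which the text above says is acceptable.

```r
# Hypothetical guesstimate for p = 4 predictors: moderate correlation
# everywhere, with a few pairs adjusted by hand.
p <- 4
G <- matrix(0.3, nrow = p, ncol = p)
diag(G) <- 1
G[1, 2] <- G[2, 1] <- 0.8
G[1, 3] <- G[3, 1] <- -0.7
G[2, 3] <- G[3, 2] <- 0.6

# The two documented requirements: symmetric, entries in [-1, 1].
stopifnot(isSymmetric(G), all(abs(G) <= 1))

# G need not be a valid correlation matrix. This one has a negative
# eigenvalue, so it is not positive semi-definite.
min_eig <- min(eigen(G, symmetric = TRUE, only.values = TRUE)$values)
```

Going by the Usage section, a matrix like this would be passed as `G = G` with `method = "cvine"`, together with cvine_marginals and cvine_dtypes. The exact format of those two lists is not spelled out here, so they are left out of the sketch.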
References
Hoff P (2007). "Extending the rank likelihood for semiparametric copula estimation." Ann. Appl. Stat., 1(1), 265-283.
Joe H (2006). "Generating random correlation matrices based on partial correlations." Journal of Multivariate Analysis, 97, 2177-2189.
Examples
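# Load the example dataset shipped with the package
data("nhanes1518", package = "mpower")
# Resample directly from the observed predictors
xmod <- mpower::MixtureModel(data = nhanes1518, method = "resampling")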