gen_data {latentcor}    R Documentation
Mixed type simulation data generator
Description
Generates data of mixed types from the latent Gaussian copula model.
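To make the model concrete, here is a minimal base-R sketch of the idea for the default setting of one ternary and one continuous variable: draw correlated latent Gaussians Z, apply a monotone transformation U = f(Z) (the identity here), then cut the ternary column at normal quantiles. This is an illustration under stated assumptions, not latentcor's internal code. The sample size, the correlation 0.5 and the ternary proportions (30% zeros, 50% ones) are assumed values chosen for the demonstration.

```r
# Hypothetical sketch (not latentcor's implementation): latent Gaussian
# copula data with one ternary and one continuous variable.
set.seed(1)
n <- 5000
rho <- 0.5
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)          # latent correlation matrix
Z <- matrix(rnorm(n * 2), n, 2) %*% chol(Sigma)   # rows of Z ~ N(0, Sigma)

# Copula transformation U = f(Z); identity here ("no" transformation).
U <- Z

# Ternary variable: cut the first column at two normal quantiles so that
# about 30% of values are 0 and 50% are 1 (assumed proportions).
p0 <- 0.3; p1 <- 0.5
cuts <- qnorm(c(p0, p0 + p1))
x_ter <- findInterval(U[, 1], cuts)   # 0 below the first cut, 1 between, 2 above

# Continuous variable: observed directly.
x_con <- U[, 2]
X <- cbind(x_ter, x_con)
```

Because f is increasing, cutting U = f(Z) at f(qnorm(p)) gives the same categories as cutting Z at qnorm(p), which is why the target proportions can be set directly from normal quantiles.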
Usage
gen_data(
n = 100,
types = c("ter", "con"),
rhos = 0.5,
copulas = "no",
XP = NULL,
showplot = FALSE
)
Arguments
n
A positive integer indicating the sample size. The default value is 100.
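types
A vector indicating the type of each variable, which can be "con" (continuous), "bin" (binary), "tru" (truncated) or "ter" (ternary). The number of variables p equals the length of types. The default value c("ter", "con") creates one ternary and one continuous variable.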
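rhos
A vector with the lower-triangular elements of the desired correlation matrix, e.g. rhos = c(.3, .4, .5) for p = 3 variables. The default value is 0.5, which for the default p = 2 is the correlation between the two variables.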
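copulas
A vector indicating the copula transformation f for each of the p variables, e.g. U = f(Z). Each element can take the value "no" (no transformation), "expo" (exponential transformation) or "cube" (cube transformation). A single value is applied to all p variables, as with the default value "no".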
XP
A list of length p indicating the proportion of zeros (for binary and truncated variables) and the proportions of zeros and ones (for ternary variables) for each variable. For a continuous variable, NA should be supplied. If NULL (the default), default proportions are used for all variables.
showplot
Logical indicator. If TRUE, generates a plot of the data when the number of variables p is at most 3. The default value is FALSE.
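The arguments rhos, copulas and XP can be read as follows. The sketch below uses hypothetical helpers (rhos_to_matrix, f_copula and discretize are not latentcor functions) to show one plausible interpretation. Three points are assumptions rather than documented behavior: the lower triangle is filled column by column in R's lower.tri() order, "expo" is exp(z), and "cube" is z^3.

```r
# Hypothetical helpers illustrating the arguments; not latentcor functions.

# rhos: fill the lower triangle column by column (assumed lower.tri() order),
# then mirror it to get a symmetric matrix with unit diagonal.
rhos_to_matrix <- function(rhos, p) {
  stopifnot(length(rhos) == p * (p - 1) / 2)
  R <- diag(p)
  R[lower.tri(R)] <- rhos
  R[upper.tri(R)] <- t(R)[upper.tri(R)]
  R
}

# copulas: assumed forms of the increasing transformation U = f(Z).
f_copula <- function(z, copula) {
  switch(copula,
    no = z,
    expo = exp(z),
    cube = z^3,
    stop("unknown copula: ", copula))
}

# XP: a proportion of zeros p0 becomes the latent cut point qnorm(p0),
# because P(Z <= qnorm(p0)) = p0 for standard normal Z. Since f is
# increasing, cutting Z is equivalent to cutting U = f(Z).
discretize <- function(z, u, type, xp = NA) {
  switch(type,
    con = u,
    bin = as.numeric(z > qnorm(xp)),
    # truncated: zero below the cut, u above it (nonnegative only when
    # f(qnorm(xp)) >= 0, e.g. any xp for "expo", xp >= 0.5 for "no" or "cube")
    tru = ifelse(z > qnorm(xp), u, 0),
    ter = findInterval(z, qnorm(c(xp[1], xp[1] + xp[2]))),
    stop("unknown type: ", type))
}

R3 <- rhos_to_matrix(c(.3, .4, .5), p = 3)
R3
```

Under the assumed ordering, rhos = c(.3, .4, .5) fills the (2,1), (3,1) and (3,2) entries of R3 with 0.3, 0.4 and 0.5.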
Value
gen_data returns a list containing
X: Generated data matrix (n by p) of observed variables.
plotX: Visualization of the data matrix X: a histogram if p = 1, a 2D scatter plot if p = 2 and a 3D scatter plot if p = 3. Returns NULL if showplot = FALSE.
References
Fan J., Liu H., Ning Y. and Zou H. (2017) "High dimensional semiparametric latent graphical model for mixed data" doi:10.1111/rssb.12168.
Yoon G., Carroll R.J. and Gaynanova I. (2020) "Sparse semiparametric canonical correlation analysis for data of mixed types" doi:10.1093/biomet/asaa007.
Examples
# Generate a single continuous variable with exponential transformation (always greater than 0)
# and show its histogram.
simdata = gen_data(n = 100, copulas = "expo", types = "con", showplot = TRUE)
X = simdata$X; plotX = simdata$plotX
# Generate a pair of variables (ternary and continuous) with default proportions
# and without copula transformation.
simdata = gen_data()
X = simdata$X
# Generate 3 variables (binary, ternary and truncated);
# the corresponding copulas are "no" (no transformation),
# "cube" (cube transformation) and "cube" (cube transformation).
# The binary variable has 30% zeros, the ternary variable has 20% zeros
# and 40% ones, and the truncated variable has 50% zeros.
# Then show the 3D scatter plot (data points project onto either 0 or 1 on axis X1,
# onto 0, 1 or 2 on axis X2, and onto the nonnegative domain on axis X3).
simdata = gen_data(n = 100, rhos = c(.3, .4, .5), copulas = c("no", "cube", "cube"),
types = c("bin", "ter", "tru"), XP = list(.3, c(.2, .4), .5), showplot = TRUE)
X = simdata$X; plotX = simdata$plotX
# Check the proportion of zeros for the binary variable.
mean(simdata$X[ , 1] == 0)
# Check the proportions of zeros and ones for the ternary variable.
mean(simdata$X[ , 2] == 0); mean(simdata$X[ , 2] == 1)
# Check the proportion of zeros for the truncated variable.
mean(simdata$X[ , 3] == 0)
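The per-column checks can also be collected in one table. The sketch below uses a hypothetical helper, level_props (not part of latentcor), and a small hand-made matrix instead of gen_data output so that it runs on its own. Applied to simdata$X, its rows could be compared with XP = list(.3, c(.2, .4), .5). The "ones" row is only meaningful for binary and ternary columns.

```r
# Hypothetical helper (not part of latentcor): observed proportion of
# zeros and of ones in every column of a data matrix.
level_props <- function(X) {
  rbind(zeros = colMeans(X == 0), ones = colMeans(X == 1))
}

# Small hand-made matrix: binary, ternary and truncated-like columns.
X_demo <- cbind(c(0, 0, 1, 1), c(0, 1, 2, 2), c(0, 0.5, 0, 3))
level_props(X_demo)
```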