generate_data {sRDA}    R Documentation

Generate data sets for sparse multivariate analysis

Description

Generate two data sets containing highly correlated variables and noise variables, modeled in a multiple latent variable structure. The latent variables are orthogonal to each other, so each captures a different portion of the association between the two data sets. The function generates data that can be used to verify sRDA's ability to find the highly correlated variables across multiple latent variables.

Usage

generate_data(nr_LVs = 1, n = 50, nr_correlated_Xs = c(5),
  nr_uncorrelated_Xs = 250, mean_reg_weights_assoc_X = c(0.7),
  sd_reg_weights_assoc_X = c(0.05), Xnoise_min = -0.3, Xnoise_max = 0.3,
  nr_correlated_Ys = c(5), nr_uncorrelated_Ys = 350,
  mean_reg_weights_assoc_Y = c(0.7), sd_reg_weights_assoc_Y = c(0.05),
  Ynoise_min = -0.3, Ynoise_max = 0.3)

Arguments

nr_LVs

The number of latent variables between the predictive and predicted data sets. The latent variables model the association between the data sets.

n

The number of observations (rows) in the data sets.

nr_correlated_Xs

Number of variables of the predictive data set that are associated with the latent variables.

nr_uncorrelated_Xs

Number of variables of the predictive data set that are not associated with the latent variables.

mean_reg_weights_assoc_X

Mean of the regression weights of the predictive variables that are associated with the latent variables.

sd_reg_weights_assoc_X

Standard deviation of the regression weights of the predictive variables that are associated with the latent variables.

Xnoise_min

The lower bound of the uniform distribution used to sample the regression weights of the predictive variables that are not associated with the latent variables.

Xnoise_max

The upper bound of the uniform distribution used to sample the regression weights of the predictive variables that are not associated with the latent variables.

nr_correlated_Ys

Number of variables of the predicted data set that are associated with the latent variables.

nr_uncorrelated_Ys

Number of variables of the predicted data set that are not associated with the latent variables.

mean_reg_weights_assoc_Y

Mean of the regression weights of the predicted variables that are associated with the latent variables.

sd_reg_weights_assoc_Y

Standard deviation of the regression weights of the predicted variables that are associated with the latent variables.

Ynoise_min

The lower bound of the uniform distribution used to sample the regression weights of the predicted variables that are not associated with the latent variables.

Ynoise_max

The upper bound of the uniform distribution used to sample the regression weights of the predicted variables that are not associated with the latent variables.

Examples

# generate data with few highly correlated variables
dataXY <- generate_data(nr_LVs = 2,
                           n = 250,
                           nr_correlated_Xs = c(5,20),
                           nr_uncorrelated_Xs = 250,
                           mean_reg_weights_assoc_X =
                             c(0.9,0.5),
                           sd_reg_weights_assoc_X =
                             c(0.05, 0.05),
                           Xnoise_min = -0.3,
                           Xnoise_max = 0.3,
                           nr_correlated_Ys = c(10,15),
                           nr_uncorrelated_Ys = 350,
                           mean_reg_weights_assoc_Y =
                             c(0.9,0.6),
                           sd_reg_weights_assoc_Y =
                             c(0.05, 0.05),
                           Ynoise_min = -0.3,
                           Ynoise_max = 0.3)

# separate predictor and predicted sets
X <- dataXY$X
Y <- dataXY$Y

# check the dimensions of both sets
dim(X)
dim(Y)
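
The following base R sketch illustrates how the weight-related arguments could shape one data set for a single latent variable. It is not the package's implementation: the standard normal latent scores, the added residual noise, and all object names (lv, w_assoc, w_noise, X_sketch) are assumptions made for illustration only. The parameter values mirror the defaults listed under Usage.

```r
# Illustrative sketch only; NOT the internals of generate_data().
set.seed(1)

n               <- 50    # observations, as in the default n
nr_correlated   <- 5     # variables associated with the latent variable
nr_uncorrelated <- 250   # noise variables

# Assumption: latent variable scores are standard normal
lv <- rnorm(n)

# Weights of associated variables: normal(mean, sd), cf.
# mean_reg_weights_assoc_X and sd_reg_weights_assoc_X
w_assoc <- rnorm(nr_correlated, mean = 0.7, sd = 0.05)

# Weights of noise variables: uniform(min, max), cf.
# Xnoise_min and Xnoise_max
w_noise <- runif(nr_uncorrelated, min = -0.3, max = 0.3)

weights <- c(w_assoc, w_noise)

# Assumption: each column is the latent score times its weight,
# plus independent standard normal residual noise
X_sketch <- outer(lv, weights) +
  matrix(rnorm(n * length(weights)), nrow = n)

dim(X_sketch)
```

In this sketch the first nr_correlated columns carry the larger weights, so they should tend to correlate more strongly with lv than the remaining columns do; the column order returned by generate_data itself is not documented here and may differ.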


[Package sRDA version 1.0.0 Index]