simul.dedic.facmod {BayesFM}    R Documentation

Generate synthetic data from a dedicated factor model


This function simulates data from a dedicated factor model. The parameters of the model are either passed by the user or simulated by the function.


simul.dedic.facmod(N, dedic, alpha, sigma, R, R.corr = TRUE,
  max.corr = 0.85, R.max.trial = 1000)



N: Number of observations in the data set.

dedic: Vector of indicators. The number of manifest variables is equal to the length of this vector, and the number of factors is equal to the number of unique nonzero elements. Each integer element indicates the latent factor on which the corresponding variable loads uniquely.

alpha: Vector of factor loadings; should have the same length as dedic. If missing, values are simulated (see details below).

sigma: Idiosyncratic variances; should have the same length as dedic. If missing, values are simulated (see details below).

R: Covariance matrix of the latent factors. If missing, values are simulated (see details below).

R.corr: If TRUE, the covariance matrix R is rescaled to be a correlation matrix.

max.corr: Maximum correlation allowed between the latent factors.

R.max.trial: Maximum number of trials allowed to sample from the truncated distribution of the covariance matrix of the latent factors (accept/reject sampling scheme, to make sure max.corr is not exceeded).


The function simulates data from the following dedicated factor model, for $i = 1, \ldots, N$:

$$Y_i = \alpha \theta_i + \epsilon_i$$

$$\theta_i \sim \mathcal{N}(0, R)$$

$$\epsilon_i \sim \mathcal{N}(0, \Sigma)$$

where the $K$-vector $\theta_i$ contains the latent factors, and $\alpha$ is the $(M \times K)$-matrix of factor loadings. Each row $m$ of $\alpha$ contains only zeros, except for the element indicated by the $m$th element of dedic, which is equal to the $m$th element of alpha (denoted $\alpha_m^\Delta$ below). The $M$-vector $\epsilon_i$ is the vector of error terms, with $\Sigma = \mathrm{diag}(\texttt{sigma})$. $M$ is equal to the length of the vector dedic, and $K$ is equal to the maximum value of this vector.
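The model above can be illustrated with a minimal base R sketch. This is not the BayesFM implementation: the parameter values and the names A, theta and eps are chosen here purely for illustration, and the factors are assumed to be labelled 1 to K in dedic.

```r
# Illustrative sketch only (base R); not the code used by simul.dedic.facmod.
set.seed(1)

dedic <- rep(1:2, each = 3)                  # M = 6 variables, K = 2 factors
alpha <- c(0.8, 0.6, -0.5, 0.7, -0.4, 0.6)   # one loading per manifest variable
sigma <- rep(0.5, 6)                         # idiosyncratic variances
R     <- matrix(c(1, 0.3, 0.3, 1), 2, 2)     # correlation matrix of the factors
N     <- 200

M <- length(dedic)
K <- max(dedic)

# (M x K) loading matrix: row m is zero except in column dedic[m]
A <- matrix(0, M, K)
loaded <- dedic > 0
A[cbind(which(loaded), dedic[loaded])] <- alpha[loaded]

# theta_i ~ N(0, R), stored as rows of an (N x K) matrix
theta <- matrix(rnorm(N * K), N, K) %*% chol(R)

# epsilon_i ~ N(0, diag(sigma)): column m is scaled by sqrt(sigma[m])
eps <- matrix(rnorm(N * M), N, M) %*% diag(sqrt(sigma))

# Y_i = alpha theta_i + epsilon_i, stacked row by row
Y <- theta %*% t(A) + eps
```

Each row of A has exactly one nonzero entry, which is what makes the factor structure "dedicated".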

Only N and dedic are required; all the other parameters can be missing, completely or partially. Missing values (NA) are independently sampled from the following distributions, for each manifest variable $m = 1, \ldots, M$:

Factor loadings:

$$\alpha_m^\Delta = (-1)^{\phi_m}\sqrt{a_m}$$

$$\phi_m \sim \mathcal{B}er(0.5)$$

$$a_m \sim \mathcal{U}nif(0.04, 0.64)$$

Idiosyncratic variances:

$$\sigma^2_m \sim \mathcal{U}nif(0.2, 0.8)$$

For the variables that do not load on any factors (i.e., for which the corresponding elements of dedic are equal to 0), it is specified that $\alpha_m^\Delta = 0$ and $\sigma^2_m = 1$.
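As a rough illustration of how partially missing alpha and sigma vectors could be completed from these distributions, consider the base R sketch below. It is written from the description above, not taken from the package source. In particular, overwriting the entries of variables with dedic equal to 0 is an assumption about how that rule is applied.

```r
# Illustrative sketch only (base R); not the code used by simul.dedic.facmod.
set.seed(2)

dedic <- c(1, 1, 2, 2, 0)            # the last variable loads on no factor
alpha <- c(1, NA, NA, 0.5, NA)       # NA entries are to be simulated
sigma <- c(NA, 0.5, NA, NA, NA)

M <- length(dedic)

# Loadings: random sign (Bernoulli(0.5)) times sqrt of a Uniform(0.04, 0.64) draw
miss_a <- is.na(alpha)
phi    <- rbinom(M, size = 1, prob = 0.5)
a      <- runif(M, min = 0.04, max = 0.64)
alpha[miss_a] <- ((-1)^phi * sqrt(a))[miss_a]

# Idiosyncratic variances: Uniform(0.2, 0.8)
miss_s <- is.na(sigma)
sigma[miss_s] <- runif(M, min = 0.2, max = 0.8)[miss_s]

# Variables not loading on any factor (assumption: applied unconditionally)
alpha[dedic == 0] <- 0
sigma[dedic == 0] <- 1
```

Simulated loadings therefore lie between 0.2 and 0.8 in absolute value, while values supplied by the user are left untouched.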

Covariance matrix of the latent factors:

$$\Omega \sim \mathcal{I}nv-\mathcal{W}ishart(K+5, I_K)$$

which is rescaled to be a correlation matrix if R.corr = TRUE:

$$R = \Lambda^{-1/2} \Omega \Lambda^{-1/2}$$

$$\Lambda = diag(\Omega)$$

Note that the distribution of the covariance matrix is truncated such that all the off-diagonal elements of the implied correlation matrix $R$ are below max.corr in absolute value. The truncation is also applied if the covariance matrix is used instead of the correlation matrix (i.e., if R.corr = FALSE).
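The accept/reject scheme described above might look roughly like the following base R sketch. The helper draw_R is hypothetical and not part of BayesFM. The inverse-Wishart draw is obtained by inverting a Wishart draw from stats::rWishart, which assumes the usual parameterization. Stopping with an error once R.max.trial is exhausted is also an assumption.

```r
# Illustrative sketch only (base R); draw_R is a hypothetical helper.
draw_R <- function(K, max.corr = 0.85, R.max.trial = 1000, R.corr = TRUE) {
  for (trial in seq_len(R.max.trial)) {
    # If W ~ Wishart(K + 5, I_K), then solve(W) ~ Inv-Wishart(K + 5, I_K)
    W     <- matrix(rWishart(1, df = K + 5, Sigma = diag(K))[, , 1], K, K)
    Omega <- solve(W)

    # Implied correlation matrix: Lambda^{-1/2} Omega Lambda^{-1/2}
    Rcor <- cov2cor(Omega)

    # Accept only if every off-diagonal correlation is below max.corr
    if (all(abs(Rcor[upper.tri(Rcor)]) < max.corr)) {
      return(if (R.corr) Rcor else Omega)
    }
  }
  stop("no admissible draw within R.max.trial trials")
}

set.seed(3)
R <- draw_R(K = 3)
```

The truncation check always uses the implied correlation matrix, so it applies in the same way when R.corr = FALSE and the covariance matrix itself is returned.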

The distributions and the corresponding default values used to simulate the model parameters are specified as in the Monte Carlo study of Conti, Frühwirth-Schnatter, Heckman and Piatek (2014, hereafter CFSHP); see section 4.1 (p. 43).


The function returns a data frame with N observations simulated from the corresponding dedicated factor model. The parameters used to generate the data are saved as attributes: dedic, alpha, sigma and R.
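Since the parameters are stored as attributes, they can be retrieved with attr(). The sketch below uses a hypothetical helper, get_sim_params, and a hand-built stand-in data frame so that it runs without BayesFM. With the package installed, the output of simul.dedic.facmod would be passed instead.

```r
# Illustrative sketch only; get_sim_params is a hypothetical helper.
get_sim_params <- function(dat) {
  nms <- c("dedic", "alpha", "sigma", "R")
  # exact = TRUE avoids partial matching of attribute names
  setNames(lapply(nms, function(nm) attr(dat, nm, exact = TRUE)), nms)
}

# Stand-in for the returned data frame (attributes set by hand here)
mock <- data.frame(Y1 = rnorm(5), Y2 = rnorm(5))
attr(mock, "dedic") <- c(1, 1)
attr(mock, "alpha") <- c(0.7, -0.5)
attr(mock, "sigma") <- c(0.5, 0.5)
attr(mock, "R")     <- matrix(1, 1, 1)

params <- get_sim_params(mock)
str(params)
```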


Rémi Piatek


G. Conti, S. Frühwirth-Schnatter, J.J. Heckman, R. Piatek (2014): “Bayesian Exploratory Factor Analysis”, Journal of Econometrics, 183(1), pages 31-57, doi:10.1016/j.jeconom.2014.06.008.


# generate 1000 observations from model with 4 factors and 20 variables
# (5 variables loading on each factor)
dat <- simul.dedic.facmod(N = 1000, dedic = rep(1:4, each = 5))

# generate data set with 5000 observations from the following model:
dedic <- rep(1:3, each = 4)        # 3 factors and 12 manifest variables
alpha <- rep(c(1, NA, NA, NA), 3)  # set first loading to 1 for each factor,
                                   #   sample remaining loadings from default
sigma <- rep(0.5, 12)              # idiosyncratic variances all set to 0.5
R <- toeplitz(c(1, .6, .3))        # Toeplitz matrix
dat <- simul.dedic.facmod(N = 5000, dedic, alpha, sigma, R)

[Package BayesFM version 0.1.7 Index]