simul.dedic.facmod {BayesFM} | R Documentation |
Generate synthetic data from a dedicated factor model
Description
This function simulates data from a dedicated factor model. The parameters of the model are either passed by the user or simulated by the function.
Usage
simul.dedic.facmod(N, dedic, alpha, sigma, R, R.corr = TRUE,
max.corr = 0.85, R.max.trial = 1000)
Arguments
N |
Number of observations in data set. |
dedic |
Vector of indicators. The number of manifest variables is equal to the length of this vector, and the number of factors is equal to the number of unique nonzero elements. Each integer element indicates on which latent factor the corresponding variable loads uniquely. |
alpha |
Vector of factor loadings, should be of same length as dedic. |
sigma |
Idiosyncratic variances, should be of same length as dedic. |
R |
Covariance matrix of the latent factors. If missing, values are simulated (see details below). |
R.corr |
If TRUE, covariance matrix R is rescaled to be a correlation matrix. |
max.corr |
Maximum correlation allowed between the latent factors. |
R.max.trial |
Maximum number of trials allowed to sample from the truncated distribution of the covariance matrix of the latent factors (accept/reject sampling scheme, to make sure max.corr is not exceeded). |
Details
The function simulates data from the following dedicated factor model, for i = 1, ..., N:
Y_i = \alpha \theta_i + \epsilon_i
\theta_i \sim \mathcal{N}(0, R)
\epsilon_i \sim \mathcal{N}(0, \Sigma)
where the K-vector \theta_i contains the latent factors, and \alpha is the (M \times K)-matrix of factor loadings. Each row m of \alpha contains only zeros, except for the element in the column indicated by the m-th element of dedic, which is equal to the m-th element of alpha (denoted \alpha_m^\Delta below). The M-vector \epsilon_i is the vector of error terms, with \Sigma = diag(sigma). M is equal to the length of the vector dedic, and K is equal to the maximum value of this vector.
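The model above can be sketched in base R as follows. This is only an illustration of the equations, not the package's implementation; the parameter values and the object names A, theta and eps are assumptions chosen for the example.

```r
set.seed(1)

N     <- 500
dedic <- c(1, 1, 1, 2, 2, 2)               # M = 6 variables, K = 2 factors
alpha <- c(0.8, 0.6, -0.5, 0.7, 0.4, -0.6) # one loading per variable
sigma <- rep(0.5, 6)                       # idiosyncratic variances
R     <- matrix(c(1, 0.3, 0.3, 1), 2)      # covariance matrix of the factors

M <- length(dedic)
K <- max(dedic)

# (M x K) loading matrix: row m is zero except in column dedic[m]
A <- matrix(0, M, K)
loads <- dedic > 0
A[cbind(which(loads), dedic[loads])] <- alpha[loads]

# theta_i ~ N(0, R): each row of theta is one draw of the K factors
theta <- matrix(rnorm(N * K), N, K) %*% chol(R)

# epsilon_i ~ N(0, diag(sigma)): scale each column by its standard deviation
eps <- sweep(matrix(rnorm(N * M), N, M), 2, sqrt(sigma), `*`)

# Y_i = alpha theta_i + epsilon_i, stacked row-wise into a data frame
Y <- as.data.frame(theta %*% t(A) + eps)
```

For large N, cov(Y) should be close to A %*% R %*% t(A) + diag(sigma), the covariance matrix implied by the model.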
Only N and dedic are required; all the other parameters can be missing, completely or partially. Missing values (NA) are independently sampled from the following distributions, for each manifest variable m = 1, ..., M:
Factor loadings:
\alpha_m^\Delta = (-1)^{\phi_m}\sqrt{a_m}
\phi_m \sim \mathcal{B}er(0.5)
a_m \sim \mathcal{U}nif (0.04, 0.64)
Idiosyncratic variances:
\sigma^2_m \sim \mathcal{U}nif (0.2, 0.8)
For the variables that do not load on any factor (i.e., for which the corresponding elements of dedic are equal to 0), it is specified that \alpha_m^\Delta = 0 and \sigma^2_m = 1.
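A minimal sketch of how missing loadings and variances could be filled in according to the distributions above. The helper fill_missing is hypothetical and not part of BayesFM. It assumes that the values 0 and 1 for variables with dedic equal to 0 are applied only where the entry is missing; the text does not say how user-supplied values are handled in that case.

```r
set.seed(2)

dedic <- c(1, 1, 2, 2, 0)       # the last variable loads on no factor
alpha <- c(1, NA, NA, NA, NA)   # NA marks a value to be simulated
sigma <- c(0.5, NA, 0.5, NA, NA)

# hypothetical helper, not part of BayesFM
fill_missing <- function(dedic, alpha, sigma) {
  M    <- length(dedic)
  na_a <- is.na(alpha)
  na_s <- is.na(sigma)

  # alpha_m = (-1)^phi_m * sqrt(a_m), phi_m ~ Ber(0.5), a_m ~ Unif(0.04, 0.64)
  phi <- rbinom(M, 1, 0.5)
  a   <- runif(M, 0.04, 0.64)
  alpha[na_a] <- ((-1)^phi * sqrt(a))[na_a]

  # sigma^2_m ~ Unif(0.2, 0.8)
  sigma[na_s] <- runif(sum(na_s), 0.2, 0.8)

  # variables that do not load on any factor (assumption: only if missing)
  alpha[na_a & dedic == 0] <- 0
  sigma[na_s & dedic == 0] <- 1

  list(alpha = alpha, sigma = sigma)
}

pars <- fill_missing(dedic, alpha, sigma)
```

Since a_m lies in (0.04, 0.64), the simulated loadings lie between 0.2 and 0.8 in absolute value.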
Covariance matrix of the latent factors:
\Omega \sim \mathcal{I}nv-\mathcal{W}ishart(K+5, I_K)
which is rescaled to be a correlation matrix if R.corr = TRUE:
R = \Lambda^{-1/2} \Omega \Lambda^{-1/2}
\Lambda = diag(\Omega)
Note that the distribution of the covariance matrix is truncated such that all the off-diagonal elements of the implied correlation matrix R are below max.corr in absolute value. The truncation is also applied if the covariance matrix is used instead of the correlation matrix (i.e., if R.corr = FALSE).
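The accept/reject scheme for the factor covariance matrix can be sketched as below. The helper draw_R is hypothetical and not part of BayesFM; it assumes that an inverse-Wishart(K + 5, I_K) draw can be obtained by inverting a Wishart draw from stats::rWishart with K + 5 degrees of freedom and identity scale, and it stops with an error when R.max.trial is exhausted, which may differ from what the package does.

```r
set.seed(3)

# hypothetical helper, not part of BayesFM
draw_R <- function(K, R.corr = TRUE, max.corr = 0.85, R.max.trial = 1000) {
  for (trial in seq_len(R.max.trial)) {
    # inverse of a Wishart(K + 5, I_K) draw (assumed parametrization)
    W     <- rWishart(1, df = K + 5, Sigma = diag(K))[, , 1]
    Omega <- solve(W)

    # implied correlation matrix: Lambda^{-1/2} Omega Lambda^{-1/2}
    C <- cov2cor(Omega)

    # accept only if all off-diagonal correlations are below max.corr
    if (all(abs(C[upper.tri(C)]) < max.corr)) {
      return(if (R.corr) C else Omega)
    }
  }
  stop("no admissible matrix found in ", R.max.trial, " trials")
}

R <- draw_R(K = 3)
```

The truncation is checked on the correlation matrix C in both cases, so a covariance matrix returned with R.corr = FALSE also implies correlations below max.corr.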
The distributions and the corresponding default values used to simulate the model parameters are specified as in the Monte Carlo study of Conti, Frühwirth-Schnatter, Heckman and Piatek (2014), see section 4.1 (p. 43).
Value
The function returns a data frame with N observations simulated from the corresponding dedicated factor model. The parameters used to generate the data are saved as attributes: dedic, alpha, sigma and R.
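As a short sketch, the saved parameters could be retrieved from the returned data frame with base R's attributes(). The code runs the simulation only if the BayesFM package is installed; the object names wanted and pars are illustrative.

```r
# names of the attributes described above
wanted <- c("dedic", "alpha", "sigma", "R")

pars <- NULL
if (requireNamespace("BayesFM", quietly = TRUE)) {
  dat <- BayesFM::simul.dedic.facmod(N = 200, dedic = rep(1:2, each = 3))

  # collect the parameters used to generate the data
  pars <- attributes(dat)[wanted]
  str(pars)
}
```

A single parameter can be accessed in the same way, for example attr(dat, "alpha") for the factor loadings.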
Author(s)
Rémi Piatek remi.piatek@gmail.com
References
G. Conti, S. Frühwirth-Schnatter, J.J. Heckman, R. Piatek (2014): “Bayesian Exploratory Factor Analysis”, Journal of Econometrics, 183(1), pages 31-57, doi:10.1016/j.jeconom.2014.06.008.
Examples
# generate 1000 observations from model with 4 factors and 20 variables
# (5 variables loading on each factor)
dat <- simul.dedic.facmod(N = 1000, dedic = rep(1:4, each = 5))
# generate data set with 5000 observations from the following model:
dedic <- rep(1:3, each = 4) # 3 factors and 12 manifest variables
alpha <- rep(c(1, NA, NA, NA), 3) # set first loading to 1 for each factor,
# sample remaining loadings from default
sigma <- rep(0.5, 12) # idiosyncratic variances all set to 0.5
R <- toeplitz(c(1, .6, .3)) # Toeplitz matrix
dat <- simul.dedic.facmod(N = 5000, dedic, alpha, sigma, R)