gfa {GFA} | R Documentation |
Gibbs sampling for group factor analysis
Description
gfa returns posterior samples of a group factor analysis model.
Usage
gfa(Y, opts, K = NULL, projection = NULL, filename = "")
Arguments
Y |
Either
NOTE: The data features should have roughly zero mean and unit variance.
If this is not the case, preprocessing with function
|
opts |
List of model options; see function |
K |
The number of components (i.e. latent variables). It is recommended to set K somewhat higher than the expected number of components, so that the sampler can determine the model complexity by shutting down excess components. Higher values increase CPU time. Default: half of the minimum of the sample size and the total data dimensionality. |
projection |
Fixed projections. Only intended for sequential prediction
use via function |
filename |
A string. If provided, the sampling chain is saved to this file every 100 iterations. Default: "" (no saving). |
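A minimal base-R sketch of preparing Y as described under Arguments, assuming two synthetic data sources that share the same samples (rows). Standardizing with base R's scale() is one way to obtain roughly zero-mean, unit-variance features; it is not necessarily the package's own preprocessing. The helper default_K() is hypothetical and only mirrors the documented default for K; rounding down is an assumption.

```r
set.seed(1)
N <- 50                                  # samples shared by all sources
# Two synthetic data sources with different feature counts and scales
Y <- list(
  matrix(rnorm(N * 8, mean = 3, sd = 2), nrow = N),
  matrix(rnorm(N * 5, mean = -1, sd = 0.5), nrow = N)
)
# Center and scale every feature (column) of every source
Y <- lapply(Y, function(m) scale(m, center = TRUE, scale = TRUE))

# Hypothetical helper mirroring the documented default of K:
# half of min(sample size, total dimensionality), rounded down (assumed)
default_K <- function(Y) {
  N <- nrow(Y[[1]])
  D_total <- sum(sapply(Y, ncol))
  floor(min(N, D_total) / 2)
}

default_K(Y)  # min(50, 13) / 2 = 6.5, rounded down to 6
```

Leaving K = NULL in gfa() uses the documented default, so computing it by hand is mainly useful for anticipating CPU cost before sampling.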
Details
GFA allows factor analysis of multiple data sources (i.e. data sets).
The priors of the model can be set to infer bicluster structure
from the data sources; see getDefaultOpts.
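To make the multi-source setting concrete, the following base-R sketch simulates data from a simplified GFA-style generative model: all sources share one latent matrix X (N x K), each source m has its own loadings W_m, and Gaussian noise has precision tau. This model form and the hand-chosen pattern of which components are active in which source are illustrative assumptions, not the package's sampler.

```r
set.seed(2)
N <- 100; K <- 3
D <- c(6, 4)                      # dimensionalities of the two sources
X <- matrix(rnorm(N * K), N, K)   # latent variables shared by all sources

# Which components are active in which source (chosen by hand):
# component 1 is shared, 2 is specific to source 1, 3 to source 2
active <- rbind(c(TRUE, TRUE, FALSE),
                c(TRUE, FALSE, TRUE))

tau <- 4                          # noise precision (same for all features here)
Y <- lapply(seq_along(D), function(m) {
  W_m <- matrix(rnorm(D[m] * K), D[m], K)
  W_m[, !active[m, ]] <- 0        # inactive components have zero loadings
  noise <- matrix(rnorm(N * D[m], sd = 1 / sqrt(tau)), N, D[m])
  X %*% t(W_m) + noise            # N x D_m data matrix
})
sapply(Y, dim)                    # one column per source: rows N, D_m
```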
Missing values (NAs) are supported natively. They do not affect the model
parameters, but can be predicted with the function reconstruction,
based on the observed values of the corresponding sample and feature.
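A common way to use this is to hide some observed entries as NA, fit the model, and compare the predictions for those entries with the held-out values. The base-R sketch below shows only the masking and the error computation. heldout_rmse() is a hypothetical helper, and since the exact output format of reconstruction() is not described here, a stand-in prediction matrix (column means) is used in its place.

```r
set.seed(3)
Y_full <- matrix(rnorm(40 * 5), 40, 5)

# Hide roughly 10% of the entries; gfa() could then be run on Y_obs
mask <- matrix(runif(length(Y_full)) < 0.1, nrow(Y_full))
Y_obs <- Y_full
Y_obs[mask] <- NA

# Hypothetical helper: RMSE restricted to the held-out entries
heldout_rmse <- function(truth, pred, mask) {
  sqrt(mean((truth[mask] - pred[mask])^2))
}

# Stand-in prediction (observed column means), for illustration only
pred <- matrix(colMeans(Y_obs, na.rm = TRUE),
               nrow(Y_obs), ncol(Y_obs), byrow = TRUE)
heldout_rmse(Y_full, pred, mask)
```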
The association of each data source with each component is inferred from
the data. Letting only a subset of the components explain a data source
allows the posterior to identify relationships between any subset of the
data sources. In the extreme cases, a component can explain relationships
within a single data source only ("structured noise"), or across all the data
sources.
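Given a loading matrix whose rows are features (D in total) and a groups list mapping each data source to its rows, one can summarize which sources each component is active in. The sketch below assumes that layout, which is suggested by the groups and D elements listed under Value; the toy W, the threshold tol and the helper component_activity() are hypothetical.

```r
# Build the groups list from the per-source dimensionalities D
D <- c(3, 2)
ends <- cumsum(D)
groups <- lapply(seq_along(D), function(m) (ends[m] - D[m] + 1):ends[m])

# Toy loading matrix: 5 features x 3 components
W <- rbind(c(0.9, 0.7, 0),     # source 1
           c(0.8, 0.5, 0),
           c(1.1, 0.6, 0),
           c(0.7, 0,   0.4),   # source 2
           c(0.6, 0,   0.9))

# Hypothetical helper: TRUE where component k has non-negligible loadings in source m
component_activity <- function(W, groups, tol = 1e-6) {
  t(sapply(groups, function(g) colSums(W[g, , drop = FALSE]^2) > tol))
}

component_activity(W, groups)
# row 1 (source 1): TRUE TRUE FALSE
# row 2 (source 2): TRUE FALSE TRUE
```

Here component 1 spans both sources, while components 2 and 3 each explain a single source only, i.e. the "structured noise" case.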
Value
A list containing the model parameters. In case of pairing in two modes, each element is a list of length 2, one element per mode. For most parameters, only the final posterior sample is provided, to aid initial checks; all posterior samples should be used for model analysis. The list elements are:
W |
The loading matrix (final posterior sample); |
X |
The latent variables (final sample); |
Z |
The spike-and-slab parameters (final sample); |
r |
The probability of slab in Z (final sample). |
rz |
The probability of slab in the spike-and-slab prior of X (final sample). |
tau |
The noise precisions (final sample); D-element vector. |
alpha |
The precisions of the projection weights W (final sample);
|
beta |
The precisions of the latent variables X (final sample);
|
groups |
A list denoting which features belong to each data source. |
D |
Data dimensionalities; M-element vector. |
K |
The number of components inferred. May be less than the initial K. |
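To illustrate how W, Z and r relate under a spike-and-slab prior, the sketch below draws loadings as the elementwise product of Bernoulli indicators Z (slab with probability r) and Gaussian slab values with precision alpha. It is a prior simulation under stated assumptions (a single r and alpha for all entries), not the package's Gibbs update. The last line counts components with any nonzero loading, one simple way to see how the inferred K can be smaller than the initial one.

```r
set.seed(4)
D <- 20; K <- 5
r <- 0.3                        # assumed probability of the slab
alpha <- 2                      # assumed precision of the slab

Z <- matrix(rbinom(D * K, size = 1, prob = r), D, K)      # 1 = slab, 0 = spike
slab <- matrix(rnorm(D * K, sd = 1 / sqrt(alpha)), D, K)
W <- Z * slab                   # spike entries are exactly zero

Z[, 5] <- 0; W[, 5] <- 0        # switch off one component entirely
mean(Z[, 1:4])                  # empirical slab frequency, close to r
sum(colSums(W != 0) > 0)        # components still in use
```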
The list also contains the following elements:
posterior |
The posterior samples of, by default, X, W and tau. |
cost |
The likelihood of all the posterior samples. |
aic |
The Akaike information criterion of all the posterior samples. |
opts |
The options used for the GFA model. |
conv |
An estimate of the convergence of the model's reconstruction based on the Geweke diagnostic. Values significantly above 0.05 imply a non-converged model, and hence the need for a longer sampling chain. |
time |
The CPU time (in seconds) used to sample the model. |
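The conv element is described as a Geweke-based estimate where values well above 0.05 signal non-convergence. One reading consistent with that description, an assumption not confirmed here, is the fraction of monitored quantities (e.g. reconstructed entries) whose Geweke test rejects at the 5% level, which should be near 0.05 for a converged chain. The base-R sketch below computes that fraction. For simplicity it uses the naive iid variance of each segment mean, whereas coda::geweke.diag() uses spectral density estimates that account for autocorrelation; geweke_z() and geweke_fraction() are hypothetical helpers.

```r
# z-score comparing the mean of the first 10% and the last 50% of a chain
geweke_z <- function(chain, frac1 = 0.1, frac2 = 0.5) {
  n <- length(chain)
  a <- chain[seq_len(floor(frac1 * n))]
  b <- chain[(n - floor(frac2 * n) + 1):n]
  # naive variances of the segment means (ignores autocorrelation)
  (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
}

# Hypothetical summary: share of chains (columns) with two-sided p < 0.05
geweke_fraction <- function(samples) {
  z <- apply(samples, 2, geweke_z)
  mean(2 * pnorm(-abs(z)) < 0.05)
}

set.seed(5)
stationary <- matrix(rnorm(1000 * 200), 1000, 200)      # 200 well-mixed chains
drifting <- stationary + seq(0, 3, length.out = 1000)   # same chains with a trend
geweke_fraction(stationary)   # expected to be close to 0.05
geweke_fraction(drifting)     # expected to be far above 0.05
```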