mcem_parameter_setup {DGP4LCF} | R Documentation |
Parameter setup and initial value assignment for the Monte Carlo Expectation Maximization (MCEM) algorithm.
Description
This function creates R objects that store the parameters in the required format and assigns their initial values, so that they are ready for use in the MCEM algorithm.
Usage
mcem_parameter_setup(
p,
k,
n,
q,
ind_num = 10,
burn_in_prop = 0.2,
thin_step = 5,
prior_sparsity = 0.1,
em_num = 50,
obs_time_num,
obs_time_index,
a_person,
col_person_index,
y_init,
a_init,
z_init,
phi_init,
a_full,
train_index,
x,
model_dgp = TRUE
)
Arguments
p |
A numeric scalar. Number of genes. |
k |
A numeric scalar. Number of latent factors. |
n |
A numeric scalar. Number of subjects. |
q |
A numeric scalar. Complete number of time points in the training data. |
ind_num |
A numeric scalar. Starting size of approximately independent samples for MCEM. |
burn_in_prop |
A numeric scalar. Proportion of burn-in, which is used to calculate the number of Monte Carlo samples needed in the Gibbs sampler. Must be the same as that in the function 'mcem_algorithm_irregular_time'. |
thin_step |
A numeric scalar. Thinning step, which is used to calculate the number of Monte Carlo samples needed in the Gibbs sampler. Must be the same as that in the function 'mcem_algorithm_irregular_time'. |
prior_sparsity |
A numeric scalar. Prior expected proportion of genes involved within each pathway. |
em_num |
A numeric scalar. Maximum number of iterations allowed for the expectation maximization (EM) algorithm. |
obs_time_num |
An n-dimensional vector. Each element is the number of time points observed for one subject in the training data. |
obs_time_index |
A list of n elements. One element is a vector of observed time indexes for one person in the training data, sorted from early to late. |
a_person |
A list of n elements. One element is a vector of observed times for one subject in the training data, sorted from early to late. |
col_person_index |
A list of n elements. One element is a vector of column indexes for one subject in y_init. |
y_init |
A matrix of dimension (k, sum(obs_time_num)). Initial values of the latent factor scores. Can be obtained using BFRM software. |
a_init |
A matrix of dimension (p, k). Initial values of the regression coefficients of factor loadings. Can be obtained using BFRM software. |
z_init |
A matrix of dimension (p, k). Initial values of the binary variables of factor loadings. Can be obtained using BFRM software. |
phi_init |
A p-dimensional column vector. Initial values of the residual variances when modeling gene expressions, corresponding to
a_full |
A numeric vector. Complete set of observed time points, sorted from early to late. |
train_index |
A q-dimensional column vector. Index of time points used in the training data. |
x |
A list of n elements. Each element is a matrix of dimension (p, q_i), storing the gene expressions for the ith subject. |
model_dgp |
A logical value. If model_dgp = TRUE (the default), the Dependent Gaussian Process is used to model latent factor trajectories; otherwise, the Independent Gaussian Process is used. |
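The per-subject index arguments above are easiest to read with a small toy example. The sketch below is illustrative only: the values are made up, every grid point in 'a_full' is assumed to be used for training, and the columns of 'y_init' are assumed to be stacked subject by subject in consecutive blocks. The package vignettes remain the authoritative reference for how these objects are built.

```r
# Toy setup (hypothetical values): n = 3 subjects, p = 5 genes,
# q = 4 training time points on a common grid.
set.seed(1)
n <- 3
p <- 5
a_full <- c(0, 1, 2, 3)            # complete time grid, early to late
train_index <- seq_along(a_full)   # assumption: all grid points are used for training
q <- length(train_index)

# Indexes of the grid points at which each subject was observed (irregular design).
obs_time_index <- list(c(1, 2, 3, 4), c(1, 3), c(2, 3, 4))

# Number of observed time points per subject.
obs_time_num <- sapply(obs_time_index, length)

# Observed times per subject, looked up on the grid.
a_person <- lapply(obs_time_index, function(idx) a_full[idx])

# Columns of y_init belonging to each subject, assuming consecutive blocks.
end_col <- cumsum(obs_time_num)
start_col <- end_col - obs_time_num + 1
col_person_index <- Map(seq, start_col, end_col)

# Gene expression list: one (p, q_i) matrix per subject, filled with noise here.
x <- lapply(obs_time_num, function(q_i) matrix(rnorm(p * q_i), nrow = p, ncol = q_i))
```

Under these assumptions, the column indexes across all subjects cover 1 to sum(obs_time_num) exactly once, which matches the stated dimension of 'y_init'.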
Details
The following parameters deserve particular attention, and users should tune them according to the specific data.
'burn_in_prop' and 'thin_step' jointly control the number of Gibbs samples needed to generate approximately 'ind_num' independent samples. The ultimate purpose of tuning these two parameters is to obtain high-quality posterior samples of the latent factor scores. Therefore, if the initial values of the Gibbs sampler are poor, users may need to increase 'burn_in_prop' to discard more burn-in samples; if high autocorrelation between successive samples is a concern, 'thin_step' may need to be larger.
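As a rough illustration of how these settings interact, the sketch below estimates the Gibbs chain length needed to retain about 'ind_num' draws after burn-in and thinning. The helper 'gibbs_draws_needed' and its formula are assumptions made for illustration; they are not taken from the package source, whose internal calculation may differ.

```r
# Hypothetical helper (not part of DGP4LCF): chain length such that, after
# discarding a fraction burn_in_prop and keeping every thin_step-th draw,
# roughly ind_num draws remain.
gibbs_draws_needed <- function(ind_num, burn_in_prop, thin_step) {
  ceiling(ind_num * thin_step / (1 - burn_in_prop))
}

# Defaults from the Usage section.
ind_num <- 10
burn_in_prop <- 0.2
thin_step <- 5

total <- gibbs_draws_needed(ind_num, burn_in_prop, thin_step)

# Positions of the draws that survive burn-in and thinning.
burn_in <- floor(total * burn_in_prop)
kept <- seq(burn_in + 1, total, by = thin_step)

# A longer burn-in or a larger thinning step both lengthen the required chain.
total_more_burn_in <- gibbs_draws_needed(ind_num, 0.5, thin_step)
total_more_thinning <- gibbs_draws_needed(ind_num, burn_in_prop, 10)
```

Under this assumed formula, raising either 'burn_in_prop' or 'thin_step' increases the number of Gibbs iterations per EM step, so the gain in sample quality comes at a computational cost.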
Value
A list of R objects required in the MCEM algorithm.
Examples
# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")