mcem_parameter_setup {DGP4LCF}    R Documentation

Parameter setup and initial value assignment for the Monte Carlo Expectation Maximization (MCEM) algorithm.

Description

This function creates the R objects that store the model parameters in the required format and assigns their initial values, so that they are ready for use in the MCEM algorithm.

Usage

mcem_parameter_setup(
  p,
  k,
  n,
  q,
  ind_num = 10,
  burn_in_prop = 0.2,
  thin_step = 5,
  prior_sparsity = 0.1,
  em_num = 50,
  obs_time_num,
  obs_time_index,
  a_person,
  col_person_index,
  y_init,
  a_init,
  z_init,
  phi_init,
  a_full,
  train_index,
  x,
  model_dgp = TRUE
)

Arguments

p

A numeric scalar. Number of genes.

k

A numeric scalar. Number of latent factors.

n

A numeric scalar. Number of subjects.

q

A numeric scalar. Total number of time points in the training data.

ind_num

A numeric scalar. Initial number of approximately independent samples used in the MCEM algorithm.

burn_in_prop

A numeric scalar. Proportion of burn-in, used to calculate the number of Monte Carlo samples needed in the Gibbs sampler. Must be the same as in the function 'mcem_algorithm_irregular_time'.

thin_step

A numeric scalar. Thinning step, used to calculate the number of Monte Carlo samples needed in the Gibbs sampler. Must be the same as in the function 'mcem_algorithm_irregular_time'.

prior_sparsity

A numeric scalar. Prior expected proportion of genes involved within each pathway.

em_num

A numeric scalar. Maximum number of iterations allowed for the expectation maximization (EM) algorithm.

obs_time_num

An n-dimensional vector. Each element is the number of time points observed for one subject in the training data.

obs_time_index

A list of n elements. Each element is a vector of the observed time indices for one subject in the training data, sorted from earliest to latest.

a_person

A list of n elements. Each element is a vector of the observed times for one subject in the training data, sorted from earliest to latest.

col_person_index

A list of n elements. Each element is a vector of the column indices in y_init that belong to one subject.

y_init

A matrix of dimension (k, sum(obs_time_num)). Initial values of the latent factor scores. Can be obtained using BFRM software.

a_init

A matrix of dimension (p, k). Initial values of the regression coefficients of factor loadings. Can be obtained using BFRM software.

z_init

A matrix of dimension (p, k). Initial values of the binary variables of factor loadings. Can be obtained using BFRM software.

phi_init

A p-dimensional column vector. Initial values of the residual variance when modeling gene expression, corresponding to \frac{1}{\phi^2} in the manuscript. Can be obtained using BFRM software.

a_full

A numeric vector. All observed time points, sorted from earliest to latest.

train_index

A q-dimensional column vector. Indices of the time points used in the training data.

x

A list of n elements. Each element is a matrix of dimension (p, q_i), storing the gene expression data for the i-th subject.

model_dgp

A logical value. model_dgp = TRUE (default setting) uses the Dependent Gaussian Process to model latent factor trajectories, otherwise the Independent Gaussian Process is used.
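The index arguments obs_time_num, obs_time_index, a_person and col_person_index all describe the same thing: which training time points each subject was observed at. The sketch below shows one way to derive them consistently from each subject's observed time positions. It is hypothetical: build_index_args is not part of DGP4LCF, and it assumes that obs_time_index[[i]] holds positions within the q training time points a_full[train_index], and that the columns of y_init are stacked subject by subject in the same order as the list elements.

```r
# Hypothetical helper (NOT part of DGP4LCF). Assumptions:
#  - obs_time_index[[i]] holds positions within the q training time points,
#    i.e. within a_full[train_index];
#  - the columns of y_init are stacked subject by subject, in list order.
build_index_args <- function(obs_time_index, a_full, train_index) {
  train_times <- a_full[train_index]
  q <- length(train_times)
  # Sort each subject's positions from earliest to latest and check the range
  obs_time_index <- lapply(obs_time_index, function(idx) {
    idx <- sort(as.integer(idx))
    stopifnot(all(idx >= 1L), all(idx <= q))
    idx
  })
  obs_time_num <- vapply(obs_time_index, length, integer(1))
  stopifnot(all(obs_time_num >= 1L))  # every subject needs at least one time point
  # Actual observed times for each subject
  a_person <- lapply(obs_time_index, function(idx) train_times[idx])
  # One consecutive block of y_init columns per subject
  ends <- cumsum(obs_time_num)
  starts <- ends - obs_time_num + 1L
  col_person_index <- mapply(seq.int, starts, ends, SIMPLIFY = FALSE)
  list(obs_time_num = obs_time_num,
       obs_time_index = obs_time_index,
       a_person = a_person,
       col_person_index = col_person_index)
}

# Toy data: 3 subjects, q = 4 training time points
a_full <- c(0, 7, 14, 21, 28)
train_index <- 1:4
idx <- build_index_args(list(c(1, 2, 4), c(2, 3), 1:4), a_full, train_index)
idx$obs_time_num           # 3 2 4
idx$col_person_index[[2]]  # 4 5
idx$a_person[[1]]          # 0 7 21
# Under these assumptions, ncol(y_init) should equal sum(idx$obs_time_num), here 9.
```

Deriving all four arguments from a single source keeps them mutually consistent, which is easier than maintaining each list by hand.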

Details

The following parameters deserve particular attention, and users should tune them according to the data at hand.

'burn_in_prop' and 'thin_step' jointly control the number of Gibbs samples needed to generate approximately 'ind_num' independent samples. The purpose of tuning these two parameters is to obtain high-quality posterior samples of the latent factor scores: if the initial values of the Gibbs sampler are poor, users may need to increase 'burn_in_prop' so that more burn-in samples are discarded; if high autocorrelation between successive samples is a concern, 'thin_step' may need to be larger.
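To illustrate how the three settings interact, the sketch below computes a Gibbs chain length under one plausible rule. The rule is an assumption, not the package's documented formula: the first 'burn_in_prop' fraction of the chain is discarded, every 'thin_step'-th draw after that is kept, and the chain is made just long enough to leave 'ind_num' kept draws. The function name gibbs_chain_length is hypothetical.

```r
# Hypothetical helper (NOT part of DGP4LCF). Assumed rule: discard the first
# burn_in_prop fraction of the chain, then keep every thin_step-th draw, and
# make the chain long enough that ind_num draws are kept.
gibbs_chain_length <- function(ind_num = 10, burn_in_prop = 0.2, thin_step = 5) {
  stopifnot(ind_num >= 1, thin_step >= 1, burn_in_prop >= 0, burn_in_prop < 1)
  after_burn_in <- ind_num * thin_step                   # draws needed after burn-in
  total <- ceiling(after_burn_in / (1 - burn_in_prop))   # full chain length
  burn_in <- total - after_burn_in                       # draws discarded as burn-in
  # Indices of the retained (thinned) draws after burn-in
  kept <- length(seq(burn_in + thin_step, total, by = thin_step))
  c(total = total, burn_in = burn_in, kept = kept)
}

gibbs_chain_length()                    # total 63, burn_in 13, kept 10
gibbs_chain_length(burn_in_prop = 0.5)  # total 100, burn_in 50, kept 10
gibbs_chain_length(thin_step = 10)      # total 125, burn_in 25, kept 10
```

Under this assumed rule, raising either 'burn_in_prop' or 'thin_step' lengthens the chain (and the computation time) while the number of retained draws stays at 'ind_num'.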

Value

A list of R objects required in the MCEM algorithm.

Examples

# See examples in vignette
vignette("bsfadgp_regular_data_example", package = "DGP4LCF")
vignette("bsfadgp_irregular_data_example", package = "DGP4LCF")


[Package DGP4LCF version 1.0.0 Index]