runBatchMix {batchmix} | R Documentation |
Run Batch Mixture Model
Description
Runs a MCMC chain for a Bayesian mixture model which models both batch effects and class/cluster structure.
Usage
runBatchMix(
X,
R,
thin,
batch_vec,
type,
K_max = NULL,
initial_labels = NULL,
fixed = NULL,
alpha = 1,
mu_proposal_window = 0.5^2,
cov_proposal_window = 0.002,
m_proposal_window = 0.3^2,
S_proposal_window = 0.01,
t_df_proposal_window = 0.015,
m_scale = NULL,
rho = 3,
theta = 1,
initial_class_means = NULL,
initial_class_covariance = NULL,
initial_batch_shift = NULL,
initial_batch_scale = NULL,
initial_class_df = NULL,
verbose = TRUE
)
Arguments
X |
Data to cluster as a matrix with the items to cluster held in rows. |
R |
The number of iterations in the sampler. |
thin |
The factor by which the samples generated are thinned, e.g. if “thin=50“ only every 50th sample is kept. |
batch_vec |
Labels identifying which batch each item being clustered is from. |
type |
Character indicating density type to use. One of 'MVN' (multivariate normal distribution) or 'MVT' (multivariate t distribution). |
K_max |
The number of components to include (the upper bound on the number of clusters in each sample). Defaults to the number of unique labels in “initial_labels“. |
initial_labels |
Initial clustering. |
fixed |
Which items are fixed in their initial label. If not given, defaults to a vector of 0 meaning the model is run unsupervised. |
alpha |
The concentration parameter for the stick-breaking prior and the weights in the model. |
mu_proposal_window |
The proposal window for the cluster mean proposal kernel. Making this smaller will normally increase the acceptance rate for the proposed values in the Metropolis-Hastings sampler. The proposal density is a Gaussian distribution, the window is the variance. |
cov_proposal_window |
The proposal window for the cluster covariance proposal kernel. The proposal density is a Wishart distribution, this argument is the reciprocal of the degree of freedom. It is recommended to set this aiming for accpetance rates of greater than 0.5 for the covariance matrices (e.g., between 2e-03 and 1e-04 is a good range to consider initially). As the entire covariance matrix is sampled at once exploration is difficult. |
m_proposal_window |
The proposal window for the batch mean proposal kernel. The proposal density is a Gaussian distribution, the window is the variance. |
S_proposal_window |
The proposal window for the batch standard deviation proposal kernel. The proposal density is a Gamma distribution, this argument is the reciprocal of the rate. Recommended range to initially consider is 0.015 to 2e-03, though smaller values might be necessary particularly in higher dimensional data. |
t_df_proposal_window |
The proposal window for the degrees of freedom for the multivariate t distribution (not used if type is not 'MVT'). The proposal density is a Gamma distribution, this argument is the reciprocal of the rate. If the data is more Gaussian than the degrees of freedom might have high acceptance rates regardless of the value chosen. |
m_scale |
The scale hyperparameter for the batch shift prior distribution. This defines the scale of the batch effect upon the mean and should be in (0, 1]. |
rho |
The shape of the prior distribution for the batch scale. |
theta |
The scale of the prior distribution for the batch scale. |
initial_class_means |
A $P x K$ matrix of initial values for the class means. Defaults to draws from the prior distribution. |
initial_class_covariance |
A $P x P x K$ array of initial values for the class covariance matrices. Defaults to draws from the prior distribution. |
initial_batch_shift |
A $P x B$ matrix of initial values for the batch shift effect Defaults to draws from the prior distribution. |
initial_batch_scale |
A $P x B$ matrix of initial values for the batch scales Defaults to draws from the prior distribution. |
initial_class_df |
A $K$ vector of initial values for the class degrees of freedom. Defaults to draws from the prior distribution. |
verbose |
Logiccal indicating if warning about proposal windows should be printed. |
Value
A named list containing the sampled partitions, cluster and batch parameters, model fit measures and some details on the model call.
Examples
# Data in a matrix format
X <- matrix(c(rnorm(100, 0, 1), rnorm(100, 3, 1)), ncol = 2, byrow = TRUE)
# Initial labelling
labels <- c(
rep(1, 10),
sample(c(1, 2), size = 40, replace = TRUE),
rep(2, 10),
sample(c(1, 2), size = 40, replace = TRUE)
)
fixed <- c(rep(1, 10), rep(0, 40), rep(1, 10), rep(0, 40))
# Batch
batch_vec <- sample(seq(1, 5), replace = TRUE, size = 100)
# Density choice
type <- "MVN"
# Sampling parameters
R <- 1000
thin <- 50
# MCMC samples
mcmc_out <- runBatchMix(
X,
R,
thin,
batch_vec,
type,
initial_labels = labels,
fixed = fixed
)
# Given an initial value for the parameters
initial_class_means <- matrix(c(1, 1, 3, 4), nrow = 2)
initial_class_covariance <- array(c(1, 0, 0, 1, 1, 0, 0, 1),
dim = c(2, 2, 2)
)
# We can use values from a previous chain
initial_batch_shift <- mcmc_out$batch_shift[, , R / thin]
initial_batch_scale <- matrix(
c(1.2, 1.3, 1.7, 1.1, 1.4, 1.3, 1.2, 1.2, 1.1, 2.0),
nrow = 2
)
mcmc_out <- runBatchMix(X,
R,
thin,
batch_vec,
type,
initial_labels = labels,
fixed = fixed,
initial_class_means = initial_class_means,
initial_class_covariance = initial_class_covariance,
initial_batch_shift = initial_batch_shift,
initial_batch_scale = initial_batch_scale
)