batchSemiSupervisedMixtureModel {batchmix} | R Documentation |
Batch semisupervised mixture model
Description
A Bayesian mixture model with batch effects.
Usage
batchSemiSupervisedMixtureModel(
  X,
  R,
  thin,
  initial_labels,
  fixed,
  batch_vec,
  type,
  K_max = length(unique(initial_labels)),
  alpha = NULL,
  concentration = NULL,
  mu_proposal_window = 0.5^2,
  cov_proposal_window = 0.002,
  m_proposal_window = 0.3^2,
  S_proposal_window = 0.01,
  t_df_proposal_window = 0.015,
  m_scale = NULL,
  rho = 3,
  theta = 1,
  initial_class_means = NULL,
  initial_class_covariance = NULL,
  initial_batch_shift = NULL,
  initial_batch_scale = NULL,
  initial_class_df = NULL,
  verbose = TRUE
)
Arguments
X |
Data to cluster as a matrix with the items to cluster held in rows. |
R |
The number of iterations in the sampler. |
thin |
The factor by which the samples generated are thinned, e.g. if 'thin = 50', only every 50th sample is kept. |
initial_labels |
Initial clustering. |
fixed |
Which items are fixed in their initial label. |
batch_vec |
Labels identifying which batch each item being clustered is from. |
type |
Character indicating density type to use. One of 'MVN' (multivariate normal distribution) or 'MVT' (multivariate t distribution). |
K_max |
The number of components to include (the upper bound on the number of clusters in each sample). Defaults to the number of unique labels in 'initial_labels'. |
alpha |
The concentration parameter for the stick-breaking prior and the weights in the model. |
concentration |
Initial concentration vector for component weights. |
mu_proposal_window |
The proposal window for the cluster mean proposal kernel. The proposal density is a Gaussian distribution; the window is its variance. |
cov_proposal_window |
The proposal window for the cluster covariance proposal kernel. The proposal density is a Wishart distribution; this argument is the reciprocal of the degrees of freedom. |
m_proposal_window |
The proposal window for the batch mean proposal kernel. The proposal density is a Gaussian distribution; the window is its variance. |
S_proposal_window |
The proposal window for the batch standard deviation proposal kernel. The proposal density is a Gamma distribution; this argument is the reciprocal of the rate. |
t_df_proposal_window |
The proposal window for the degrees of freedom of the multivariate t distribution (not used unless type is 'MVT'). The proposal density is a Gamma distribution; this argument is the reciprocal of the rate. |
m_scale |
The scale hyperparameter for the batch shift prior distribution. This defines the scale of the batch effect upon the mean and should be in (0, 1]. |
rho |
The shape of the prior distribution for the batch scale. |
theta |
The scale of the prior distribution for the batch scale. |
initial_class_means |
A $P x K$ matrix of initial values for the class means. Defaults to draws from the prior distribution. |
initial_class_covariance |
A $P x P x K$ array of initial values for the class covariance matrices. Defaults to draws from the prior distribution. |
initial_batch_shift |
A $P x B$ matrix of initial values for the batch shift effect. Defaults to draws from the prior distribution. |
initial_batch_scale |
A $P x B$ matrix of initial values for the batch scales. Defaults to draws from the prior distribution. |
initial_class_df |
A $K$ vector of initial values for the class degrees of freedom. Defaults to draws from the prior distribution. |
verbose |
Logical indicating whether warnings about proposal windows should be printed. |
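The shapes required for the optional initial values can be sketched in base R as follows. This is only an illustration: the dimensions P, K and B and the constant starting values are assumptions chosen for the sketch, not defaults of the function (which draws from the prior when these arguments are NULL).

```r
# Hypothetical dimensions for illustration: P features, K classes, B batches.
P <- 2
K <- 2
B <- 5

# P x K matrix: one column of means per class.
initial_class_means <- matrix(0, nrow = P, ncol = K)

# P x P x K array: one covariance matrix (here the identity) per class.
initial_class_covariance <- array(rep(diag(P), times = K), dim = c(P, P, K))

# P x B matrices: one column per batch.
# The scale value 1.5 is an arbitrary choice in the range used in the Examples.
initial_batch_shift <- matrix(0, nrow = P, ncol = B)
initial_batch_scale <- matrix(1.5, nrow = P, ncol = B)

# Length-K vector of degrees of freedom; only relevant when type is 'MVT'.
initial_class_df <- rep(5, K)
```

Checking dim() on each object before calling the sampler is a cheap way to catch a transposed matrix.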
Value
A named list containing the sampled partitions, cluster and batch parameters, model fit measures and some details on the model call.
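How many samples to expect in the returned list follows from R and thin. The arithmetic below is a minimal sketch under the assumption that every thin-th iteration is stored; whether the initial state is stored as well is not stated here, so the dimensions of the returned objects should be checked directly.

```r
# Values taken from the Examples section of this page.
R <- 1000
thin <- 50

# Iterations retained if every thin-th draw is kept (an assumption).
kept_iterations <- seq(from = thin, to = R, by = thin)
n_kept <- length(kept_iterations)
```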
Examples
# Data in a matrix format
X <- matrix(c(rnorm(100, 0, 1), rnorm(100, 3, 1)), ncol = 2, byrow = TRUE)
# Initial labelling
labels <- c(
  rep(1, 10),
  sample(c(1, 2), size = 40, replace = TRUE),
  rep(2, 10),
  sample(c(1, 2), size = 40, replace = TRUE)
)
fixed <- c(rep(1, 10), rep(0, 40), rep(1, 10), rep(0, 40))
# Batch
batch_vec <- sample(seq(1, 5), replace = TRUE, size = 100)
# Density choice
type <- "MVN"
# Sampling parameters
R <- 1000
thin <- 50
# MCMC samples and BIC vector
samples <- batchSemiSupervisedMixtureModel(
  X,
  R,
  thin,
  labels,
  fixed,
  batch_vec,
  type
)
# Given an initial value for the parameters
initial_class_means <- matrix(c(1, 1, 3, 4), nrow = 2)
initial_class_covariance <- array(
  c(1, 0, 0, 1, 1, 0, 0, 1),
  dim = c(2, 2, 2)
)
# We can use values from a previous chain
initial_batch_shift <- samples$batch_shift[, , R / thin]
initial_batch_scale <- matrix(
  c(1.2, 1.3, 1.7, 1.1, 1.4, 1.3, 1.2, 1.2, 1.1, 2.0),
  nrow = 2
)
samples <- batchSemiSupervisedMixtureModel(
  X,
  R,
  thin,
  labels,
  fixed,
  batch_vec,
  type,
  initial_class_means = initial_class_means,
  initial_class_covariance = initial_class_covariance,
  initial_batch_shift = initial_batch_shift,
  initial_batch_scale = initial_batch_scale
)
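# The labels and fixed vectors above are written out by hand. When the known
# labels are stored with NA for unlabelled items, both vectors can be derived
# from them. This is a sketch in base R only; the vector 'observed' is a
# hypothetical input and not something the package provides.

```r
set.seed(1)

# Hypothetical partially observed labels: NA marks an unlabelled item.
observed <- c(rep(1, 10), rep(NA, 40), rep(2, 10), rep(NA, 40))

# Items with an observed label are held fixed (1); the rest are free (0).
fixed <- as.numeric(!is.na(observed))

# Fill the unobserved entries with a random draw from the observed classes.
# Indexing with sample.int() avoids the special case of sample() when only
# one class is present.
classes <- sort(unique(observed[!is.na(observed)]))
n_missing <- sum(is.na(observed))
initial_labels <- observed
initial_labels[is.na(observed)] <- classes[
  sample.int(length(classes), size = n_missing, replace = TRUE)
]
```

# The resulting initial_labels and fixed can be passed in place of the
# hand-written vectors in the calls above.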