contextCluster {clusternomics} | R Documentation |
Clusternomics: Context-dependent clustering
Description
This function fits the context-dependent clustering model to the data using Gibbs sampling. It allows the user to specify a different number of clusters on the global level, as well as on the local level.
Usage
contextCluster(datasets, clusterCounts, dataDistributions = "diagNormal",
  prior = NULL, maxIter = 1000, burnin = NULL, lag = 3,
  verbose = FALSE)
Arguments
datasets |
List of data matrices where each matrix represents a context-specific dataset. Each data matrix has the size N times M, where N is the number of data points and M is the dimensionality of the data. The full list of matrices has length C. The number of data points N must be the same for all data matrices. |
clusterCounts |
Number of clusters on the global level and in each context.
List with the following structure: list(global = ..., context = ...), with one entry in context per dataset (see Examples). |
dataDistributions |
Distribution of data in each dataset. Can be either a list of
length C where |
prior |
Prior distribution. If |
maxIter |
Number of iterations of the Gibbs sampling algorithm. |
burnin |
Number of burn-in iterations that will be discarded. If not specified,
the algorithm discards the first half of the maxIter samples. |
lag |
Used for thinning the samples. |
verbose |
Print progress, by default FALSE. |
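The constraints on the arguments can be checked before running the sampler. The following is a minimal sketch in base R, not part of the package: the names N, nContexts, rowCounts and kept are illustrative, the data are random placeholders, and the retained-iteration schedule is an assumption about a conventional burn-in and thinning scheme rather than confirmed behaviour of contextCluster.

```r
set.seed(1)

# Number of data points; must be shared by every context
N <- 20

# Two placeholder contexts, each an N x M matrix (here M = 2)
datasets <- list(
  matrix(rnorm(N * 2), nrow = N, ncol = 2),
  matrix(rnorm(N * 2), nrow = N, ncol = 2)
)
nContexts <- length(datasets)

# Every matrix must have the same number of rows
rowCounts <- vapply(datasets, nrow, integer(1))
stopifnot(length(unique(rowCounts)) == 1)

# clusterCounts in the form used in the Examples section:
# one global count plus one count per context
clusterCounts <- list(global = 10, context = rep(3, nContexts))
stopifnot(length(clusterCounts$context) == nContexts)

# ASSUMPTION: a conventional scheme that drops the first `burnin`
# iterations and then keeps every `lag`-th one
maxIter <- 1000
burnin <- maxIter / 2   # documented default: first half is discarded
lag <- 3
kept <- seq(burnin + 1, maxIter, by = lag)
length(kept)            # number of iterations retained under this assumption
```

Running the checks first gives a clearer error than a failure inside the sampler, and the last lines give a rough idea of how many samples a given combination of maxIter, burnin and lag would leave for analysis.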
Value
Returns a list containing the sequence of MCMC states and the log likelihoods of the individual states.
samples |
List of assignments sampled from the posterior,
each state |
logliks |
Log likelihoods during MCMC iterations. |
DIC |
Deviance information criterion to help select the number of clusters. Lower values of DIC correspond to better-fitting models. |
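One common way to summarise a sequence of sampled assignments is a co-clustering (posterior similarity) matrix. The sketch below uses base R and a mock stand-in for the samples element, so it does not call the package. It assumes that each state exposes a Global column, as in the Examples section; the mock values and the names sameCluster and coclust are hypothetical.

```r
# Mock stand-in for results$samples: three sampled states, four data points
samples <- list(
  data.frame(Global = c(1, 1, 2, 2)),
  data.frame(Global = c(1, 1, 2, 3)),
  data.frame(Global = c(2, 2, 1, 1))
)

# For each state, an N x N logical matrix marking pairs of data points
# that share a global cluster
sameCluster <- lapply(samples, function(s) outer(s$Global, s$Global, "=="))

# Average over states: fraction of samples in which each pair co-clusters
coclust <- Reduce(`+`, sameCluster) / length(samples)
coclust
```

Working with pairs rather than raw labels sidesteps label switching: the third mock state uses swapped labels but contributes the same pairwise structure as the first.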
Examples
# Example with simulated data (see vignette for details)
# Number of elements in each cluster
groupCounts <- c(50, 10, 40, 60)
# Centers of clusters
means <- c(-1.5,1.5)
testData <- generateTestData_2D(groupCounts, means)
datasets <- testData$data
# Fit the model
# 1. specify number of clusters
clusterCounts <- list(global=10, context=c(3,3))
# 2. Run inference
# Number of iterations is just for demonstration purposes, use
# a larger number of iterations in practice!
results <- contextCluster(datasets, clusterCounts,
    maxIter = 10, burnin = 5, lag = 1,
    dataDistributions = 'diagNormal',
    verbose = TRUE)
# Extract results from the samples
# Final state:
state <- results$samples[[length(results$samples)]]
# 1) assignment to global clusters
globalAssgn <- state$Global
# 2) context-specific assignments: assignment in a specific dataset (context)
contextAssgn <- state[,"Context 1"]
# Assess the fit of the model with DIC
results$DIC
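Once a state has been extracted as above, the global and context-specific assignments can be compared with a cross-tabulation. This is a minimal sketch in base R using a mock state, so it runs without fitting a model. It assumes the state behaves like a data frame with Global and "Context 1" columns, as the accessors in the example suggest; the values, and the names globalSizes and crossTab, are hypothetical.

```r
# Mock stand-in for one sampled state (hypothetical values)
state <- data.frame(
  Global = c(1, 1, 2, 2, 3, 3),
  "Context 1" = c(1, 1, 1, 1, 2, 2),
  check.names = FALSE   # keep the space in the column name
)

# Size of each global cluster
globalSizes <- table(state$Global)

# Rows: global clusters; columns: clusters in the first context
crossTab <- table(global = state$Global, context1 = state[, "Context 1"])
crossTab
```

Each row of the table shows how one global cluster is distributed over the clusters of the first context, which makes it easy to spot global clusters that differ only in the other contexts.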