BCC.multi {BCClong}    R Documentation

Compute a Bayesian Consensus Clustering model for mixed-type longitudinal data

Description

This function clusters mixed-type (continuous, discrete and categorical) longitudinal markers using a Bayesian consensus clustering method with MCMC sampling.

Usage

BCC.multi(
  mydat,
  id,
  time,
  center = 1,
  num.cluster,
  formula,
  dist,
  alpha.common = 0,
  initials = NULL,
  sigma.sq.e.common = 1,
  hyper.par = list(delta = 1, a.star = 1, b.star = 1, aa0 = 0.001, bb0 = 0.001, cc0 =
    0.001, ww0 = 0, vv0 = 1000, dd0 = 0.001, rr0 = 4, RR0 = 3),
  c.ga.tunning = NULL,
  c.theta.tunning = NULL,
  adaptive.tunning = 0,
  tunning.freq = 20,
  initial.cluster.membership = "random",
  input.initial.local.cluster.membership = NULL,
  input.initial.global.cluster.membership = NULL,
  seed.initial = 2080,
  burn.in,
  thin,
  per,
  max.iter
)

Arguments

mydat

a list of R longitudinal features (i.e., with a length of R), where R is the number of features. The data should be prepared in long format (each row is one time point per individual).
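As a minimal sketch of the long-format layout described above, the following base R code builds a small data set and wraps it in lists of length R. The data frame long_dat and its columns y1 and y2 are hypothetical and are not part of BCClong.

```r
# Hypothetical long-format data: 3 individuals, 4 visits each,
# one row per individual per time point.
long_dat <- data.frame(
  id   = rep(1:3, each = 4),
  time = rep(0:3, times = 3)
)
set.seed(1)
long_dat$y1 <- rnorm(nrow(long_dat))              # continuous feature
long_dat$y2 <- rpois(nrow(long_dat), lambda = 2)  # count feature

# One list element per feature (R = 2 here).
mydat <- list(long_dat$y1, long_dat$y2)
id    <- list(long_dat$id, long_dat$id)
time  <- list(long_dat$time, long_dat$time)

# Each feature vector should line up with its id and time vectors.
stopifnot(lengths(mydat) == lengths(id), lengths(mydat) == lengths(time))
```

Building the three lists from the same data frame keeps the rows of each feature aligned with the corresponding id and time values.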

id

a list (with a length of R) of vectors of the study id of individuals for each feature. A single value (i.e., a length of 1) is recycled if necessary.

time

a list (with a length of R) of vectors of time (or age) at which the feature measurements are recorded

center

1: center the time variable before clustering, 0: no centering

num.cluster

number of clusters K

formula

a list (with a length of R) of formulas, one for each feature. Each formula is a two-sided linear formula object describing both the fixed-effects and random-effects parts of the model, with the response (i.e., longitudinal feature) on the left of a ~ operator and the terms, separated by + operators, on the right. Random-effects terms are distinguished by vertical bars (|) separating expressions for design matrices from grouping factors. See the formula argument in the lme4 package.
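The formula syntax above can be sketched in base R without fitting anything. The first formula is the one used in the Examples section; the second uses lme4-style random-slope syntax purely as an illustration, and whether BCC.multi accepts that structure is an assumption to check against the package.

```r
# One formula per feature; (1 | id) denotes a random intercept per individual.
formula <- list(
  y ~ time + (1 | id),        # random intercept (as in the Examples section)
  y ~ time + (1 + time | id)  # random intercept and slope (illustrative only)
)

# Both elements are ordinary formula objects.
sapply(formula, function(f) inherits(f, "formula"))

# Variables referenced by the first formula: response, covariate, grouping factor.
all.vars(formula[[1]])
```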

dist

a character vector (with a length of R) that determines the distribution for each feature. Possible values are "gaussian" for a continuous feature, "poisson" for a discrete feature (e.g., count data) using a log link and "binomial" for a dichotomous feature (0/1) using a logit link. A single value (i.e., a length of 1) is recycled if necessary.
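To illustrate what the log and logit links mean, the sketch below uses the family objects from base R's stats package. It is not BCClong code, and the linear predictor values in eta are hypothetical.

```r
# Family objects expose the link function and its inverse.
log_link   <- poisson(link = "log")
logit_link <- binomial(link = "logit")

eta <- c(-2, 0, 2)  # hypothetical linear predictor values

# Inverse log link: exp(eta), so expected counts are always positive.
mu_count <- log_link$linkinv(eta)

# Inverse logit link: 1 / (1 + exp(-eta)), so probabilities lie in (0, 1).
mu_prob <- logit_link$linkinv(eta)

mu_count
mu_prob
```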

alpha.common

1 - common alpha, 0 - separate alphas for each outcome

initials

list of initial values for: zz, zz.local, ga, sigma.sq.u, sigma.sq.e. Default is NULL

sigma.sq.e.common

1 - estimate common residual variance across all groups, 0 - estimate distinct residual variance, default is 1

hyper.par

hyper-parameters of the prior distributions for the model parameters. The default hyper-parameters values will result in weakly informative prior distributions.

c.ga.tunning

tuning parameter for MH algorithm (fixed effect parameters), each parameter corresponds to an outcome/marker, default value equals NULL

c.theta.tunning

tuning parameter for MH algorithm (random effect), each parameter corresponds to an outcome/marker, default value equals NULL

adaptive.tunning

adaptive tuning parameters, 1 - yes, 0 - no, default is 0

tunning.freq

tuning frequency, default is 20
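The general idea behind tuning a Metropolis-Hastings (MH) proposal at a fixed frequency can be sketched as follows. This is a generic random-walk MH sampler for a standard normal target, not the sampler implemented in BCClong; the names prop_sd and tune_freq and the acceptance-rate thresholds are assumptions chosen for illustration.

```r
set.seed(1)

# Log-density of the (illustrative) target distribution.
log_target <- function(x) dnorm(x, mean = 0, sd = 1, log = TRUE)

n_iter    <- 2000
tune_freq <- 20   # adjust the proposal every 20 iterations
prop_sd   <- 5    # deliberately poor starting proposal scale
x         <- 0
accepted  <- 0
draws     <- numeric(n_iter)

for (i in seq_len(n_iter)) {
  # Random-walk proposal and MH accept/reject step.
  prop <- x + rnorm(1, sd = prop_sd)
  if (log(runif(1)) < log_target(prop) - log_target(x)) {
    x <- prop
    accepted <- accepted + 1
  }
  draws[i] <- x

  # Every tune_freq iterations, shrink or grow the proposal scale
  # depending on the recent acceptance rate, then reset the counter.
  if (i %% tune_freq == 0) {
    rate <- accepted / tune_freq
    if (rate < 0.2) prop_sd <- prop_sd * 0.9
    if (rate > 0.5) prop_sd <- prop_sd * 1.1
    accepted <- 0
  }
}

prop_sd
mean(draws)
```

A proposal scale that is too large leads to many rejections, and one that is too small leads to slow exploration; periodic adjustment moves the scale toward a workable middle ground.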

initial.cluster.membership

"mixAK" or "random" or "PAM" or "input" - input initial cluster membership for local clustering, default is "random"

input.initial.local.cluster.membership

if initial.cluster.membership = "input", this argument (input.initial.local.cluster.membership) must not be empty, default is NULL

input.initial.global.cluster.membership

input initial cluster membership for global clustering, default is NULL

seed.initial

seed for initial clustering (for initial.cluster.membership = "mixAK"), default is 2080

burn.in

the number of initial samples discarded as burn-in. This value must be smaller than max.iter.

thin

the thinning interval. For example, if thin = 10, then the MCMC chain will keep one sample every 10 iterations

per

specify how often the MCMC chain will print the iteration number

max.iter

the number of MCMC iterations.
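How burn.in, thin and max.iter interact can be sketched with simple index arithmetic. The helper retained_iterations below is hypothetical, and it assumes that the first burn.in iterations are dropped and every thin-th iteration is kept afterwards; the exact indexing used inside BCC.multi may differ.

```r
# Hypothetical helper: indices of the MCMC iterations that are kept.
retained_iterations <- function(max.iter, burn.in, thin) {
  stopifnot(burn.in < max.iter, thin >= 1)
  seq(from = burn.in + 1, to = max.iter, by = thin)
}

# Settings from the Examples section of this page.
kept_small <- retained_iterations(max.iter = 8, burn.in = 3, thin = 1)
kept_small
length(kept_small)

# A longer hypothetical run with thinning.
kept_long <- retained_iterations(max.iter = 5000, burn.in = 1000, thin = 10)
length(kept_long)
```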

Value

Returns a model object that contains clustering information

Examples

# import dataframe
filePath <- system.file("extdata", "epil.rds", package = "BCClong")
dat <- readRDS(filePath)
set.seed(20220929)
# example only; a larger number of iterations is required for accurate results
fit.BCC <- BCC.multi(
  mydat = list(dat$anxiety_scale, dat$depress_scale),
  dist = c("gaussian"),
  id = list(dat$id),
  time = list(dat$time),
  formula = list(y ~ time + (1 | id)),
  num.cluster = 2,
  burn.in = 3,
  thin = 1,
  per = 1,
  max.iter = 8
)


[Package BCClong version 1.0.2 Index]