getConsensusClustersParallel {clickstream}    R Documentation

Generates an optimal set of clusters for a clickstream using consensus clustering with parallel computation

Description

This is an experimental function for a consensus clustering algorithm that targets a range of average next-state probabilities, obtained by fitting each cluster to a Markov chain. The function parallelizes the k-means and fitMarkovChain operations across CPU cores and depends on the parallel package.
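
Because the work is spread across cores, it can help to check how many cores are available before setting the cores argument. The following is a minimal sketch, not part of the clickstream package: chooseCores is a hypothetical helper, and only parallel::detectCores() is an existing function.

```r
# Hypothetical helper (not part of clickstream): pick a safe value for `cores`.
chooseCores <- function(requested = 2L) {
  available <- parallel::detectCores()  # may return NA on some platforms
  if (is.na(available)) available <- 1L
  # Never ask for more workers than the machine reports, and never fewer than one.
  max(1L, min(as.integer(requested), as.integer(available)))
}

cores <- chooseCores(2L)
```

The resulting value could then be passed as cores = cores in the call shown under Usage.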

Usage

getConsensusClustersParallel(
  trainingCLS,
  testCLS,
  maxIterations = 5,
  optimalProbMean = 0.5,
  range = 0.3,
  centresMin = 2,
  clusterCentresRange = 0,
  order = 1,
  cores = 2,
  takeHighest = FALSE,
  verbose = FALSE
)

Arguments

trainingCLS

Clickstream object with training data (this should be the data used to build the Markov chain object).

testCLS

Clickstream object with test data.

maxIterations

Number of times to iterate (repeat) through the k-means clustering.

optimalProbMean

The target average probability of each next-page click prediction in a first-order Markov chain.

range

The width of the range above the optimal probability (optimalProbMean) to target.

centresMin

The minimum number of cluster centres to evaluate.

clusterCentresRange

The number of additional cluster centres to evaluate.

order

The order of the Markov chains that will be used to evaluate each cluster.

cores

Number of cores used for clustering.

takeHighest

Determines whether to fall back to the highest mean next-click probability, or to raise an error, if the target is not reached after the given number of k-means iterations.

verbose

Should this function report extra information on progress?
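
How the arguments interact can be sketched in plain R. This is an illustration under stated assumptions, not the package's implementation: candidateCentres and selectIteration are hypothetical helpers, the centres are assumed to run from centresMin to centresMin + clusterCentresRange, and the target interval is assumed to be [optimalProbMean, optimalProbMean + range].

```r
# Hypothetical illustration only; these helpers are not part of clickstream.

# Candidate numbers of cluster centres, ASSUMING the function evaluates
# centresMin up to centresMin + clusterCentresRange.
candidateCentres <- function(centresMin = 2, clusterCentresRange = 0) {
  seq(centresMin, centresMin + clusterCentresRange)
}

# Pick an iteration from the mean next-click probabilities of successive
# k-means iterations, ASSUMING the target interval is
# [optimalProbMean, optimalProbMean + range].
selectIteration <- function(meanProbs, optimalProbMean = 0.5, range = 0.3,
                            takeHighest = FALSE) {
  inTarget <- meanProbs >= optimalProbMean &
    meanProbs <= optimalProbMean + range
  if (any(inTarget)) {
    return(which(inTarget)[1])     # first iteration that hits the target
  }
  if (takeHighest) {
    return(which.max(meanProbs))   # fall back to the best iteration seen
  }
  stop("Target mean probability not reached after ", length(meanProbs),
       " iterations")
}

candidateCentres(2, 3)
selectIteration(c(0.35, 0.62, 0.71))
selectIteration(c(0.35, 0.41), takeHighest = TRUE)
```

With takeHighest = FALSE and no iteration inside the interval, the sketch stops with an error, which mirrors the behaviour described for takeHighest above.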

Author(s)

Theo van Kraay theo.vankraay@hotmail.com

Examples

training <- c("User1,h,c,c,p,c,h,c,p,p,c,p,p,o",
              "User2,i,c,i,c,c,c,d",
              "User3,h,i,c,i,c,p,c,c,p,c,c,i,d",
              "User4,h,c,c,p,p,c,p,p,p,i,p,o",
              "User5,i,h,c,c,p,p,c,p,c,d",
              "User6,i,h,c,c,p,p,c,p,c,o",
              "User7,i,h,c,c,p,p,c,p,c,d",
              "User8,i,h,c,c,p,p,c,p,c,d,o")

test <- c(
    "User1,h,c,c,p,c,h,c,p,p,c,p,p,o",
    "User2,i,c,i,c,c,c,d",
    "User3,h,i,c,i,c,p,c,c,p,c,c,i,d"
)

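# Convert the character vectors to clickstream objects;
# header = TRUE treats the first field of each line as the session name
trainingCLS <- as.clickstreams(training, header = TRUE)
testCLS <- as.clickstreams(test, header = TRUE)

# Search for clusters whose mean next-click probability falls in the target range
clusters <- getConsensusClustersParallel(trainingCLS, testCLS, maxIterations = 3,
                                         optimalProbMean = 0.40, range = 0.70, centresMin = 2,
                                         clusterCentresRange = 0, order = 1, cores = 1,
                                         takeHighest = FALSE, verbose = FALSE)

# Fit one Markov chain per cluster, then predict from the chain that best matches the pattern
markovchains <- fitMarkovChains(clusters)
startPattern <- new("Pattern", sequence = c("i", "h", "c", "p"))
mc <- getOptimalMarkovChain(startPattern, markovchains, clusters)
predict(mc, startPattern)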

[Package clickstream version 1.3.3 Index]