
predkmeansCVest {predkmeans}

R Documentation

Cross-validation of Predictive K-means Clustering

Description

Performs cross-validation of predictive k-means clustering and cluster prediction.

Usage

predkmeansCVest(
  X,
  R,
  K,
  cv.groups = 10,
  sigma2 = 0,
  sigma2fixed = FALSE,
  scale = TRUE,
  covarnames = colnames(R),
  PCA = FALSE,
  PCAcontrol = list(covarnames = colnames(R), ncomps = 5),
  TPRS = FALSE,
  TPRScontrol = list(df = 5, xname = "x", yname = "y"),
  returnAll = FALSE,
  ...
)

predkmeansCVpred(
  object,
  X = object$X,
  R = object$R,
  method = c("ML", "MixExp", "SVM"),
  ...
)

Arguments

`X`	Outcome data
`R`	Covariates. Coerced to data frame.
`K`	Number of clusters
`cv.groups`	A list providing the cross-validation groups for splitting the data. Alternatively, a single number giving the number of groups into which the data are randomly split. A value of `0` implies leave-one-out. Defaults to 10.
`sigma2`	Starting value of `sigma2`. Setting `sigma2=0` and `sigma2fixed=TRUE` results in regular k-means clustering.
`sigma2fixed`	Logical indicating whether `sigma2` should be held fixed. If `FALSE`, then `sigma2` is estimated by maximum likelihood.
`scale`	Should the outcomes be re-scaled within each training group?
`covarnames`	Names of covariates to be included directly.
`PCA`	Logical indicator for whether principal component (PCA) scores should be computed from `R`.
`PCAcontrol`	Arguments passed to `createPCAmodelmatrix`. This includes `ncomps`.
`TPRS`	Logical indicator for whether thin-plate regression splines should be created and added to the covariates.
`TPRScontrol`	Arguments passed to `createTPRSmodelmatrix`. This includes `df`.
`returnAll`	Logical. If `TRUE`, a list containing all `nStarts` solutions is included in the output.
`...`	Additional arguments passed to either `predkmeans` or the prediction method.
`object`	A `predkmeansCVest` object.
`method`	Character string indicating which prediction method should be used. Options are `ML`, `MixExp`, and `SVM`. See `predictML` for more information.
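When `cv.groups` is a single number, the data are split at random into that many groups, and `0` gives leave-one-out. The base-R sketch below shows one way such a split could be built. The helper `make_cv_groups()` is hypothetical (it is not part of predkmeans), and representing the groups as a list of test-set index vectors is an assumption; the exact list format that `predkmeansCVest` accepts may differ.

```r
# Hypothetical helper (not part of predkmeans): split observation indices
# 1..n into cross-validation test groups.
#   k > 0 : k random groups of (nearly) equal size
#   k = 0 : leave-one-out, i.e. n groups of size one
make_cv_groups <- function(n, k = 10) {
  if (k == 0) k <- n
  stopifnot(k >= 1, k <= n)
  # Give each index a group label, balanced across groups, in random order
  labels <- sample(rep_len(seq_len(k), n))
  # Return an unnamed list of integer index vectors, one per test group
  unname(split(seq_len(n), labels))
}

set.seed(1)
groups <- make_cv_groups(200, k = 5)
lengths(groups)  # 40 40 40 40 40
```

Each element holds the indices of one held-out test set; the matching training set is the complement of those indices.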

Details

These wrappers are designed to simplify cross-validation of a dataset. For models including thin-plate regression splines (TPRS) or principal component analysis (PCA) scores, these functions will re-evaluate the TPRS basis or PCA decomposition on each training set.
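Re-estimating the basis on each training set keeps the held-out observations from influencing the covariates that the model is fit with. The base-R sketch below illustrates the idea for PCA using `prcomp()` and `predict()`: components are estimated from the training rows only, and the test rows are projected onto them. It is a generic illustration rather than the package's internal code (which goes through `createPCAmodelmatrix`); the simulated covariate matrix, the single held-out fold, and the choice of two components are assumptions.

```r
set.seed(42)
# Simulated covariate matrix (assumption: 100 locations, 6 covariates)
R <- matrix(rnorm(600), nrow = 100,
            dimnames = list(NULL, paste0("cov", 1:6)))
test  <- 1:20                              # one hypothetical held-out fold
train <- setdiff(seq_len(nrow(R)), test)

# Fit PCA on the training rows only; centering and scaling use training statistics
pca_train <- prcomp(R[train, ], center = TRUE, scale. = TRUE)
ncomps <- 2

# Training scores come directly from the fit...
scores_train <- pca_train$x[, 1:ncomps]
# ...while test rows are centered and scaled with the training means and SDs,
# then projected onto the training loadings
scores_test <- predict(pca_train, newdata = R[test, ])[, 1:ncomps]

dim(scores_train)  # 80 2
dim(scores_test)   # 20 2
```

Fitting `prcomp()` on all 100 rows instead would let the test rows shape the components, which is the leakage the re-evaluation avoids.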

Author(s)

Joshua Keller

Examples

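library(predkmeans)
set.seed(1)  # for reproducibility

# Simulate two covariates and a latent cluster label
n <- 200
r1 <- rnorm(n)
r2 <- rnorm(n)
u1 <- rbinom(n, size=1, prob=0)  # prob=0: u1 is always 0, so cluster "A" never occurs
cluster <- ifelse(r1<0, ifelse(u1, "A", "B"), ifelse(r2<0, "C", "D"))

# Cluster-specific means for the two outcomes
mu1 <- c(A=2, B=2, C=-2, D=-2)
mu2 <- c(A=1, B=-1, C=-1, D=-1)
x1 <- rnorm(n, mu1[cluster], 4)
x2 <- rnorm(n, mu2[cluster], 4)

# Covariate design matrix and outcome matrix
R <- model.matrix(~r1 + r2)
X <- cbind(x1, x2)

# 5-fold cross-validation with K=4 clusters; nStarts is passed on to predkmeans
pkmcv <- predkmeansCVest(X=X,
                         R=R, K=4, nStarts=4, cv.groups=5,
                         TPRS=FALSE, PCA=FALSE, covarnames=colnames(R))
pkmcv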
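For a rough picture of what happens within a single fold, the sketch below uses base R `kmeans()`, i.e. ordinary k-means (the special case described under `sigma2`, with `sigma2=0` and `sigma2fixed=TRUE`), rather than predictive k-means itself. Clusters are fit on the training rows, and each held-out row is labelled with its nearest training centre. The simulated data, the fold, and the helper `nearest_centre()` are hypothetical, and nearest-centre labelling of the test set is an assumed convention, not a description of what `predkmeansCVpred` computes.

```r
set.seed(7)
# Simulated 2-D outcomes (assumption: two well-separated groups of 100 rows)
X <- rbind(matrix(rnorm(200, mean = -3), ncol = 2),
           matrix(rnorm(200, mean =  3), ncol = 2))
test  <- seq(1, nrow(X), by = 5)          # hypothetical held-out fold
train <- setdiff(seq_len(nrow(X)), test)

# Ordinary k-means on the training rows only
fit <- kmeans(X[train, ], centers = 2, nstart = 4)

# Hypothetical helper: index of the nearest centre for each row of x,
# using squared Euclidean distance ||x||^2 + ||c||^2 - 2 x'c
nearest_centre <- function(x, centers) {
  d2 <- outer(rowSums(x^2), rowSums(centers^2), "+") - 2 * x %*% t(centers)
  max.col(-d2, ties.method = "first")
}

test_cluster <- nearest_centre(X[test, ], fit$centers)
table(test_cluster)
```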

[Package predkmeans version 0.1.1]