VIPnoSNP {RHclust} | R Documentation |
Vector in Partition without SNP data
Description
Clusters subjects based on similar patterns of gene expression (GE) and DNA methylation (CpG) data.
Usage
VIPnoSNP(Simulated = NULL, CPG = NULL, GE = NULL,
CPGname = NULL, GEname = NULL, v,
optimize = c('off','min','slope','elbow'),
iter_max = 1000, nstart = 5, fit = c('aic','bic'),
seed = NULL, ct = c('mean','median'), verbose = FALSE)
Arguments
Simulated |
Simulated data object returned by SimData(). Leave as NULL (the default) when supplying real data through CPG and GE. |
CPG |
Data frame or matrix of numeric CpG data with N rows (subjects) and M columns (CpG sites). Row names are permitted. See SimData()$CPG for an example. |
GE |
Data frame or matrix of numeric gene expression data with N rows (subjects) and M columns (genes). Row names are permitted. See SimData()$GE for an example. |
CPGname |
Names for the CpG data: a data frame with one row per CpG site and two columns, the CpG site names in column 1 and the corresponding GE indexes in column 2. Rows must be in the same order as the columns of the argument CPG. See SimData()$CPG_Index for an example. |
GEname |
Names for the GE data: a data frame with one row per gene and two columns, the GE names in column 1 and the GE indexes in column 2. Rows must be in the same order as the columns of the argument GE. See SimData()$GE_Index for an example. |
v |
Number of clusters: a numeric scalar, a numeric vector, or a range given as c(l, u), which covers clusters l:u. |
optimize |
Method for selecting the optimal number of clusters. 'min' returns the cluster assignment with the lowest WSS among the clusters in v. 'slope' picks the lowest WSS value based on the first increasing slope. 'elbow' fits a line between the first and last fitted WSS values and selects the number of clusters whose WSS lies farthest from that line. All options except 'slope' return plots. |
iter_max |
Maximum number of iterations allowed. |
nstart |
Number of random starts. If nstart > 1, the clustering is repeated with different random initializations and the result with the minimum total distance (tot_dist) is returned. |
fit |
Penalty applied to the WSS of the clusters: either 'aic' or 'bic'. |
seed |
Optional random seed, so that the same initial cluster centers are sampled on every run. |
ct |
Measure of central tendency used for cluster assignment: either 'mean' or 'median'. |
verbose |
Logical; whether to print information about the clustering procedure. |
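The 'min' and 'elbow' rules described under optimize can be illustrated in a few lines of base R. This is a minimal sketch, not the package's code: the helper names pick_min_k() and pick_elbow_k() are hypothetical, the WSS values are made up, and it assumes WSS is used on its raw scale against the number of clusters (the documentation does not say whether either axis is rescaled before the line is fitted).

```r
# Hypothetical helpers illustrating optimize = 'min' and optimize = 'elbow';
# they are NOT part of RHclust.
# k:   increasing numbers of clusters, e.g. 1:5
# wss: total within-cluster sum of squares for each k (raw scale assumed)

pick_min_k <- function(k, wss) {
  # 'min': the number of clusters with the lowest WSS
  k[which.min(wss)]
}

pick_elbow_k <- function(k, wss) {
  n <- length(k)
  # Line through the first and last (k, WSS) points
  dx <- k[n] - k[1]
  dy <- wss[n] - wss[1]
  # Perpendicular distance from every (k, WSS) point to that line
  dist <- abs(dy * k - dx * wss + k[n] * wss[1] - wss[n] * k[1]) /
    sqrt(dx^2 + dy^2)
  # 'elbow': the number of clusters farthest from the line
  k[which.max(dist)]
}

k   <- 1:5
wss <- c(100, 40, 25, 20, 18)  # illustrative values only
pick_min_k(k, wss)    # 5
pick_elbow_k(k, wss)  # 2
```

The example shows why 'min' on its own tends to pick the largest number of clusters in v: WSS usually keeps shrinking as clusters are added, whereas the elbow rule looks for the point where the gains level off.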
Details
See the VIP() documentation for details. VIPnoSNP() differs only in that it does not use SNP data.
Value
size |
Number of subjects assigned to each cluster. |
cluster |
Vector of cluster assignments, one per subject. |
GECenters |
Matrix of cluster centers for GE. |
CPGCenters |
Matrix of cluster centers for CPG. |
within |
Vector of within cluster sum of squares with one component per cluster. |
tot_within |
Total within-cluster sum of squares, summed over all clusters. |
Moved |
Number of iterations before convergence. |
AIC |
Value of tot_within with the AIC penalty applied. |
BIC |
Value of tot_within with the BIC penalty applied. |
outputPlot |
The tot_within, AIC, BIC, and v values used for plotting. |
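The size, within, and tot_within components follow the usual k-means definitions, and the Hartigan and Wong (1979) algorithm cited in the References is the default algorithm of base R's stats::kmeans(). The sketch below uses that base function on toy data shaped like the GE and CPG inputs to show how these quantities relate. It is not the package's implementation: pooling GE and CPG with cbind() is an assumption, and the AIC and BIC penalties shown (tot_within + 2dk and tot_within + log(n)dk, with d features, k clusters, and n subjects) are a common heuristic, not a formula confirmed for RHclust.

```r
# Sketch with base R only; NOT the RHclust implementation.
set.seed(1)
n   <- 40
GE  <- matrix(rnorm(n * 3), nrow = n)  # toy GE: 40 subjects x 3 genes
CPG <- matrix(runif(n * 4), nrow = n)  # toy CPG: 40 subjects x 4 CpG sites
X   <- cbind(GE, CPG)                  # assumption: features pooled column-wise

# stats::kmeans() uses the Hartigan-Wong algorithm by default
fit <- kmeans(X, centers = 3, iter.max = 1000, nstart = 5)

fit$size          # subjects per cluster (cf. size)
fit$withinss      # within-cluster sum of squares per cluster (cf. within)
fit$tot.withinss  # their sum (cf. tot_within)

# Centers split back into the GE and CPG columns (cf. GECenters, CPGCenters)
ge_centers  <- fit$centers[, 1:3]
cpg_centers <- fit$centers[, 4:7]

# Assumed penalties: d features, k clusters, n subjects
d   <- ncol(X)
k   <- nrow(fit$centers)
aic <- fit$tot.withinss + 2 * d * k
bic <- fit$tot.withinss + log(n) * d * k
```

Under this assumed form, the BIC penalty exceeds the AIC penalty whenever log(n) > 2, that is, for more than about 7 subjects, so fit = 'bic' would favour fewer clusters than fit = 'aic'.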
Author(s)
jkhndwrk@memphis.edu
References
Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A K-means clustering algorithm. Applied Statistics, 28, 100–108. doi:10.2307/2346830.
Examples
# No SNP data
sd = SimData()  # simulated example data
# Through the general VIP() function
noSNPout = VIP(sd, v = c(1,5), optimize = 'off', nstart = 30, type = 'NoSNP')
# Calling VIPnoSNP() directly
noSNPout = VIPnoSNP(sd, v = c(1,5), optimize = 'off', nstart = 30)