R: Find the Solution of Penalized Regression-Based Clustering.

PRclust {prclust}

R Documentation

Find the Solution of Penalized Regression-Based Clustering.

Description

Clustering is unsupervised and exploratory in nature. Yet, it can be performed through penalized regression with grouping pursuit. Prclust helps us peform penalized regression-based clustering with various loss functions and grouping penalities via two algorithm (DC-ADMM and quadratic penalty).

Usage

PRclust(data, lambda1, lambda2, tau, 
	loss.method = c("quadratic","lasso"), 
	grouping.penalty = c("gtlp","L1","SCAD","MCP"), 
	algorithm = c("ADMM","Quadratic"), epsilon=0.001)

Arguments

`data`	input matrix, of dimension nvars x nobs; each column is an observation vector.
`lambda1`	Tuning parameter or step size: lambda1, typically set at 1 for quadratic penalty based algorithm; 0.4 for revised ADMM.
`lambda2`	Tuning parameter: lambda2, the magnitude of grouping penalty.
`tau`	Tuning parameter: tau, related to grouping penalty.
`loss.method`	The loss method. "lasso" stands for `L_1` loss function, while "quadratic" stands for the quadratic loss function.
`grouping.penalty`	Grouping penalty. Character: may be abbreviated. "gtlp" means generalized group lasso is used for grouping penalty. "lasso" means lasso is used for grouping penalty. "SCAD" and "MCP" are two other non-convex penalty.
`algorithm`	character: may be abbreviated. The algorithm to use for finding the solution. The default algorithm is "ADMM", which stands for the new algorithm we developed.
`epsilon`	The stopping critetion parameter. The default is 0.001.

Details

Clustering analysis has been widely used in many fields. In the absence of a class label, clustering analysis is also called unsupervised learning. However, penalized regression-based clustering adopts a novel framework for clustering analysis by viewing it as a regression problem. In this method, a novel non-convex penalty for grouping pursuit was proposed which data-adaptively encourages the equality among some unknown subsets of parameter estimates. This new method can deal with some complex clustering situation, for example, in the presence of non-convex cluster, in which the K-means fails to work, PRclust might perform much better.

Value

The return value is a list. In this list, it contains the following matrix.

`mu`	The centroid of the each observations.
`theta`	The theta value for the data set, not very useful.
`group`	The group for each points.
`count`	The iteration times.

Note

Choosing tunning parameter is kind of time consuming job. It is always based on "trials and errors".

Author(s)

Chong Wu, Wei Pan

References

Pan, W., Shen, X., & Liu, B. (2013). Cluster analysis: unsupervised learning via supervised learning with a non-convex penalty. Journal of Machine Learning Research, 14(1), 1865-1889.

Wu, C., Kwon, S., Shen, X., & Pan, W. (2016). A New Algorithm and Theory for Penalized Regression-based Clustering. Journal of Machine Learning Research, 17(188), 1-25.

Examples

library("prclust")
# To let you have a better understanding about the power and strength
# of PRclust method, 6 examples in original prclust paper were provided.
################################################
### case 1
################################################
## generate the data
data = matrix(NA,2,100)
data[1,1:50] = rnorm(50,0,0.33)
data[2,1:50] = rnorm(50,0,0.33)
data[1,51:100] = rnorm(50,1,0.33)
data[2,51:100] = rnorm(50,1,0.33)
## set the tunning parameter
lambda1 =1
lambda2 = 3
tau = 0.5
a =PRclust(data,lambda1,lambda2,tau)
a

[Package prclust version 1.3 Index]