PRclust {prclust} | R Documentation |
Find the Solution of Penalized Regression-Based Clustering.
Description
Clustering is unsupervised and exploratory in nature. Yet, it can be performed through penalized regression with grouping pursuit. Prclust helps us peform penalized regression-based clustering with various loss functions and grouping penalities via two algorithm (DC-ADMM and quadratic penalty).
Usage
PRclust(data, lambda1, lambda2, tau,
loss.method = c("quadratic","lasso"),
grouping.penalty = c("gtlp","L1","SCAD","MCP"),
algorithm = c("ADMM","Quadratic"), epsilon=0.001)
Arguments
data |
input matrix, of dimension nvars x nobs; each column is an observation vector. |
lambda1 |
Tuning parameter or step size: lambda1, typically set at 1 for quadratic penalty based algorithm; 0.4 for revised ADMM. |
lambda2 |
Tuning parameter: lambda2, the magnitude of grouping penalty. |
tau |
Tuning parameter: tau, related to grouping penalty. |
loss.method |
The loss method. "lasso" stands for |
grouping.penalty |
Grouping penalty. Character: may be abbreviated. "gtlp" means generalized group lasso is used for grouping penalty. "lasso" means lasso is used for grouping penalty. "SCAD" and "MCP" are two other non-convex penalty. |
algorithm |
character: may be abbreviated. The algorithm to use for finding the solution. The default algorithm is "ADMM", which stands for the new algorithm we developed. |
epsilon |
The stopping critetion parameter. The default is 0.001. |
Details
Clustering analysis has been widely used in many fields. In the absence of a class label, clustering analysis is also called unsupervised learning. However, penalized regression-based clustering adopts a novel framework for clustering analysis by viewing it as a regression problem. In this method, a novel non-convex penalty for grouping pursuit was proposed which data-adaptively encourages the equality among some unknown subsets of parameter estimates. This new method can deal with some complex clustering situation, for example, in the presence of non-convex cluster, in which the K-means fails to work, PRclust might perform much better.
Value
The return value is a list. In this list, it contains the following matrix.
mu |
The centroid of the each observations. |
theta |
The theta value for the data set, not very useful. |
group |
The group for each points. |
count |
The iteration times. |
Note
Choosing tunning parameter is kind of time consuming job. It is always based on "trials and errors".
Author(s)
Chong Wu, Wei Pan
References
Pan, W., Shen, X., & Liu, B. (2013). Cluster analysis: unsupervised learning via supervised learning with a non-convex penalty. Journal of Machine Learning Research, 14(1), 1865-1889.
Wu, C., Kwon, S., Shen, X., & Pan, W. (2016). A New Algorithm and Theory for Penalized Regression-based Clustering. Journal of Machine Learning Research, 17(188), 1-25.
Examples
library("prclust")
# To let you have a better understanding about the power and strength
# of PRclust method, 6 examples in original prclust paper were provided.
################################################
### case 1
################################################
## generate the data
data = matrix(NA,2,100)
data[1,1:50] = rnorm(50,0,0.33)
data[2,1:50] = rnorm(50,0,0.33)
data[1,51:100] = rnorm(50,1,0.33)
data[2,51:100] = rnorm(50,1,0.33)
## set the tunning parameter
lambda1 =1
lambda2 = 3
tau = 0.5
a =PRclust(data,lambda1,lambda2,tau)
a