clustEff {clustEff} R Documentation

## Cluster Effects Algorithm

### Description

This function implements an algorithm for clustering curves of effects obtained from a quantile regression coefficient model (qrcm; Frumento and Bottai, 2015), in which the coefficients are described by flexible parametric functions of the order of the quantile. The algorithm can also be used to cluster curves observed over time, as in functional data analysis.

### Usage

clustEff(Beta, Beta.lower = NULL, Beta.upper = NULL,
k = c(2, min(5, (ncol(Beta)-1))), ask = FALSE, diss.mat, alpha = .5,
step = c("both", "shape", "distance"),
cut.method = c("mindist", "length", "conf.int"),
method = "ward.D2", approx.spline = FALSE, nbasis = 50,
conf.level = 0.9, stand = FALSE, plot = TRUE, trace = TRUE)


### Arguments

| Argument | Description |
|---|---|
| Beta | A matrix n x q, where q is the number of curves to cluster and n is either the number of percentiles used in the quantile regression or the length of the time vector. |
| Beta.lower | A matrix n x q containing the lower interval bounds of the q curves to cluster, where n is the number of percentiles used in the quantile regression. Used only if cluster.effects=TRUE. |
| Beta.upper | A matrix n x q containing the upper interval bounds of the q curves to cluster, where n is the number of percentiles used in the quantile regression. Used only if cluster.effects=TRUE. |
| k | The number of clusters to look for. If it is a vector of length two (k.min, k.max), the number of clusters is optimized within that range; if it is a single value, the number of clusters is fixed. |
| ask | If TRUE, after the dendrogram is plotted the user chooses how many clusters to use. |
| diss.mat | A dissimilarity matrix, obtained with the distshape function. |
| alpha | The alpha-percentile used for computing the dissimilarity matrix. The default value is alpha=.5. |
| step | The steps used in computing the dissimilarity matrix. Default is "both" (that is, "shape" and "distance"). |
| cut.method | The method used in the optimization step to look for the optimal number of clusters. Default is "mindist"; however, if Beta.lower and Beta.upper are available, the suggested method is "conf.int". |
| method | The agglomeration method to be used. |
| approx.spline | If TRUE, Beta is approximated by a smooth spline. |
| nbasis | An integer specifying the number of basis functions. Used only when approx.spline=TRUE. |
| conf.level | The confidence level required. |
| stand | If TRUE, the argument Beta is standardized. |
| plot | If TRUE, the dendrogram, boxplot and clusters are plotted. |
| trace | If TRUE, some information is printed. |
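To make the roles of Beta, k and method more concrete, the following minimal sketch clusters a few toy curves with base R only. It is not the clustEff algorithm: a plain Euclidean distance stands in for the package's dissimilarity matrix, and the toy data and object names are assumptions made for illustration.

```r
# Toy illustration with base R only (not the clustEff algorithm itself).
set.seed(1)
p <- seq(0.05, 0.95, by = 0.05)

# Six toy curves stored column-wise (n x q), the layout Beta expects:
# three rising curves followed by three falling curves.
Beta <- cbind(sapply(1:3, function(j)  p + rnorm(length(p), 0, 0.01)),
              sapply(1:3, function(j) -p + rnorm(length(p), 0, 0.01)))

# Assumption: Euclidean distance between curves replaces the
# shape/distance dissimilarity computed by the package.
d <- dist(t(Beta))

# Agglomerative clustering with the same default method name, "ward.D2".
hc <- hclust(d, method = "ward.D2")

# Fixing the number of clusters corresponds to passing a single value of k.
clusters <- cutree(hc, k = 2)
clusters
```

In this sketch the curves are the columns of Beta, so the distance is computed on the transposed matrix.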

### Details

Quantile regression models conditional quantiles of a response variable, given a set of covariates. Assume that each coefficient can be expressed as a parametric function of p of the form:

\beta(p | \theta) = \theta_{0} + \theta_1 b_1(p) + \theta_2 b_2(p) + \ldots

where b_1(p), b_2(p), \ldots are known functions of p.
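As a small worked example of the formula above, the sketch below evaluates one coefficient curve on a grid of percentiles. The basis (qnorm(p), p, p^2, p^3) and the theta values are borrowed from the Examples section; the helper name beta_curve is hypothetical and not part of the package.

```r
# Hypothetical helper: beta(p | theta) = theta_0 + sum_j theta_j * b_j(p)
beta_curve <- function(p, theta) {
  # One row per value of p, one column per basis function (intercept first)
  B <- cbind(1, qnorm(p), p, p^2, p^3)
  drop(B %*% theta)
}

p <- seq(0.01, 0.99, by = 0.01)
theta <- c(0.5, 0, 0.5, 1, 2)   # theta_0, ..., theta_4

beta <- beta_curve(p, theta)

# At the median: 0.5 + 0 * qnorm(0.5) + 0.5 * 0.5 + 1 * 0.25 + 2 * 0.125 = 1.25
beta_curve(0.5, theta)
```

Each column of the Beta argument is one such curve evaluated on a common grid of p.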

### Value

An object of class “clustEff”, a list containing the following items:

| Item | Description |
|---|---|
| call | The matched call. |
| p | The percentiles used in quantile regression coefficient modeling, or the time otherwise. |
| X | The curves matrix. |
| clusters | The vector of clusters. |
| X.mean | The mean curves matrix of dimension n x k. |
| X.mean.dist | The within cluster distance from the mean curve. |
| X.lower | The lower bound matrix. |
| X.mean.lower | The mean lower bound of dimension n x k. |
| X.upper | The upper bound matrix. |
| X.mean.upper | The mean upper bound of dimension n x k. |
| Signif.interval | A matrix of dimension n x k containing the intervals in which the mean lower and upper bounds do not include zero. |
| k | The number of selected clusters. |
| diss.matrix | The dissimilarity matrix. |
| X.mean.diss | The within cluster dissimilarity. |
| oggSilhouette | An object of class “silhouette”. |
| oggHclust | An object of class “hclust”. |
| distance | A vector of goodness measures used to select the best number of clusters. |
| step | The selected step. |
| method | The agglomeration method used. |
| cut.method | The method used to select the best number of clusters. |
| alpha | The selected alpha-percentile. |
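To illustrate how an n x k matrix of mean curves relates to the curves matrix and the cluster vector, here is a minimal base R sketch. It assumes that a mean curve is the pointwise arithmetic mean of the curves in a cluster; the package may compute X.mean differently, and the toy values are made up.

```r
# Toy curves matrix (n = 3 points, q = 3 curves) and a cluster label per curve
X <- cbind(c(1, 2, 3), c(3, 4, 5), c(10, 10, 10))
clusters <- c(1, 1, 2)

# Assumption: the mean curve of a cluster is the pointwise average of its curves.
# drop = FALSE keeps single-curve clusters as one-column matrices.
X.mean <- sapply(sort(unique(clusters)), function(g) {
  rowMeans(X[, clusters == g, drop = FALSE])
})

dim(X.mean)   # n x k
X.mean
```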

### Author(s)

Gianluca Sottile gianluca.sottile@unipa.it

### References

Sottile, G., Adelfio, G. Clusters of effects curves in quantile regression models. Comput Stat 34, 551–569 (2019). https://doi.org/10.1007/s00180-018-0817-8

Sottile, G and Adelfio, G (2017). Clustering of effects through quantile regression. Proceedings 32nd International Workshop of Statistical Modeling, Groningen (NL), vol.2 127-130, https://iwsm2017.webhosting.rug.nl/IWSM_2017_V2.pdf.

Frumento, P., and Bottai, M. (2015). Parametric modeling of quantile regression coefficient functions. Biometrics, doi: 10.1111/biom.12410.

### See Also

summary.clustEff, plot.clustEff, for summary and plotting. extract.object, to extract useful objects for the clustering algorithm through a quantile regression coefficient modeling in a multivariate case.

### Examples


# CURVES EFFECTS CLUSTERING

set.seed(1234)
n <- 300
q <- 2
k <- 5
x1 <- runif(n, 0, 5)
x2 <- runif(n, 0, 5)

X <- cbind(x1, x2)
rownames(X) <- 1:n
colnames(X) <- paste0("X", 1:q)

theta1 <- matrix(c(1, 1, 0, 0, 0, .5, 0, .5, 1, 2, .5, 0, 2, 1, .5),
ncol=k, byrow=TRUE)

theta2 <- matrix(c(1, 1, 0, 0, 0, -.3, 0, .5, 1, .5, -1.5, 0, -1, -.5, 1),
ncol=k, byrow=TRUE)

theta3 <- matrix(c(1, 1, 0, 0, 0, .3, 0, -.5, -1, 2, -.5, 0, 1, -.5, -1),
ncol=k, byrow=TRUE)

rownames(theta3) <- rownames(theta2) <- rownames(theta1) <-
c("(intercept)", paste("X", 1:q, sep=""))
colnames(theta3) <- colnames(theta2) <- colnames(theta1) <-
c("(intercept)", "qnorm(p)", "p", "p^2", "p^3")

Theta <- list(theta1, theta2, theta3)

B <- function(p, k){matrix(cbind(1, qnorm(p), p, p^2, p^3), nrow=k, byrow=TRUE)}
Q <- function(p, theta, B, k, X){rowSums(X * t(theta %*% B(p, k)))}

Y <- matrix(NA, nrow(X), 15)
# Columns 1-5, 6-10 and 11-15 are generated from Theta[[1]], Theta[[2]] and Theta[[3]]
for(i in 1:15){
if(i <= 5) Y[, i] <- Q(runif(n), Theta[[1]], B, k, cbind(1, X))
if(i > 5 && i <= 10) Y[, i] <- Q(runif(n), Theta[[2]], B, k, cbind(1, X))
if(i > 10) Y[, i] <- Q(runif(n), Theta[[3]], B, k, cbind(1, X))
}

XX <- extract.object(Y, X, intercept=TRUE, formula.p= ~ I(p) + I(p^2) + I(p^3))

obj <- clustEff(XX$X$X1, Beta.lower=XX$Xl$X1, Beta.upper=XX$Xr$X1, cut.method = "conf.int")
summary(obj)
plot(obj, xvar="clusters", col = 1:3)
plot(obj, xvar="dendrogram")
plot(obj, xvar="boxplot")

obj2 <- clustEff(XX$X$X2, Beta.lower=XX$Xl$X2, Beta.upper=XX$Xr$X2, cut.method = "conf.int")
summary(obj2)
plot(obj2, xvar="clusters", col=1:3)
plot(obj2, xvar="dendrogram")
plot(obj2, xvar="boxplot")

## Not run:
set.seed(1234)
n <- 300
q <- 15
k <- 5
X <- matrix(rnorm(n*q), n, q); X <- scale(X)
rownames(X) <- 1:n
colnames(X) <- paste0("X", 1:q)

Theta <- matrix(c(1, 1, 0, 0, 0,
.5, 0, .5, 1, 1,
.5, 0, 1, 2, .5,
.5, 0, 1, 1, .5,
.5, 0, .5, 1, 1,
.5, 0, .5, 1, .5,
-1.5, 0, -.5, 1, 1,
-1, 0, .5, -1, -1,
-.5, 0, -.5, -1, .5,
-1, 0, .5, -1, -.5,
-1.5, 0, -.5, -1, -.5,
2, 0, 1, 1.5, 2,
2, 0, .5, 1.5, 2,
2.5, 0, 1, 1, 2,
1.5, 0, 1.5, 1, 2,
3, 0, 2, 1, .5),
ncol=k, byrow=TRUE)
rownames(Theta) <- c("(intercept)", paste("X", 1:q, sep=""))
colnames(Theta) <- c("(intercept)", "qnorm(p)", "p", "p^2", "p^3")

B <- function(p, k){matrix(cbind(1, qnorm(p), p, p^2, p^3), nrow=k, byrow=TRUE)}
Q <- function(p, theta, B, k, X){rowSums(X * t(theta %*% B(p, k)))}

s <- matrix(1, q+1, k)
s[2:(q+1), 2] <- 0
s[1, 3:k] <- 0

Y <- Q(runif(n), Theta, B, k, cbind(1, X))
XX <- extract.object(Y, X, intercept = TRUE, formula.p= ~ I(p) + I(p^2) + I(p^3))

obj3 <- clustEff(XX$X, Beta.lower=XX$Xl, Beta.upper=XX$Xr, cut.method = "conf.int")
summary(obj3)

# changing the alpha-percentile clusters are correctly identified
obj4 <- clustEff(XX$X, Beta.lower=XX$Xl, Beta.upper=XX$Xr, cut.method = "conf.int",
alpha = 0.25)
summary(obj4)

# CURVES CLUSTERING IN FUNCTIONAL DATA ANALYSIS

set.seed(1234)
n <- 300
x <- 1:n/n

Y <- matrix(0, n, 30)

sigma2 <- 4*pmax(x-.2, 0) - 8*pmax(x-.5, 0) + 4*pmax(x-.8, 0)

mu <- sin(3*pi*x)
for(i in 1:10) Y[, i] <- mu + rnorm(length(x), 0, pmax(sigma2, 0))

mu <- cos(3*pi*x)
for(i in 11:23) Y[,i] <- mu + rnorm(length(x), 0, pmax(sigma2,0))

mu <- sin(3*pi*x)*cos(pi*x)
for(i in 24:28) Y[, i] <- mu + rnorm(length(x), 0, pmax(sigma2, 0))

mu <- 0 #sin(1/3*pi*x)*cos(2*pi*x)
for(i in 29:30) Y[, i] <- mu + rnorm(length(x), 0, pmax(sigma2, 0))

obj5 <- clustEff(Y)
summary(obj5)
plot(obj5, xvar="clusters", col=1:4)
plot(obj5, xvar="dendrogram")
plot(obj5, xvar="boxplot")

## End(Not run)



[Package clustEff version 0.2.0 Index]