e.agglo {ecp}    R Documentation

ENERGY AGGLOMERATIVE

Description

An agglomerative hierarchical estimation algorithm for multiple change point analysis.

Usage

e.agglo(X, member=1:nrow(X), alpha=1, penalty=function(cps){0})

Arguments

X

A T x d matrix containing the length T time series with d-dimensional observations.

member

Initial membership vector for the time series.

alpha

Moment index used for determining the distance between and within clusters.

penalty

Function used to penalize the obtained goodness-of-fit statistics. This function takes as its input a vector of change point locations (cps).
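As an illustration of the penalty argument, here is a minimal sketch of two penalty functions written in base R. The names pen_count and pen_spacing and the sample vector cps are hypothetical, and the sketch assumes the returned value is added to the goodness-of-fit statistic, so that a more negative value is a heavier penalty.

```r
# Hypothetical penalty functions. Each takes a vector of change point
# locations (cps) and returns a single number.

# Penalize the number of change points: more change points, lower value.
pen_count <- function(cps) -length(cps)

# Favor well-separated change points through their mean spacing.
pen_spacing <- function(cps) mean(diff(sort(cps)))

# Made-up change point locations, used only to show the return values.
cps <- c(1, 11, 31, 41)
pen_count(cps)    # -4
pen_spacing(cps)  # (10 + 20 + 10) / 3
```

Either function could be supplied as penalty=pen_count or penalty=pen_spacing in place of the default, which always returns 0.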

Details

Homogeneous clusters are created based on the initial clustering provided by the member argument. In each iteration, clusters are merged so as to maximize a goodness-of-fit statistic. The computational complexity of this method is O(T^2), where T is the number of observations.
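To make the role of alpha concrete, the following base R sketch computes a sample energy-type distance between two clusters from their between-cluster and within-cluster distances. The function name energy_dist is hypothetical and the simple averaging used here is an assumption. It is not the package's internal code, which may normalize the within-cluster terms differently.

```r
# Sketch of an energy-type distance between clusters A and B
# (rows are observations). Illustrative only, not ecp internals.
energy_dist <- function(A, B, alpha = 1) {
  A <- as.matrix(A)
  B <- as.matrix(B)
  n <- nrow(A)
  m <- nrow(B)
  # Pairwise Euclidean distances in the pooled sample, raised to alpha
  D <- as.matrix(dist(rbind(A, B)))^alpha
  idxA <- seq_len(n)
  idxB <- n + seq_len(m)
  between  <- mean(D[idxA, idxB])   # average distance across clusters
  within_A <- mean(D[idxA, idxA])   # average distance inside A
  within_B <- mean(D[idxB, idxB])   # average distance inside B
  2 * between - within_A - within_B
}

# Two identical clusters are at distance 0; shifted clusters are not.
energy_dist(matrix(c(0, 1)), matrix(c(0, 1)))
energy_dist(matrix(c(0, 0)), matrix(c(2, 2)))
```

A larger value indicates clusters whose distributions differ more, which is the kind of quantity a merge step can compare across candidate pairs.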

Value

Returns a list with the following components.

merged

A (T-1) x 2 matrix indicating which segments were merged at each step of the agglomerative procedure.

fit

Vector showing the progression of the penalized goodness-of-fit statistic.

progression

A T x (T+1) matrix showing the progression of the set of change points.

cluster

The estimated cluster membership vector.

estimates

The location of the estimated change points.
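The relationship between a membership vector and change point locations can be sketched in base R as follows. The vector cluster below is made up for illustration, and the sketch assumes a change point is the index of the first observation of a new segment. The estimates component returned by e.agglo may follow a different convention, for example by also reporting the endpoints of the series.

```r
# Made-up membership vector: three segments of lengths 10, 20 and 10.
cluster <- rep(1:3, times = c(10, 20, 10))

# A new segment starts wherever the membership label changes.
starts <- which(diff(cluster) != 0) + 1
starts  # 11 31
```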

Author(s)

Nicholas A. James

References

Matteson D.S., James N.A. (2013). A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data.

James N.A., Matteson D.S. (2014). ecp: An R Package for Nonparametric Multiple Change Point Analysis of Multivariate Data. Journal of Statistical Software, 62(7), 1-25. URL http://www.jstatsoft.org/v62/i07/

See Also

e.divisive

Rizzo M.L., Szekely G.J. (2005). Hierarchical clustering via joint between-within distances: Extending Ward's minimum variance method. Journal of Classification. pp. 151-183.

Rizzo M.L., Szekely G.J. (2010). Disco analysis: A nonparametric extension of analysis of variance. The Annals of Applied Statistics. pp. 1034-1055.

Examples

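set.seed(100)
# Initial membership: four clusters of 10 consecutive observations
mem = rep(c(1,2,3,4),times=c(10,10,10,10))
# Univariate series with mean shifts after observations 10 and 30
x = as.matrix(c(rnorm(10,0,1),rnorm(20,2,1),rnorm(10,-1,1)))
# No penalty; the penalty function receives the change point vector
y = e.agglo(X=x,member=mem,alpha=1,penalty=function(cps) 0)
y$estimates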


## Not run: 
# Multivariate spatio-temporal example
# You will need the following packages:
#	mvtnorm, combinat, and MASS
library(mvtnorm); library(combinat); library(MASS)
set.seed(2013)
lambda = 1500 #overall arrival rate per unit time
muA = c(-7,-7) ; muB = c(0,0) ; muC = c(5.5,0)
covA = 25*diag(2)
covB = matrix(c(9,0,0,1),2)
covC = matrix(c(9,.9,.9,9),2)
time.interval = matrix(c(0,1,3,4.5,1,3,4.5,7),4,2)
#mixing coefficients
mixing.coef = rbind(c(1/3,1/3,1/3),c(.2,.5,.3), c(.35,.3,.35), 
	c(.2,.3,.5))
stppData = NULL
for(i in 1:4){
	count = rpois(1, lambda* diff(time.interval[i,]))
	Z = rmultz2(n = count, p = mixing.coef[i,])
	S = rbind(rmvnorm(Z[1],muA,covA), rmvnorm(Z[2],muB,covB),
		rmvnorm(Z[3],muC,covC))
	X = cbind(rep(i,count), runif(n = count, time.interval[i,1],
		time.interval[i,2]), S)
	stppData = rbind(stppData, X[order(X[,2]),])
}
member = as.numeric(cut(stppData[,2], breaks = seq(0,7,by=1/12)))
output = e.agglo(X=stppData[,3:4],member=member,alpha=1,
	penalty=function(cps) 0)

## End(Not run)

[Package ecp version 3.1.5 Index]