R: Supervised scatter matrix based on quantiles

scovq {ICS}

R Documentation

Supervised scatter matrix based on quantiles

Description

Function for a supervised scatter matrix that is the weighted covariance matrix of x with weights 1/(q2-q1) if y is between the lower (q1) and upper (q2) quantile and 0 otherwise (or vice versa).

Usage

scovq(x, y, q1 = 0, q2 = 0.5, pos = TRUE, type = 7, 
      method = "unbiased", na.action = na.fail, 
      check = TRUE)

Arguments

`x`	numeric data matrix with at least two columns.
`y`	numerical vector specifying the dependent variable.
`q1`	percentage for lower quantile of `y`. With 0 <= `q1` < `q2`. See details.
`q2`	percentage for upper quantile of `y`. With `q1` < `q2` <= 1. See details.
`pos`	logical. If TRUE then the weights are 1/(`q2-q1`) if `y` is between the `q1`- and `q2`- quantiles and 0 othervise. If FALSE then the weights are 0 if `y` between `q1`- and `q2`-quantiles and 1/(`1-q2+q1`) otherwise.
`type`	passed on to function `quantile`.
`method`	passed on to function `cov.wt`.
`na.action`	a function which indicates what should happen when the data contain 'NA's. Default is to fail.
`check`	logical. Checks if the input should be checked for consistency. If not needed setting it to FALSE might save some time.

Details

The weights for this supervised scatter matrix for pos=TRUE are w(y) = I(q1-quantile < y < q2-quantile)/(q2-q1). Then scovq is calculated as

scovq = \sum w(y) (x-\bar{x}_w)'(x-\bar{x}_w).

where \bar{x}_w = \sum w(y) x.

To see how this function can be used in the context of supervised invariant coordinate selection see the example below.

Value

a matrix.

Author(s)

Klaus Nordhausen

References

Liski, E., Nordhausen, K. and Oja, H. (2014), Supervised invariant coordinate selection, Statistics: A Journal of Theoretical and Applied Statistics, 48, 711–731. <doi:10.1080/02331888.2013.800067>.

Examples

# Creating some data

# The number of explaining variables
p <- 10
# The number of observations
n <- 400
# The error variance
sigma <- 0.5
# The explaining variables 
X <- matrix(rnorm(p*n),n,p)
# The error term
epsilon <- rnorm(n, sd = sigma)
# The response
y <- X[,1]^2 + X[,2]^2*epsilon


# SICS with ics

X.centered <- sweep(X,2,colMeans(X),"-")
SICS <- ics(X.centered, S1=cov, S2=scovq, S2args=list(y=y, q1=0.25, 
        q2=0.75, pos=FALSE), stdKurt=FALSE, stdB="Z")

# Assuming it is known that k=2, then the two directions 
# of interest are choosen as:

k <- 2
KURTS <- SICS@gKurt 
KURTS.max <- ifelse(KURTS >= 1, KURTS, 1/KURTS)
ordKM <- order(KURTS.max, decreasing = TRUE)

indKM <- ordKM[1:k]

# The two variables of interest
Zk <- ics.components(SICS)[,indKM]

# The correspondings transformation matrix
Bk <- coef(SICS)[indKM,]

# The corresponding projection matrix
Pk <- t(Bk) %*% solve(Bk %*% t(Bk)) %*% Bk

# Visualization
pairs(cbind(y,Zk))

# checking the subspace difference

# true projection

B0 <- rbind(rep(c(1,0),c(1,p-1)),rep(c(0,1,0),c(1,1,p-2)))
P0 <- t(B0) %*% solve(B0 %*% t(B0)) %*% B0

# crone and crosby subspace distance measure, should be small
k - sum(diag(P0 %*% Pk))

[Package ICS version 1.4-1 Index]