scovq {ICS} | R Documentation |
Supervised scatter matrix based on quantiles
Description
Function for a supervised scatter matrix that is the weighted
covariance matrix of x
with weights 1/(q2-q1
) if y
is between the
lower (q1
) and upper (q2
) quantile and 0 otherwise (or vice versa).
Usage
scovq(x, y, q1 = 0, q2 = 0.5, pos = TRUE, type = 7,
method = "unbiased", na.action = na.fail,
check = TRUE)
Arguments
x |
numeric data matrix with at least two columns. |
y |
numerical vector specifying the dependent variable. |
q1 |
percentage for lower quantile of |
q2 |
percentage for upper quantile of |
pos |
logical. If TRUE then the weights are 1/( |
type |
passed on to function |
method |
passed on to function |
na.action |
a function which indicates what should happen when the data contain 'NA's. Default is to fail. |
check |
logical. Checks if the input should be checked for consistency. If not needed setting it to FALSE might save some time. |
Details
The weights for this supervised scatter matrix for pos=TRUE
are
w(y) = I(q1-quantile < y < q2-quantile)/(q2-q1)
. Then scovq
is calculated as
scovq = \sum w(y) (x-\bar{x}_w)'(x-\bar{x}_w).
where \bar{x}_w = \sum w(y) x
.
To see how this function can be used in the context of supervised invariant coordinate selection see the example below.
Value
a matrix.
Author(s)
Klaus Nordhausen
References
Liski, E., Nordhausen, K. and Oja, H. (2014), Supervised invariant coordinate selection, Statistics: A Journal of Theoretical and Applied Statistics, 48, 711–731. <doi:10.1080/02331888.2013.800067>.
See Also
Examples
# Creating some data
# The number of explaining variables
p <- 10
# The number of observations
n <- 400
# The error variance
sigma <- 0.5
# The explaining variables
X <- matrix(rnorm(p*n),n,p)
# The error term
epsilon <- rnorm(n, sd = sigma)
# The response
y <- X[,1]^2 + X[,2]^2*epsilon
# SICS with ics
X.centered <- sweep(X,2,colMeans(X),"-")
SICS <- ics(X.centered, S1=cov, S2=scovq, S2args=list(y=y, q1=0.25,
q2=0.75, pos=FALSE), stdKurt=FALSE, stdB="Z")
# Assuming it is known that k=2, then the two directions
# of interest are choosen as:
k <- 2
KURTS <- SICS@gKurt
KURTS.max <- ifelse(KURTS >= 1, KURTS, 1/KURTS)
ordKM <- order(KURTS.max, decreasing = TRUE)
indKM <- ordKM[1:k]
# The two variables of interest
Zk <- ics.components(SICS)[,indKM]
# The correspondings transformation matrix
Bk <- coef(SICS)[indKM,]
# The corresponding projection matrix
Pk <- t(Bk) %*% solve(Bk %*% t(Bk)) %*% Bk
# Visualization
pairs(cbind(y,Zk))
# checking the subspace difference
# true projection
B0 <- rbind(rep(c(1,0),c(1,p-1)),rep(c(0,1,0),c(1,1,p-2)))
P0 <- t(B0) %*% solve(B0 %*% t(B0)) %*% B0
# crone and crosby subspace distance measure, should be small
k - sum(diag(P0 %*% Pk))