KFOCI {KPC}    R Documentation
Kernel Feature Ordering by Conditional Independence
Description
Variable selection with KPC using a directed K-NN graph or a minimum spanning tree (MST)
Usage
KFOCI(
Y,
X,
k = kernlab::rbfdot(1/(2 * stats::median(stats::dist(Y))^2)),
Knn = min(ceiling(NROW(Y)/20), 20),
num_features = NULL,
stop = TRUE,
numCores = parallel::detectCores(),
verbose = FALSE
)
Arguments
Y: a matrix of responses (n by dy)

X: a matrix of predictors (n by dx)

k: a kernel function on the response, e.g., a kernel from kernlab such as rbfdot() or vanilladot(); the default is a Gaussian (RBF) kernel whose bandwidth is set by the median heuristic on Y (see Usage).

Knn: a positive integer indicating the number of nearest neighbors, or "MST". The suggested choice of Knn is 0.05n for samples of up to a few hundred observations; for large n, Knn should grow sublinearly in n, i.e., slower than any linear function of n. The computing time is approximately linear in Knn, so a smaller Knn is faster (see the sketch at the end of this section).

num_features: the number of variables to be selected; cannot be larger than dx. The default is NULL, in which case it is set equal to dx. If stop == TRUE, num_features is the maximal number of variables that can be selected.

stop: if TRUE, the forward search stops early as soon as no remaining variable yields a positive estimated conditional association, so at most num_features variables are selected; if FALSE, exactly num_features variables are selected.

numCores: number of cores used for parallelizing the stepwise search.

verbose: whether to print each selected variable during the forward stepwise algorithm.
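As a rough illustration of the Knn guidance above (a sketch, not from the package documentation; assumes Y is already defined):

n <- NROW(Y)
Knn_small <- ceiling(0.05 * n)           # suggested 0.05n for up to a few hundred samples
Knn_default <- min(ceiling(n / 20), 20)  # the package default: 0.05n, capped at 20
# For very large n, choose Knn growing sublinearly in n, or pass Knn = "MST".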
Details
A stepwise forward selection of variables using KPC. At each step it selects the X_j that maximizes \hat{\rho}^2(Y, X_j | selected X_i).
It is suggested to normalize the predictors before applying KFOCI.
Euclidean distance is used for computing the K-NN graph and the MST.
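For example, a minimal sketch of the normalization advice (assuming X and Y as in the Examples section; scale() is base R):

X_std <- scale(X)                            # center and scale each predictor column
KFOCI(Y, X_std, Knn = "MST", numCores = 1)   # MST-based graph on the normalized predictors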
Value
A vector of the indices, from 1, ..., dx, of the selected variables, in the order in which they were selected. Variables appearing earlier in the output are expected to be more informative in predicting Y.
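For instance, a hypothetical continuation using the returned indices (the names sel and X_sel are illustrative, not from the package):

sel <- KFOCI(Y, X, numCores = 1)    # indices of selected predictors, in selection order
X_sel <- X[, sel, drop = FALSE]     # keep only the selected columns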
Examples
n = 200
p = 10
X = matrix(rnorm(n * p), ncol = p)
Y = X[, 1] * X[, 2] + sin(X[, 1] * X[, 3])  # only X1, X2 and X3 carry signal
KFOCI(Y, X, kernlab::rbfdot(1), Knn = 1, numCores = 1)
# Typical output: 1 2 3 (the informative variables)