sd_sis {semidist}    R Documentation

Feature screening via semi-distance correlation

Description

Implements (grouped) feature screening for classification problems via semi-distance correlation.

Usage

sd_sis(X, y, group_info = NULL, d = NULL, parallel = FALSE)

Arguments

X

Data of multivariate covariates, which should be an n-by-p matrix.

y

Data of categorical response, which should be a factor of length n.
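
The input shapes described for X and y can be checked before screening. The following is a minimal sketch using only base R and the built-in mtcars data; the choice of columns is purely illustrative, and the checks are not part of the package.

```r
# Coerce a data frame of numeric covariates to an n-by-p matrix.
X <- as.matrix(mtcars[, c("mpg", "disp", "hp")])

# Turn the 0/1 transmission indicator into a categorical response.
y <- factor(mtcars[, "am"])

# Verify the documented shapes: a numeric matrix and a factor whose
# length equals the number of rows of X.
stopifnot(is.matrix(X), is.numeric(X), is.factor(y), nrow(X) == length(y))

dim(X)      # n rows by p columns
nlevels(y)  # number of classes in the response
```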

group_info

A list specifying the group information, whose elements are sets of indices of covariates in the same group. For example, list(c(1, 2, 3), c(4, 5)) specifies that covariates 1, 2, 3 are in one group and covariates 4, 5 are in another.

Defaults to NULL. If NULL, it is set to list(1, 2, ..., p), that is, each covariate is treated as its own group.

If X has colnames, then the colnames can be used to specify group_info. For example, list(c("a", "b"), c("c", "d")).

The names of the list help to identify the groups. For example, list(grp_ab = c("a", "b"), grp_cd = c("c", "d")). If the list is unnamed, c("Grp 1", "Grp 2", ..., "Grp J") will be applied.
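
As an illustration of the defaults described for group_info, the sketch below builds the singleton grouping and the "Grp j" names by hand in base R, and maps a name-based grouping onto column positions. The variables p, cols, default_groups, named_groups and index_groups are hypothetical, and resolving names with match() is an assumption rather than the package's confirmed internal method.

```r
p <- 5  # hypothetical number of covariates

# Default grouping: every covariate forms its own group.
default_groups <- as.list(seq_len(p))

# Default group names applied when the list is unnamed.
names(default_groups) <- paste("Grp", seq_along(default_groups))

# Name-based grouping for a matrix whose colnames are a, b, c, d.
cols <- c("a", "b", "c", "d")
named_groups <- list(grp_ab = c("a", "b"), grp_cd = c("c", "d"))

# One possible way to translate column names into column indices.
index_groups <- lapply(named_groups, function(g) match(g, cols))
```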

d

An integer specifying the minimum number of (single) features to keep after screening. For example, if group_info = list(c(1, 2), c(3, 4)) and d = 3, then all of features 1, 2, 3, 4 are selected: groups are kept whole, so both groups are needed to retain at least 3 features.

Defaults to NULL. If NULL, it is set to [n / log(n)], where [x] denotes the integer part of x.
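
The arithmetic behind d can be sketched in base R. The first part computes the default [n / log(n)] for n = 32; the second reproduces the grouped example above, assuming that whole groups are taken in ranked order until at least d features are kept. That selection rule is inferred from the documented example, and the variable names are hypothetical.

```r
# Default number of features to keep for a sample of size n.
n <- 32
d_default <- floor(n / log(n))  # log() is the natural logarithm

# Grouped example: two ranked groups of two features each, with d = 3.
group_sizes <- c(2, 2)
d <- 3

# Number of features accumulated after adding each ranked group.
cum_features <- cumsum(group_sizes)

# Smallest number of groups whose features total at least d.
n_groups_kept <- which(cum_features >= d)[1]
n_features_kept <- cum_features[n_groups_kept]
```

With these values, one group yields only 2 features, so both groups are needed and 4 features are retained.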

parallel

A logical value indicating whether to compute in parallel via furrr::future_map(). Defaults to FALSE.
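
Because furrr::future_map() runs on whatever backend the future package has been given, parallel = TRUE typically only helps once a parallel plan is set. The sketch below wraps that setup in a hypothetical helper, run_parallel_screening(), which is not part of semidist; the worker count is an arbitrary assumption, and the helper returns NULL when the required packages are not installed.

```r
# Hypothetical helper (not part of semidist): run sd_sis() under a
# temporary multisession plan and restore the previous plan afterwards.
run_parallel_screening <- function(X, y, d = NULL, workers = 2) {
  needed <- c("semidist", "furrr", "future")
  available <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
  if (!all(available)) {
    message("Skipping: install ", paste(needed[!available], collapse = ", "))
    return(invisible(NULL))
  }

  # plan() returns the previous plan, so it can be restored on exit.
  old_plan <- future::plan(future::multisession, workers = workers)
  on.exit(future::plan(old_plan), add = TRUE)

  semidist::sd_sis(X, y, d = d, parallel = TRUE)
}
```

A call such as run_parallel_screening(X, y, d = 4) would then screen with two background R sessions. For small data the overhead of starting workers may outweigh any speed-up.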

Value

A list of objects describing the feature screening that was carried out:

See Also

sdcor() for calculating the sample semi-distance correlation.

Examples

X <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]
y <- factor(mtcars[, "am"])

sd_sis(X, y, d = 4)

# Suppose we have prior information for the group structure as
# ("mpg", "drat"), ("disp", "hp") and ("wt", "qsec")
group_info <- list(
  mpg_drat = c("mpg", "drat"),
  disp_hp = c("disp", "hp"),
  wt_qsec = c("wt", "qsec")
)
sd_sis(X, y, group_info, d = 4)


[Package semidist version 0.1.0 Index]