
LDA.boost {GUEST}

R Documentation

Implementation of the linear discriminant function for multi-label classification.

Description

This function applies the linear discriminant function to classify multi-label responses. The precision matrix (the inverse of the covariance matrix) in the linear discriminant function can be estimated by the component w returned by the function boost.graph. In addition, error-prone covariates in the linear discriminant function are corrected by regression calibration.

Usage

LDA.boost(data, resp, theta, sigma_e = 0.6, q = 0.8, lambda = 1, pi = 0.5)

Arguments

`data`	An n (observations) times p (variables) matrix of random variables, whose distributions can be continuous, discrete, or mixed.
`resp`	An n-dimensional vector of categorical random variables, which is the response in the data.
`theta`	The estimator of the precision matrix.
`sigma_e`	The common value in the diagonal covariance matrix of the error for the classical measurement error model when `data` are continuous. The default value is 0.6.
`q`	The common value used to characterize misclassification for binary random variables. The default value is 0.8.
`lambda`	The parameter of the Poisson distribution, which is used to characterize error-prone count random variables. The default value is 1.
`pi`	The probability in the Binomial distribution, which is used to characterize error-prone count random variables. The default value is 0.5.
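To make the role of `sigma_e` more concrete, the following is a minimal sketch of textbook regression calibration for continuous covariates under the classical measurement error model (observed = true + error). It is not taken from the GUEST source: the helper name `calibrate_continuous` is hypothetical, `sigma_e` is assumed to be the common error variance on the diagonal, and the correction applied inside `LDA.boost` may differ. The binary and count cases governed by `q`, `lambda`, and `pi` are not covered here.

```r
# Hypothetical helper (not part of GUEST): regression calibration for
# continuous covariates observed with additive error.
# Assumption: the error covariance is sigma_e * I (sigma_e is a variance).
calibrate_continuous <- function(x_star, sigma_e = 0.6) {
  mu <- colMeans(x_star)                      # mean of the observed covariates
  sigma_star <- cov(x_star)                   # covariance of the observed covariates
  sigma_err <- diag(sigma_e, ncol(x_star))    # assumed error covariance
  sigma_x <- sigma_star - sigma_err           # implied covariance of the true covariates
  centered <- sweep(x_star, 2, mu)            # subtract column means
  # Row-wise form of E[X | X*] = mu + sigma_x %*% solve(sigma_star) %*% (X* - mu)
  sweep(centered %*% solve(sigma_star, sigma_x), 2, mu, FUN = "+")
}

# Toy data: true covariates plus error with variance 0.6 (all values assumed).
set.seed(123)
x_true <- matrix(rnorm(600, sd = 2), ncol = 3)
x_obs <- x_true + matrix(rnorm(600, sd = sqrt(0.6)), ncol = 3)
x_cal <- calibrate_continuous(x_obs, sigma_e = 0.6)
```

The calibrated matrix keeps the column means of the observed data but shrinks each observation toward those means, which is the usual effect of this kind of correction.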

Details

The linear discriminant function used is as follows:

\mathtt{score}_{i,j} = \log(\pi_i) - 0.5\, \mu_i^\top\, \mathtt{theta}\, \mu_i + \mathtt{data}_j^\top\, \mathtt{theta}\, \mu_i,

for class i = 1, \cdots, I and subject j = 1, \cdots, n, where I is the number of classes in the dataset, \pi_i is the proportion of subjects in class i, \mathtt{data}_j is the vector of covariates for subject j, \mathtt{theta} is the precision matrix of the covariates, and \mu_i is the empirical mean vector of the covariates in class i.
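The score formula above can be evaluated directly in a few lines of R. The sketch below is illustrative only: `lda_score` is a hypothetical helper, not a GUEST function, it skips the measurement error correction, and it assumes that a subject is assigned to the class with the largest score, which is the usual rule in linear discriminant analysis. The toy data and the identity precision matrix are likewise assumptions.

```r
# Hypothetical helper (not part of GUEST): evaluates
# score[i, j] = log(pi_i) - 0.5 * mu_i' theta mu_i + data_j' theta mu_i
# for every class i and subject j.
lda_score <- function(data, resp, theta) {
  classes <- sort(unique(resp))
  # one row per class, one column per subject
  score <- matrix(NA_real_, nrow = length(classes), ncol = nrow(data),
                  dimnames = list(as.character(classes), NULL))
  for (i in seq_along(classes)) {
    in_class <- resp == classes[i]
    pi_i <- mean(in_class)                             # proportion of subjects in class i
    mu_i <- colMeans(data[in_class, , drop = FALSE])   # empirical mean vector of class i
    quad <- drop(t(mu_i) %*% theta %*% mu_i)           # mu_i' theta mu_i
    lin <- drop(data %*% theta %*% mu_i)               # data_j' theta mu_i for all j
    score[i, ] <- log(pi_i) - 0.5 * quad + lin
  }
  score
}

# Toy data: two well-separated classes and an identity precision matrix.
set.seed(1)
toy_x <- rbind(matrix(rnorm(20, mean = -2), ncol = 2),
               matrix(rnorm(20, mean = 2), ncol = 2))
toy_y <- rep(c(1, 2), each = 10)
toy_score <- lda_score(toy_x, toy_y, theta = diag(2))

# Assumed decision rule: pick the class with the largest score.
toy_class <- as.numeric(rownames(toy_score))[apply(toy_score, 2, which.max)]
```

Vectorising the linear term over all subjects with a single matrix product keeps the loop over classes only, which is cheap because the number of classes is typically small.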

Value

`score`	The value of the linear discriminant function (see Details), evaluated with the supplied estimator of the precision matrix.
`class`	The predicted class for each subject.
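One way to use the predicted classes is to compare them with the observed response. The sketch below uses small made-up vectors as stand-ins for the `class` component and for `resp`, so the numbers are purely illustrative and say nothing about the performance of `LDA.boost`.

```r
# Hypothetical stand-ins for the predicted classes and the observed response.
predicted <- c(1, 1, 2, 2, 1, 2)
observed  <- c(1, 1, 2, 1, 1, 2)

# Cross-tabulate predictions against observations.
confusion <- table(predicted = predicted, observed = observed)

# Proportion of subjects whose predicted class matches the observed one.
accuracy <- mean(predicted == observed)
```

Note that evaluating on the same subjects used to compute the class means gives an optimistic picture; held-out data or cross-validation is the safer choice.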

Author(s)

Hui-Shan Tsao and Li-Pang Chen
Maintainer: Hui-Shan Tsao n410412@gmail.com

References

Hui-Shan Tsao (2024). Estimation of Ultrahigh-Dimensional Graphical Models and Its Application to Discriminant Analysis. Master Thesis supervised by Li-Pang Chen, National Chengchi University.

Examples

data(MedulloblastomaData)

X <- t(MedulloblastomaData[2:655,]) #covariates
Y <- MedulloblastomaData[1,] #response

X <- matrix(as.numeric(X),nrow=23)

p <- ncol(X)
n <- nrow(X)

#standardization: center each column and scale it to unit standard deviation
X_new <- matrix(as.numeric(scale(X)), nrow = n)


#estimate graphical model
result <- boost.graph(data = X_new, thre = 0.2, ite1 = 3, ite2 = 0, ite3 = 0, rep = 1)
theta.hat <- result$w

theta.hat[which(theta.hat<0.8)]=0 #set entries below 0.8 to zero to keep only the highly dependent pairs

#predict
pre <- LDA.boost(data = X_new, resp = Y, theta = theta.hat)
estimated_Y <- pre$class

[Package GUEST version 0.2.0 Index]