notablyDifferent {yaImpute} | R Documentation |
Finds observations with large differences between observed and imputed values
Description
This routine identifies observations with large errors as measured by scaled
root mean square error (see rmsd.yai). A threshold is used to detect
observations with large differences.
Usage
notablyDifferent(object,vars=NULL,threshold=NULL,p=.05,...)
Arguments
object |
an object of class |
vars |
a vector of character strings naming the variables to use; if
NULL, the X-variables from |
threshold |
a threshold; observations that exceed it are listed as notably different. |
p |
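a probability used when threshold is NULL: the threshold is then computed with quantile using probs=1-p (see Details). |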
... |
additional arguments passed to |
Details
The scaled differences are computed as follows:
A matrix of differences between observed and imputed values is computed for each observation (rows) and each variable (columns).
These differences are scaled by dividing by the standard deviation of the observed values among the reference observations.
The scaled differences are squared.
Row means are computed resulting in one value for each observation.
The square root of each of these values is taken.
These values are Euclidean distances between the target observations and their nearest references as measured using specified variables. All the variables that are used must have observed and imputed values. Generally, this will be the X-variables and not the Y-variables.
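The five steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's own code: the matrices observed and imputed and the names ref_sd, scaled, and rmsdS are hypothetical, and every row is treated as a reference observation when the standard deviations are computed.

```r
# Hypothetical observed and imputed values:
# 4 observations (rows) and 2 variables (columns).
observed <- matrix(c(1, 2, 3, 4,
                     10, 20, 30, 40),
                   ncol = 2,
                   dimnames = list(paste0("obs", 1:4), c("v1", "v2")))
imputed <- observed
imputed["obs2", "v1"] <- 4    # a deliberately poor imputation
imputed["obs3", "v2"] <- 20   # a milder one

# Standard deviation of each variable. Assumption: all rows stand in
# for the reference observations.
ref_sd <- apply(observed, 2, sd)

diffs  <- observed - imputed             # differences, one row per observation
scaled <- sweep(diffs, 2, ref_sd, "/")   # divide each column by its sd
rmsdS  <- sqrt(rowMeans(scaled^2))       # square, take row means, then square root

print(round(rmsdS, 4))
```

Observations whose imputed values match the observed ones get a value of zero; obs2 gets the largest value because its error is large relative to the spread of v1.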
When threshold is NULL, the function computes one using the quantile function
with its default arguments and probs=1-p.
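A default threshold of this kind can be illustrated with quantile directly. The sketch below assumes a hypothetical named vector of scaled differences called rmsdS; the values are made up, and the comparison with > is an assumption about what "exceeds" means here.

```r
# Hypothetical scaled RMSD values, named by observation id.
rmsdS <- c(a = 0.1, b = 0.4, c = 0.2, d = 2.5, e = 0.3)
p <- 0.05

# quantile() with its default arguments and probs = 1 - p.
threshold <- quantile(rmsdS, probs = 1 - p)

# Assumption: an observation is flagged when its value is strictly
# greater than the threshold. The result is sorted and keeps its names.
flagged <- sort(rmsdS[rmsdS > threshold])

print(threshold)
print(flagged)
```

With these five values, only observation d is flagged. A smaller p raises the threshold and flags fewer observations.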
Value
A named list of several items. In all cases, vectors are named using the
observation ids, which are the row names of the data used to build the yai
object.
call |
The call. |
vars |
The variables used (may be fewer than requested). |
threshold |
The threshold value. |
notablyDifferent.refs |
A sorted named vector of references that exceed the threshold. |
notablyDifferent.trgs |
A sorted named vector of targets that exceed the threshold. |
rmsdS.refs |
A sorted named vector of scaled RMSD values for the references. |
rmsdS.trgs |
A sorted named vector of scaled RMSD values for the targets. |
Author(s)
Nicholas L. Crookston ncrookston.fs@gmail.com
See Also
notablyDistant, plot.notablyDifferent, yai, grmsd
Examples
library(yaImpute)  # provides yai() and notablyDifferent()
data(iris)
set.seed(12345)
# form some test data: 50 randomly chosen reference observations
refs <- sample(rownames(iris), 50)
x <- iris[, 1:3]     # Sepal.Length Sepal.Width Petal.Length
y <- iris[refs, 4:5] # Petal.Width Species (y is not used below)
# build an msn run; first build dummy variables for species
sp1 <- as.integer(iris$Species == "setosa")
sp2 <- as.integer(iris$Species == "versicolor")
y2 <- data.frame(cbind(iris[, 4], sp1, sp2), row.names = rownames(iris))
y2 <- y2[refs, ]
names(y2) <- c("Petal.Width", "Sp1", "Sp2")
msn <- yai(x = x, y = y2, method = "msn")
notablyDifferent(msn)