R: Compute error components of k-NN imputations

errorStats {yaImpute}

R Documentation

Compute error components of k-NN imputations

Description

Error properties of estimates derived from imputation differ from those of regression-based estimates because the two methods include a different mix of error components. This function computes a partitioning of error statistics as proposed by Stage and Crookston (2007).

Usage

errorStats(mahal,...,scale=FALSE,pzero=0.1,plg=0.5,seeMethod="lm")

Arguments

`mahal`	An object of class `yai` computed with `method="mahalanobis"`.
`...`	Other objects of class `yai` for which statistics are desired. All objects should be built from the same data and variables as the first argument.
`scale`	When `TRUE`, the errors are scaled by their respective standard deviations.
`pzero`	The lower tail p-value used to pick reference observations that are zero distance from each other (used to compute `rmmsd0`).
`plg`	The upper tail p-value used to pick reference observations that are substantially distant from each other (used to compute `rmsdlg`).
`seeMethod`	Method used to compute `SEE`: `seeMethod="lm"` uses `lm` and `seeMethod="gam"` uses `gam`. In both cases, the model formula is a simple linear combination of the X-variables.
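As a hedged illustration of the arguments above, the following sketch rebuilds the two `yai` objects from the Examples section and calls `errorStats` with scaling turned on. The object name `es` is introduced here for illustration only, and the result is inspected rather than compared against published values.

```r
library(yaImpute)

data(TallyLake)

# First argument: a yai object fit with method="mahalanobis"
mal <- yai(x = TallyLake[, 9:29], y = TallyLake[, 1:8],
           noTrgs = TRUE, method = "mahalanobis")

# A second yai object, built from the same data and variables
msn <- yai(x = TallyLake[, 9:29], y = TallyLake[, 1:8],
           noTrgs = TRUE, method = "msn")

# Scale the errors by their standard deviations; keep the default
# tail p-values and the lm-based SEE
es <- errorStats(mal, msn, scale = TRUE, pzero = 0.1, plg = 0.5,
                 seeMethod = "lm")

names(es)   # names of the data frames in the returned list
es$common   # statistics used to compute the other statistics
```

Setting `scale = TRUE` may make the Y-variables easier to compare with one another, since each error is then expressed relative to its own standard deviation.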

Details

See Stage and Crookston (2007): https://academic.oup.com/forestscience/article/53/1/62/4604364

Value

A list that contains several data frames. The column names of each are a combination of the name of the object used to compute the statistics and the name of the statistic. The row names correspond to the Y-variables from the first argument. The data frame names are as follows:

`common`	statistics used to compute other statistics.
`name of first argument`	error statistics for the first `yai` object.
`names of ... arguments`	error statistics for each of the remaining `yai` objects, if any.

The statistics reported in these data frames are:

`see`	standard error of estimate for individual regressions fit for corresponding Y-variables.
`rmmsd0`	root mean square difference for imputations based on `method="mahalanobis"` (always based on the first argument to the function).
`mlf`	square root of the model lack of fit: `sqrt(see^2 - (rmmsd0^2/2))`.
`rmsd`	root mean square error.
`rmsdlg`	root mean square error of the observations with larger distances.
`sei`	standard error of imputation: `sqrt(rmsd^2 - (rmmsd0^2/2))`.
`dstc`	distance component: `sqrt(rmsd^2 - rmmsd0^2)`.

Note that unlike Stage and Crookston (2007), all statistics reported here are in the natural units, not squared units.
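The derived statistics listed above can be reproduced by hand from `see`, `rmmsd0`, and `rmsd`. The sketch below uses made-up values for a single Y-variable (they do not come from `TallyLake` or from Stage and Crookston) and simply applies the formulas as written in this page; it is not taken from the package source.

```r
# Hypothetical values for one Y-variable, in natural (unsquared) units
see    <- 10   # standard error of estimate from the regression
rmmsd0 <- 8    # RMSD among near-zero-distance Mahalanobis neighbors
rmsd   <- 12   # RMSD of the imputation being assessed

mlf  <- sqrt(see^2  - rmmsd0^2 / 2)   # square root of model lack of fit
sei  <- sqrt(rmsd^2 - rmmsd0^2 / 2)   # standard error of imputation
dstc <- sqrt(rmsd^2 - rmmsd0^2)       # distance component

round(c(mlf = mlf, sei = sei, dstc = dstc), 3)
```

Because each statistic is a square root of a difference, base R's `sqrt` returns `NaN` (with a warning) whenever the subtracted term is the larger one; how `errorStats` reports such cases is not stated in this page.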

Author(s)

Nicholas L. Crookston ncrookston.fs@gmail.com
Albert R. Stage

References

Stage, A.R.; Crookston, N.L. (2007). Partitioning error components for accuracy-assessment of near neighbor methods of imputation. For. Sci. 53(1):62-72. https://academic.oup.com/forestscience/article/53/1/62/4604364

Examples


library(yaImpute)

data(TallyLake)

diag(cov(TallyLake[,1:8])) # see col A in Table 3 in Stage and Crookston

mal=yai(x=TallyLake[,9:29],y=TallyLake[,1:8],
        noTrgs=TRUE,method="mahalanobis")


msn=yai(x=TallyLake[,9:29],y=TallyLake[,1:8],
        noTrgs=TRUE,method="msn")


# variable "see" for "mal" matches col B (when squared and scaled);
# the other columns don't match exactly because Stage and Crookston
# used different software to compute their values

errorStats(mal,msn)

[Package yaImpute version 1.0-34 Index]