| partProb {probout} | R Documentation |
Partition outlier probabilities
Description
Assigns outlier probabilities to the partitions by fitting an exponential distribution to a nonparametric outlier statistic for simulated data or partition centroids.
Usage
partProb( simData, method = c("intrinsic","distance","logdensity","distdens",
"density"), shrink = 1, nproj = 1000, seed = NULL)
Arguments
simData |
Observations from a call to |
method |
One of the following options:
The default is to use the |
shrink |
Shrinkage parameter for outlier detection data. The offsets from |
nproj |
If the data is multivariate or |
seed |
An optional integer argument to |
Details
"logdensity" is generally prefered over "density", because
negative values that are large in magniude
of the logarithm of the density will not be
numerically distinguishable as density values.
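The numerical point can be seen in a minimal base R sketch. The two log-density values below are made up for illustration and are not produced by partProb.

```r
# Two hypothetical log-density values: both very negative, clearly different.
logdens <- c(-800, -900)

# On the density scale both underflow to exactly 0 in double precision,
# so the two observations can no longer be told apart.
dens <- exp(logdens)

logdens[1] > logdens[2]   # the log scale still orders the two values
dens[1] == dens[2]        # the density scale does not
```

Working on the log scale keeps the ordering of very low-density observations, which are the ones that matter most for outlier detection.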
Value
A vector of outlier probabilities, one for each partition, obtained by fitting an exponential distribution to the outlier statistic.
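As a rough illustration of what fitting an exponential distribution to a statistic can look like, the sketch below uses base R only. It assumes a maximum-likelihood rate estimate and reads the fitted CDF as the outlier probability; the variable names and the simulated statistic are hypothetical, and this is not necessarily how partProb computes its result.

```r
set.seed(1)

# Hypothetical nonnegative outlier statistic, one value per partition.
# The last value is deliberately extreme.
stat <- c(rexp(50, rate = 2), 6)

# Assumption: estimate the exponential rate by maximum likelihood,
# which is the reciprocal of the sample mean.
rate <- 1 / mean(stat)

# Assumption: take the fitted exponential CDF at each value as the
# probability that the corresponding partition is an outlier.
prob <- pexp(stat, rate = rate)

# Index of the partition with the largest outlier probability.
which.max(prob)
```

Because the exponential CDF is monotone, the partition with the largest statistic always receives the largest probability under this sketch.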
References
C. Fraley, Estimating Outlier Probabilities for Large Datasets, 2017.
See Also
simData,
OutlierStatistic,
allProb
Examples
set.seed(0)
lead <- leader(faithful)
nlead <- length(lead[[1]]$partitions)
# repeat multiple times to account for randomness
ntimes <- 100
probs <- matrix( NA, nlead, ntimes)
for (i in 1:ntimes) {
probs[,i] <- partProb( simData(lead[[1]]), method = "distance")
}
# median probability for each partition
partprobs <- apply( probs, 1, median)
quantile(probs)
# plot leaders with outlier probability > .95
plot( faithful[,1], faithful[,2], pch = 16, cex = .5,
main = "red : leaders with outlier probability > .95")
out <- partprobs > .95
l <- lead[[1]]$leaders
points( faithful[l[out],1], faithful[l[out],2], pch = 8, cex = 1, col = "red")