partProb {probout} | R Documentation
Partition outlier probabilities
Description
Assigns outlier probabilities to the partitions by fitting an exponential distribution to a nonparametric outlier statistic for simulated data or partition centroids.
Usage
partProb( simData, method = c("intrinsic","distance","logdensity","distdens",
"density"), shrink = 1, nproj = 1000, seed = NULL)
Arguments
simData
Observations from a call to
method
One of the following options:
The default is to use the
shrink
Shrinkage parameter for outlier detection data. The offsets from
nproj
If the data is multivariate or
seed
An optional integer argument to
Details
"logdensity" is generally preferred over "density", because log-density values that are negative and large in magnitude cannot be distinguished numerically once they are expressed as density values (they underflow to zero).
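The numerical point above can be seen with a small base R sketch. It uses a standard normal density purely for illustration; it is an assumption made for this example and not the density estimate that partProb computes internally.

```r
# Two points far in the tail of a standard normal (illustrative assumption only).
x <- c(40, 50)

dens    <- dnorm(x)              # density scale
logdens <- dnorm(x, log = TRUE)  # log-density scale

# On the density scale both values underflow to zero and cannot be told apart.
print(dens)
print(dens[1] == dens[2])

# On the log scale the two points remain finite and clearly ordered.
print(logdens)
print(logdens[1] > logdens[2])
```

Any statistic built from the density values would treat these two points as equally extreme, while the log-density version still ranks them.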
Value
A vector of probabilities for each partition, obtained by fitting an exponential distribution to the outlier statistic.
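As a rough illustration of what "fitting an exponential distribution to the outlier statistic" can look like, the base R sketch below estimates the exponential rate by maximum likelihood and evaluates the fitted cumulative distribution function at each statistic value. The variable names (stat, rate_hat, prob), the simulated statistic, and the choice of the CDF as the outlier probability are all assumptions made for this example; the exact procedure used by partProb may differ.

```r
set.seed(1)

# Hypothetical nonnegative outlier statistic, one value per partition:
# 50 typical values plus one deliberately extreme value (assumed data).
stat <- c(rexp(50, rate = 2), 6)

# Maximum likelihood estimate of the exponential rate is 1 / sample mean.
rate_hat <- 1 / mean(stat)

# Assumed definition for this sketch: outlier probability = fitted CDF value,
# so larger statistics map to probabilities closer to 1.
prob <- pexp(stat, rate = rate_hat)

# Partitions whose probability exceeds a 0.95 threshold.
flagged <- which(prob > 0.95)
print(flagged)
```

Thresholding the resulting vector, as in the Examples section, then selects the partitions regarded as outlying.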
References
C. Fraley, Estimating Outlier Probabilities for Large Datasets, 2017.
See Also
simData, OutlierStatistic, allProb
Examples
set.seed(0)
lead <- leader(faithful)
nlead <- length(lead[[1]]$partitions)
# repeat multiple times to account for randomness
ntimes <- 100
probs <- matrix( NA, nlead, ntimes)
for (i in 1:ntimes) {
probs[,i] <- partProb( simData(lead[[1]]), method = "distance")
}
# median probability for each partition
partprobs <- apply( probs, 1, median)
quantile(probs)
# plot leaders with outlier probability > .95
plot( faithful[,1], faithful[,2], pch = 16, cex = .5,
main = "red : leaders with outlier probability > .95")
out <- partprobs > .95
l <- lead[[1]]$leaders
points( faithful[l[out],1], faithful[l[out],2], pch = 8, cex = 1, col = "red")