expl_boundary {BLA} | R Documentation |
Testing evidence of boundary existence in dataset
Description
This function assesses the evidence for bounding effects in a scatter plot of
x and y based on the clustering of points at the upper edge of the scatter plot
(Miti et al. 2024). It tests the hypothesis of larger clustering at the upper
bounds of a scatter plot against a null bivariate normal distribution with no
bounding effect (random scatter at upper edges). It returns the probability
(p-value) of the observed clustering given that it is a realization of an
unbounded bivariate normal distribution.
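The logic of such a test can be illustrated with a minimal Monte Carlo sketch. This is not the code used by expl_boundary(): the helper mc_pvalue() and the toy statistic passed to it are hypothetical, and the sketch assumes that smaller values of the statistic indicate tighter clustering.

```r
# Hypothetical helper (not part of BLA): Monte Carlo p-value of a clustering
# statistic under a bivariate normal null fitted to the observed data.
mc_pvalue <- function(x, y, stat, simulations = 1000) {
  n <- length(x)
  mu <- c(mean(x), mean(y))
  R <- chol(cov(cbind(x, y)))   # upper triangular, t(R) %*% R equals the covariance
  observed <- stat(x, y)
  null_stats <- replicate(simulations, {
    # Rows of z are draws from a zero-mean bivariate normal with covariance t(R) %*% R
    z <- matrix(rnorm(2 * n), ncol = 2) %*% R
    stat(z[, 1] + mu[1], z[, 2] + mu[2])
  })
  # Assumption: a smaller statistic means tighter clustering, so the p-value is
  # the share of simulated data sets at least as clustered as the observed one.
  mean(null_stats <= observed)
}

# Toy stand-in statistic: spread of the ten largest y values
top_sd <- function(x, y) sd(sort(y, decreasing = TRUE)[1:10])

set.seed(42)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)
p <- mc_pvalue(x, y, stat = top_sd, simulations = 200)
```

With few simulations the p-value is coarse: 200 simulations resolve it only to steps of 1/200.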
Usage
expl_boundary(x, y, shells = 10, simulations = 1000, plot = TRUE, ...)
Arguments
x |
A numeric vector of values for the independent variable. |
y |
A numeric vector of values for the response variable. |
shells |
A numeric value indicating the number of boundary peels (default is 10). |
simulations |
The number of simulations for the null bivariate normally distributed data sets used to test the hypothesis (default is 1000). |
plot |
If |
... |
Additional graphical parameters as with the |
Details
It is recommended that any outlying observations, as identified by the
bagplot() function of the aplpack package, are removed from the data. This is
also implemented in the simulation step in the expl_boundary() function.
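One way to carry out this outlier removal before calling expl_boundary() is sketched below. The helper drop_bagplot_outliers() is hypothetical and not part of BLA. It assumes that the aplpack package is installed and that the object returned by bagplot() holds the flagged points in its pxy.outlier element.

```r
# Hypothetical helper (not part of BLA): drop points that bagplot() flags as outliers.
drop_bagplot_outliers <- function(x, y) {
  if (!requireNamespace("aplpack", quietly = TRUE)) {
    stop("the 'aplpack' package is required")
  }
  bag <- aplpack::bagplot(x, y, create.plot = FALSE)  # compute only, no plot
  out <- bag$pxy.outlier                              # coordinates of flagged points
  if (length(out) == 0) {
    return(data.frame(x = x, y = y))                  # nothing flagged
  }
  out <- matrix(out, ncol = 2)
  # Match flagged coordinates back to the input vectors
  is_out <- paste(x, y) %in% paste(out[, 1], out[, 2])
  data.frame(x = x[!is_out], y = y[!is_out])
}

# Usage on simulated data with one extreme point appended
if (requireNamespace("aplpack", quietly = TRUE)) {
  set.seed(1)
  x <- c(rnorm(100), 8)
  y <- c(rnorm(100), -8)
  cleaned <- drop_bagplot_outliers(x, y)
}
```

The cleaned x and y columns can then be passed to expl_boundary() in place of the raw vectors.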
Value
A data frame with the p-values of obtaining the observed standard deviation of the Euclidean distances from the vertices in the upper peels to the center of the dataset, for the left and right sections of the dataset.
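The statistic described here can be sketched with convex hull peeling (Eddy 1982) in base R. The helper peel_sd() is hypothetical and only approximates the idea. It assumes the center is the mean of the data, that "upper" means vertices above the center in y, and that the left and right sections are split at the center in x. The definitions used inside expl_boundary() may differ.

```r
# Hypothetical helper (not part of BLA): standard deviation of the distances
# from upper-peel vertices to the data centre, for left and right sections.
peel_sd <- function(x, y, shells = 10) {
  pts <- cbind(x, y)
  centre <- colMeans(pts)                  # assumption: centre = mean of the data
  verts <- NULL
  for (i in seq_len(shells)) {
    if (nrow(pts) < 3) break               # too few points left to peel
    hull <- grDevices::chull(pts)          # indices of the current outer hull
    verts <- rbind(verts, pts[hull, , drop = FALSE])
    pts <- pts[-hull, , drop = FALSE]      # peel the hull off and repeat
  }
  # Assumption: keep only vertices above the centre in y
  upper <- verts[verts[, 2] > centre[2], , drop = FALSE]
  d <- sqrt((upper[, 1] - centre[1])^2 + (upper[, 2] - centre[2])^2)
  left <- upper[, 1] < centre[1]           # assumption: split sections at centre x
  data.frame(section = c("left", "right"),
             sd = c(sd(d[left]), sd(d[!left])))
}

set.seed(1)
x <- rnorm(200)
y <- rnorm(200)
res <- peel_sd(x, y, shells = 5)
```

Computing this statistic on the observed data and on each simulated data set gives the values from which the left and right p-values could be derived.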
Author(s)
Chawezi Miti <chawezi.miti@nottingham.ac.uk>
References
Eddy, W. F. (1982). Convex hull peeling, COMPSTAT 1982-Part I: Proceedings in Computational Statistics, 42-47. Physica-Verlag, Vienna.
Miti, C., Milne, A. E., Giller, K. E. and Lark, R. M. (2024). Exploration of data for analysis using boundary line methodology. Computers and Electronics in Agriculture, 219, 108794.
Examples
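library(BLA)  # provides expl_boundary() and the evapotranspiration data
x <- evapotranspiration$`ET(mm)`
y <- evapotranspiration$`yield(t/ha)`
# 100 simulations keep the example fast; the recommendation is to use more than 1000
expl_boundary(x, y, shells = 10, simulations = 100)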