DE.FEM {fdaPDE} | R Documentation |
Nonparametric density estimation with differential regularization
Description
This function implements a nonparametric density estimation method with differential regularization (given by the square root of the L2 norm of the laplacian of the density function), when points are located over a planar mesh. The computation relies only on the C++ implementation of the algorithm.
Usage
DE.FEM(data, FEMbasis, lambda, scaling=NULL, fvec=NULL, heatStep=0.1, heatIter=500,
stepProposals=NULL,tol1=1e-4, tol2=0, print=FALSE, nfolds=NULL,
nsimulations=500, step_method="Fixed_Step", direction_method="BFGS",
preprocess_method="NoCrossValidation", search = "tree", inference = FALSE)
Arguments
data |
A matrix of dimensions #observations-by-ndim. Data are locations: each row corresponds to one point,
the first column corresponds to the |
FEMbasis |
A |
lambda |
A scalar or vector of smoothing parameters. If it is a vector, the optimal smoothing parameter is chosen
with a |
scaling |
A positive factor needed to scale the smoothing parameter in the construction of confidence intervals. If the scaling is not specified, it is automatically set as the square root of the number of observations. |
fvec |
A vector of length # |
heatStep |
A real specifying the time step for the discretized heat diffusion process. Default is |
heatIter |
An integer specifying the number of iterations to perform the discretized heat diffusion process. Default is |
stepProposals |
A scalar or a vector containing the step parameters useful for the descent algorithm. If there is a
vector of parameters, the biggest one such that the functional decreases at each iteration is chosen. If it is |
tol1 |
A scalar specifying the tolerance to use for the termination criterion based on the percentage difference between two consecutive iterations of the minimization algorithm of the loss function, the log-likelihood and the penalization. Default is 1e-5. |
tol2 |
A scalar specifying the tolerance to use for the termination criterion based on the norm of the gradient of the functional to be minimized (the true minimum is such that this norm is zero). The default does not use this criterion. Default is 0. |
print |
A boolean that is |
nfolds |
An integer specifying the number of folds used in cross validation technique to find the best |
nsimulations |
An integer specifying the number of iterations used in the optimization algorithms. Default value is 500. |
step_method |
String. This parameter specifies which step method use in the descent algorithm.
If it is |
direction_method |
String. This parameter specifies which descent direction use in the descent algorithm.
If it is |
preprocess_method |
String. This parameter specifies the k fold cross validation technique to use, if there is more
than one smoothing parameter |
search |
a flag to decide the search algorithm type (tree or naive or walking search algorithm). Default is |
inference |
A boolean that is |
Value
A list with the following variables:
FEMbasis |
Given FEMbasis with tree information. |
g |
A vector of length # |
f_init |
A # |
lambda |
A scalar representing the optimal smoothing parameter selected via k fold cross validation, if in the input there is a vector of parameters; the scalar given in input otherwise. |
data |
A matrix of dimensions #observations-by-ndim containing the data used in the algorithm. They are the same given in input if the domain is 2D pr 3D; they are the original data projected on the mesh if the domain is 2.5D. |
CV_err |
A vector of length |
References
Ferraccioli, F., Arnone, E., Finos, L., Ramsay, J. O., Sangalli, L. M. (2021). Nonparametric density estimation over complicated domains. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 83(2), 346-368.
Arnone, E., Ferraccioli, F., Pigolotti, C., Sangalli, L.M. (2021), A roughness penalty approach to estimate densities over two-dimensional manifolds, Computational Statistics and Data Analysis, to appear.
Examples
library(fdaPDE)
## Create a 2D mesh over a squared domain
Xbound <- seq(-3, 3, length.out = 10)
Ybound <- seq(-3, 3, length.out = 10)
grid_XY <- expand.grid(Xbound, Ybound)
Bounds <- grid_XY[(grid_XY$Var1 %in% c(-3, 3)) | (grid_XY$Var2 %in% c(-3, 3)), ]
mesh <- create.mesh.2D(nodes = Bounds, order = 1)
mesh <- refine.mesh.2D(mesh, maximum_area = 0.2)
FEMbasis <- create.FEM.basis(mesh)
## Generate data
n <- 50
set.seed(10)
data_x <- rnorm(n)
data_y <- rnorm(n)
data <- cbind(data_x, data_y)
plot(mesh)
points(data, col="red", pch=19, cex=0.5)
## Density Estimation
lambda = 0.1
sol <- DE.FEM(data = data, FEMbasis = FEMbasis, lambda = lambda, fvec=NULL, heatStep=0.1,
heatIter=500, nsimulations=300,step_method = "Fixed_Step", inference = TRUE)
## Visualization
n = 100
X <- seq(-3, 3, length.out = n)
Y<- seq(-3, 3, length.out = n)
grid <- expand.grid(X, Y)
evaluation <- eval.FEM(FEM(FEMbasis, coeff = sol$g), locations = grid)
lower_bound_g <- eval.FEM(FEM(FEMbasis, coeff = sol$g_CI_L), locations = grid)
upper_bound_g <- eval.FEM(FEM(FEMbasis, coeff = sol$g_CI_U), locations = grid)
evaluation <- exp(evaluation)
lower_bound_g <- exp(lower_bound_g)
upper_bound_g <- exp(upper_bound_g)
eval <- matrix(evaluation, n, n)
eval_L <- matrix(lower_bound_g, n, n)
eval_U <- matrix(upper_bound_g, n, n)
image2D(x = X, y = Y, z = eval_L, col = heat.colors(100), xlab = "x", ylab = "y",
contour = list(drawlabels = FALSE), main = "Estimated CI lower bound")
image2D(x = X, y = Y, z = eval, col = heat.colors(100), xlab = "x", ylab = "y",
contour = list(drawlabels = FALSE), main = "Estimated density")
image2D(x = X, y = Y, z = eval_U, col = heat.colors(100), xlab = "x", ylab = "y",
contour = list(drawlabels = FALSE), main = "Estimated CI upper bound")