ei_gme {EIEntropy} | R Documentation |
Ecological Inference Applying Entropy
Description
The function ei_gme defines the Shannon entropy function, which takes a vector of probabilities p as input and returns the negative
sum of p times the natural logarithm of p. The function then sets the optimization parameters and obtains an optimal
solution using the "optim" function.
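As a minimal sketch of the entropy function described above (the name shannon_entropy is hypothetical and this is not the package's internal code), the Shannon entropy of a probability vector can be written in R as follows. Dropping zero probabilities, so that 0 * log(0) counts as 0, is a common convention assumed here.

```r
# Shannon entropy: H(p) = -sum(p * log(p)), using the natural logarithm.
# Hypothetical helper for illustration only.
shannon_entropy <- function(p) {
  p <- p[p > 0]      # assumed convention: 0 * log(0) is treated as 0
  -sum(p * log(p))
}

# A uniform distribution over 4 outcomes has the maximum entropy, log(4).
h_uniform <- shannon_entropy(rep(0.25, 4))
# A degenerate distribution has zero entropy.
h_certain <- shannon_entropy(c(1, 0, 0, 0))
```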
The function defines the independent variables in the two databases needed, which we call datahp with "n_hp" observations and datahs
with "n_hs" observations, and the function of the binary variable of interest y. The weights of each observation in the two
databases are then defined; if no weights are available, each weight is set to 1.
The errors are calculated by weighting the support vector, which has the form (var, 0, -var).
This support vector can be specified by the user.
The default support vector is based on variance. We recommend a wider interval, with v = (-1, 0, 1) as the maximum.
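The role of the support vector can be illustrated with a short sketch. It assumes, as is usual in generalized maximum entropy, that each error is the weighted sum of the support points; the names L, w_err, w_skew, error_uniform and error_skew are hypothetical and this is not the package's internal code.

```r
# Hypothetical illustration of an error built from a support vector.
v <- c(-1, 0, 1)                  # support vector with the widest recommended interval
L <- length(v)                    # number of support points (assumed meaning of L)

w_err <- rep(1 / L, L)            # uniform weights over the support points
error_uniform <- sum(v * w_err)   # symmetric support and uniform weights give a zero error

w_skew <- c(0.2, 0.3, 0.5)        # weights shifted toward the positive support point
error_skew <- sum(v * w_skew)     # -0.2 + 0 + 0.5, a positive error
```

Because the weights are non-negative and sum to 1, the resulting error always lies between the smallest and largest support points, which is why the width of v matters.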
The restrictions are defined to guarantee consistency.
The optimization of the Shannon entropy function is solved with the "optim" function, using the local solver "BFGS". The default tolerance is
1e-24 but can be specified by the user. The method used in the optimization can also be chosen from: "Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN",
"Brent". BFGS is the default and recommended method.
Usage
ei_gme(fn, datahp, datahs, w, tol, method, v = NULL)
Arguments
fn |
The formula that represents the dependent variable in the optimization. In the context of this function, 'fn' defines the dependent variable to be optimized by the entropy function. |
datahp |
The data where the variable of interest y and the independent variables are available. Note: The variables and weights used as independent variables must have the same name in 'datahp' and in 'datahs'. The variables in both databases need to match up in content. |
datahs |
The data with the information on the independent variables at a disaggregated level. Note: The variables and weights used as independent variables must be the same and must have the same name in 'datahp' and in 'datahs'. |
w |
The weights to be used in this function. |
tol |
The tolerance to be applied in the optimization function. If the tolerance is not specified, the default tolerance is 1e-24. |
method |
The method used in the optim function. It can be selected by the user from: "Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN", "Brent". BFGS is the default and recommended method. |
v |
The support vector |
Details
To solve the optimization, upper and lower bounds are set for p and w: both must be above 0 and below 1. In addition, the initial values of p are set to a uniform distribution and the initial values of the errors (w) to 1/L.
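The optimization step can be illustrated with a toy problem. The sketch below is not the ei_gme objective: it maximizes the Shannon entropy of a three-outcome probability vector with optim and method "BFGS", and it swaps the explicit bounds for a softmax reparameterization so that the probabilities stay between 0 and 1 and sum to 1. The names neg_entropy, fit and p_hat are hypothetical, and passing the tolerance through reltol is an assumption about how a tolerance could be supplied to optim.

```r
# Toy problem for illustration only (not the package's objective function).
neg_entropy <- function(theta) {
  p <- exp(theta) / sum(exp(theta))  # softmax: p is positive and sums to 1
  sum(p * log(p))                    # negative entropy, because optim() minimizes
}

# Start away from the uniform solution so the solver has work to do.
fit <- optim(par = c(1, 0, -1), fn = neg_entropy, method = "BFGS",
             control = list(reltol = 1e-12))

p_hat <- exp(fit$par) / sum(exp(fit$par))  # recovered probabilities, close to uniform
fit$value                                  # close to -log(3)
fit$counts                                 # evaluations of the function and the gradient
fit$convergence                            # 0 indicates successful completion
```

Without any data restrictions the maximum entropy solution is the uniform distribution, which is why a uniform vector is a natural starting value for p.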
Value
The function returns a dataframe called table with the following information:
- probabilities: Probabilities for each individual for each possibility j of the variable of interest y.
- error primal: Errors calculated for the j possibilities of y.
- predictions: The prediction for each individual, calculated as the sum of the probability and the error primal.
The function provides information about the optimization process:
- value_of_entropy: The value of entropy resulting from the optimization.
- iterations: The number of times the objective function and the gradient have been evaluated during the optimization process.
- message: The message generated during the optimization, if any.
- tol: The tolerance used in the optimization.
- method: The method used in the optimization.
- v: The support vector used in the function.
The function provides a dataframe containing the information about lambda:
- lambda: The estimated lambda values.
An object with the checked restrictions, which should be approximately zero, is also provided:
- check restrictions: g1 is the restriction related to the unit probability constraint, g2 is the error unit sum constraint, and g3 is the consistency restriction, which implies that the difference between the cross moments in both datasets must be zero.
The restriction g3 can be checked in more detail with the following separate objects:
- cross moments hp: Cross moments in datahp.
- cross moments hs: Cross moments in datahs.
References
Fernandez-Vazquez, E., Díaz-Dapena, A., Rubiera-Morollon, F., Viñuela, A., (2020) Spatial Disaggregation of Social Indicators: An Info-Metrics Approach. Social Indicators Research, 152(2), 809–821. https://doi.org/10.1007/s11205-020-02455-z.
Examples
# In this example we use the data included in this package.
library(EIEntropy)
datahp <- financial()
datahs <- social()
# Set up the formula for the dependent variable.
fn <- datahp$poor_liq ~ Dcollege + Totalincome + Dunemp
# Apply ei_gme to our databases. Here datahp is the data containing the
# variable of interest and datahs is the data containing the information
# for the disaggregation.
# w can be included if we have weights in both surveys.
# The tolerance in this example is fixed at 1e-20 and v is (-1, 0, 1).
v <- matrix(c(-1, 0, 1), nrow = 1)
result <- ei_gme(fn = fn, datahp = datahp, datahs = datahs, w = w,
                 tol = 1e-20, method = "BFGS", v = v)