CompoundAuxFit {GB2} | R Documentation
Fitting the Compound Distribution based on the GB2 by the Method of Pseudo Maximum Likelihood Estimation using Auxiliary Information
Description
Calculates the log-likelihood and its score functions, and fits the compound distribution based on the GB2 using auxiliary information.
Usage
pkl.cavgb2(z, lambda)
lambda0.cavgb2(pl0, z, w=rep(1, dim(z)[1]))
logl.cavgb2(fac, z, lambda, w=rep(1, dim(fac)[1]))
scores.cavgb2(fac, z, lambda, w=rep(1, dim(fac)[1]))
ml.cavgb2(fac, z, lambda0, w = rep(1, dim(fac)[1]), maxiter = 100, fnscale=length(w))
Arguments
z: numeric; a matrix of auxiliary variables.
lambda: numeric; a matrix of parameters.
pl0: numeric; a vector of initial proportions defining the number of components and the weight of each component density in the decomposition. Sums to one.
w: numeric; a vector of weights of length the number of rows of the matrix fac (or z). By default w = rep(1, dim(fac)[1]).
fac: numeric; a matrix of Gamma factors.
lambda0: numeric; a matrix of initial parameters.
maxiter: numeric; maximum number of iterations to perform. By default maxiter = 100.
fnscale: numeric; scaling parameter passed in the control list of optim. By default fnscale = length(w).
Details
We model the probabilities p_\ell with auxiliary variables. Let z_k denote the vector of auxiliary information for unit k. This auxiliary information modifies the probabilities p_\ell at the unit level. Denote by p_{k,\ell} the weight of the density f_\ell for unit k. For \ell = 1, ..., L-1, we pose a linear model for the log-ratio v_{k,\ell}:

\log(p_{k,\ell}/p_{k,L}) = v_{k,\ell} = \sum_{i=1}^I \lambda_{\ell i} z_{k i} = \mathbf{z}_k^{T} \boldsymbol{\lambda}_{\ell}.
Function pkl.cavgb2 calculates the p_{k,\ell}.
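Inverting the log-ratio model is the usual multinomial-logit back-transform. A minimal sketch of that computation, assuming z is an n x I matrix and lambda an I x (L-1) matrix (pkl_sketch is a hypothetical name, not the package function):

    pkl_sketch <- function(z, lambda) {
      v <- z %*% lambda          # n x (L-1) matrix of log-ratios v_{k,l}
      ev <- cbind(exp(v), 1)     # reference category L contributes exp(0) = 1
      ev / rowSums(ev)           # n x L matrix of probabilities p_{k,l}
    }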
Function lambda0.cavgb2 calculates the initial values \lambda^{(0)}_{\ell i}, i = 1, ..., I, \ell = 1, ..., L-1. Let

\bar{z}_{i} = \sum_k w_k z_{ki} / \sum_k w_k

be the mean value of the i-th explanatory variable. Writing

\log(\hat{p}^{(0)}_\ell / \hat{p}^{(0)}_L) = v^{(0)}_\ell = \sum_{i=1}^I \lambda^{(0)}_{\ell i} \bar{z}_{i},

we can choose \lambda^{(0)}_{\ell i} = v^{(0)}_\ell / (I \bar{z}_{i}).
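This initialisation rule can be written down directly. A minimal sketch, assuming pl0 has length L and z has I columns (lambda0_sketch is a hypothetical name, not the package function lambda0.cavgb2):

    lambda0_sketch <- function(pl0, z, w = rep(1, nrow(z))) {
      I <- ncol(z)
      L <- length(pl0)
      zbar <- colSums(w * z) / sum(w)   # weighted means of the auxiliary variables
      v0 <- log(pl0[-L] / pl0[L])       # initial log-ratios v0_l, l = 1, ..., L-1
      outer(1 / (I * zbar), v0)         # I x (L-1) matrix: lambda0_{l,i} = v0_l / (I * zbar_i)
    }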
Analogously to the ordinary fit of the compound distribution based on the GB2 (CompoundFit), we express the log-likelihood as a weighted mean of

\log f(x_k) = \log\left(\sum_{\ell=1}^L p_{k,\ell}\, f_\ell(x_k)\right),

evaluated at the data points, where f is the compound GB2 density. The scores are obtained as the weighted sums of the first derivatives of the log-likelihood with respect to the parameters \lambda_\ell, \ell = 1, ..., L-1, evaluated at the data points.
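In matrix form the pseudo log-likelihood is a one-liner. A minimal sketch, assuming fac[k, l] holds the factor f_\ell(x_k) and pkl the unit-level probabilities from above (logl_sketch is a hypothetical name, not the package function logl.cavgb2):

    logl_sketch <- function(fac, pkl, w = rep(1, nrow(fac))) {
      # weighted mean of log f(x_k) = log(sum_l p_{k,l} f_l(x_k))
      sum(w * log(rowSums(pkl * fac))) / sum(w)
    }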
Function ml.cavgb2 performs maximum likelihood estimation through the general-purpose optimization function optim from package stats, using the "BFGS" method. Once we have the fitted parameters \hat{\lambda}_\ell, we can deduce the fitted values \hat{v}_{k\ell} and \hat{p}_{k\ell} as functions of \bar{z} and \hat{\lambda}_{\ell}.
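A hypothetical sketch of the optimization step, reusing pkl_sketch and logl_sketch from above; ml.cavgb2 itself may differ in details, e.g. by exploiting the analytical scores:

    ml_sketch <- function(fac, z, lambda0, w = rep(1, nrow(fac)),
                          maxiter = 100, fnscale = length(w)) {
      negll <- function(lam) {
        lambda <- matrix(lam, nrow = ncol(z))   # vector back to an I x (L-1) matrix
        -logl_sketch(fac, pkl_sketch(z, lambda), w)
      }
      optim(as.vector(lambda0), negll, method = "BFGS",
            control = list(maxit = maxiter, fnscale = fnscale))
    }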
Value
pkl.cavgb2 returns a matrix of probabilities. lambda0.cavgb2 returns a matrix of size I \times (L-1). logl.cavgb2 returns the value of the pseudo log-likelihood. scores.cavgb2 returns the weighted sum of the scores of the log-likelihood. ml.cavgb2 returns a list containing two objects: the vector of fitted coefficients \hat{\lambda}_\ell and the output of the "BFGS" fit.
Author(s)
Monique Graf and Desislava Nedyalkova
See Also
optim, CompoundFit
Examples
## Not run:
library(simFrame)
data(eusilcP)
# Stratified cluster sampling
set.seed(12345)
srss <- SampleControl(design = "region", grouping = "hid", size = c(200*3, 1095*3, 1390*3,
425*3, 820*3, 990*3, 400*3, 450*3, 230*3), k = 1)
# Draw a sample
s1 <- draw(eusilcP,srss)
#names(s1)
# Creation of auxiliary variables
ind <- order(s1[["hid"]])
ss1 <- data.frame(hid=s1[["hid"]], region=s1[["region"]], hsize=s1[["hsize"]],
peqInc=s1[["eqIncome"]], age=s1[["age"]], pw=s1[[".weight"]])[ind,]
ss1[["child"]] <- as.numeric((ss1[["age"]]<=14))
ss1[["adult"]] <- as.numeric((ss1[["age"]]>=20))
sa <- aggregate(ss1[,c("child","adult")],list(ss1[["hid"]]),sum)
names(sa)[1] <- "hid"
sa[["children"]] <- as.numeric((sa[["child"]]>0))
sa[["single_a"]] <- as.numeric((sa[["adult"]]==1))
sa[["sa.ch"]] <- sa[["single_a"]]*sa[["children"]]
sa[["ma.ch"]] <- (1-sa[["single_a"]])*sa[["children"]]
sa[["nochild"]] <- 1-sa[["children"]]
# New data set
ns <- merge(ss1[,c("hid","region","hsize","peqInc","pw")],
sa[,c("hid","nochild","sa.ch","ma.ch")], by="hid")
# Ordering the data set
ns <- ns[!is.na(ns$peqInc),]
index <- order(ns$peqInc)
ns <- ns[index,]
# Truncate at 0
ns <- ns[ns$peqInc>0,]
# income
peqInc <- ns$peqInc
# weights
pw <- ns$pw
# Adding the weight adjustment
c1 <- 0.1
pwa <- robwts(peqInc,pw,c1,0.001)[[1]]
corr <- mean(pw)/mean(pwa)
pwa <- pwa*corr
ns <- data.frame(ns, aw=pwa)
# Empirical indicators with original weights
emp.ind <- c(main.emp(peqInc, pw),
main.emp(peqInc[ns[["nochild"]]==1], pw[ns[["nochild"]]==1]),
main.emp(peqInc[ns[["sa.ch"]]==1], pw[ns[["sa.ch"]]==1]),
main.emp(peqInc[ns[["ma.ch"]]==1], pw[ns[["ma.ch"]]==1]))
# Matrix of auxiliary variables
z <- ns[,c("nochild","sa.ch","ma.ch")]
#unique(z)
z <- as.matrix(z)
# global GB2 fit, ML profile log-likelihood
gl.fit <- profml.gb2(peqInc,pwa)$opt1
agl.fit <- gl.fit$par[1]
bgl.fit <- gl.fit$par[2]
prof.fit <- prof.gb2(peqInc,agl.fit,bgl.fit,pwa)
pgl.fit <- prof.fit[3]
qgl.fit <- prof.fit[4]
# Likelihood and convergence
proflikgl <- -gl.fit$value
convgl <- gl.fit$convergence
# Fitted GB2 parameters and indicators
profgb2.par <- c(agl.fit, bgl.fit, pgl.fit, qgl.fit)
profgb2.ind <- main.gb2(0.6, agl.fit, bgl.fit, pgl.fit, qgl.fit)
# Initial lambda and pl
pl0 <- c(0.2,0.6,0.2)
lambda0 <- lambda0.cavgb2(pl0, z, pwa)
# left decomposition
decomp <- "l"
facgl <- fg.cgb2(peqInc, agl.fit, bgl.fit, pgl.fit, qgl.fit, pl0, decomp)
fitcml <- ml.cavgb2(facgl, z, lambda0, pwa, maxiter=500)
fitcml
convcl <- fitcml[[2]]$convergence
convcl
lambdafitl <- fitcml[[1]]
# Fitted probabilities for the three auxiliary categories (rows of diag(3))
pglfitl <- pkl.cavgb2(diag(rep(1,3)), lambdafitl)
row.names(pglfitl) <- colnames(z)
## End(Not run)