walsNBfit {WALS}R Documentation

Fitter function for Weighted Average Least Squares estimation of NB2 regression model

Description

Workhorse function behind walsNB and used internally in walsNBfitIterate.

Usage

walsNBfit(
  X1,
  X2,
  y,
  betaStart1,
  betaStart2,
  rhoStart,
  family,
  prior,
  method = c("fullSVD", "original"),
  svdTol = .Machine$double.eps,
  svdRtol = 1e-06,
  keepUn = FALSE,
  keepR = FALSE,
  eigenSVD = TRUE,
  postmult = TRUE,
  ...
)

Arguments

X1

Design matrix for focus regressors. Usually includes a constant (column full of 1s) and can be generated using model.matrix.

X2

Design matrix for auxiliary regressors. Usually does not include a constant column and can also be generated using model.matrix.

y

Count response as vector.

betaStart1

Starting values for coefficients of focus regressors X1.

betaStart2

Starting values for coefficients of auxiliary regressors X2.

rhoStart

Starting value for log-dispersion parameter of NB2

family

Object of class "familyNBWALS". Currently only supports negbinWALS.

prior

Object of class "familyPrior". For example weibull or laplace.

method

Specifies method used. Available methods are "fullSVD" (default) or "original". See details.

svdTol

Tolerance for rank of matrix \bar{Z}_{1} and \bar{Z}. Only used if method = "fullSVD". Checks if smallest eigenvalue in SVD of \bar{Z}_1 and \bar{Z} is larger than svdTol, otherwise reports a rank deficiency.

svdRtol

Relative tolerance for rank of matrix \bar{Z}_{1} and \bar{Z}. Only used if method = "fullSVD". Checks if ratio of largest to smallest eigenvalue in SVD of \bar{Z}_1 and \bar{Z} is larger than svdRtol, otherwise reports a rank deficiency.

keepUn

If TRUE, keeps the one-step ML estimators of the unrestricted model, i.e. \tilde{\gamma}_{u} and \tilde{\beta}_{u}.

keepR

If TRUE, keeps the one-step ML estimators of the fully restricted model, i.e. \tilde{\gamma}_{r} and \tilde{\beta}_{r}.

eigenSVD

If TRUE, then semiorthogonalize() uses svd() to compute the eigendecomposition of \bar{\Xi} instead of eigen(). In this case, the tolerances of svdTol and svdRtol are used to determine whether \bar{\Xi} is of full rank (need it for \bar{\Xi}^{-1/2}).

postmult

If TRUE (default), then it computes

\bar{Z}_{2} = \bar{X}_{2} \bar{\Delta}_{2} \bar{T} \bar{\Lambda}^{-1/2} \bar{T}^{\top},

where \bar{T} contains the eigenvectors and \bar{\Lambda} the eigenvalues from the eigenvalue decomposition

\bar{\Xi} = \bar{T} \bar{\Lambda} \bar{T}^{\top},

instead of

\bar{Z}_{2} = \bar{X}_{2} \bar{\Delta}_{2} \bar{T} \bar{\Lambda}^{-1/2}.

See Huynh (2024b) for more details. The latter is used in the original MATLAB code for WALS in the linear regression model, see eq. (12) of Magnus and De Luca (2016). The first form is required in eq. (9) of De Luca et al. (2018). Thus, it is not recommended to set postmult = FALSE.

...

Arguments for internal function computePosterior.

Details

The method to be specified in method mainly differ in the way they compute the fully restricted and unrestricted estimators for the transformed regressors Z, i.e. \tilde{\gamma}_{1r}, and \tilde{\gamma}_{u}.

"fullSVD"

Recommended approach. First applies an SVD to \bar{Z}_{1} to compute \bar{X}_{2}^{\top} \bar{M}_{1} \bar{X}_{2}: It is used for computing the inverse of

\bar{X}_{1}^{\top}\bar{X}_{1} + \bar{g} \bar{\epsilon} X_{1}^{\top}\bar{q} \bar{q}^{\top} X_{1},

when using the Sherman-Morrison-Woodbury formula. We further leverage the SVD of \bar{Z}_1 and additionally \bar{Z} to compute the unrestricted estimator \tilde{\gamma}_{u} and the fully restricted estimator \tilde{\gamma}_{r}. For \tilde{\gamma}_{u}, we simply use the SVD of \bar{Z} to solve the full equation system derived from the one-step ML problem for more details. The SVD of \bar{Z}_1 is further used in computing the model averaged estimator for the focus regressors \hat{\gamma}_1.

Described in more detail in the appendix of Huynh (2024b).

"original"

Computes all inverses directly using solve and does not make use of the Sherman-Morrison-Woodbury formula for certain inverses. Specifically, it directly inverts the matrix \bar{Z}_{1}^{\top} \bar{Z}_{1} using solve in order to compute \bar{M}_1. Moreover, it computes the fully unrestricted estimators of the focus regressors \tilde{\gamma}_{1u} and of the auxiliary regressors \tilde{\gamma}_{2u} and the fully restricted estimator \tilde{\gamma}_{1r} by directly implementing the formulas derived in Huynh (2024a). This method should only be used as reference and for easier debugging.

All variables in the code that contain "start" in their name are computed using the starting values of the one-step ML estimators. See section "One-step ML estimator" of (Huynh 2024a) for details.

Value

A list containing

coef

Model averaged estimates of all coefficients.

beta1

Model averaged estimates of the coefficients of the focus regressors.

beta2

Model averaged estimates of the coefficients of the auxiliary regressors.

rho

Model averaged estimate of the log-dispersion parameter of the NB2 distribution.

gamma1

Model averaged estimates of the coefficients of the transformed focus regressors.

gamma2

Model averaged estimates of the coefficients of the transformed auxiliary regressors.

condition

Condition number of the matrix \bar{\Xi} = \bar{\Delta}_{2} \bar{X}_{2}^{\top} \bar{M}_{1} \bar{X}_{2} \bar{\Delta}_{2}.

vcovBeta

NULL, not implemented yet, placeholder for estimated covariance matrix of the regression coefficients.

vcovGamma

NULL, not implemented yet, placeholder for estimated covariance matrix of the coefficients of the transformed regressors.

betaStart

Starting values of the regression coefficients for the one-step ML estimators.

rhoStart

Starting values of the dispersion parameter for the one-step ML estimators.

method

Stores method used from the arguments.

prior

familyPrior. The prior specified in the arguments.

betaUn1

If keepUn = TRUE, contains the unrestricted one-step ML estimators of the coefficients of the focus regressors. Else NULL.

betaUn2

If keepUn = TRUE, contains the unrestricted one-step ML estimators of the coefficients of the auxiliary regressors. Else NULL.

gammaUn1

If keepUn = TRUE, contains the unrestricted one-step ML estimators of the coefficients of the transformed focus regressors. Else NULL.

gammaUn2

If keepUn = TRUE, contains the unrestricted one-step ML estimators of the coefficients of the transformed auxiliary regressors. Else NULL.

gamma1r

If keepR = TRUE, contains the fully restricted one-step ML estimator for the transformed regressors (only focus regressors). Else NULL.

k1

Number of focus regressors.

k2

Number of auxiliary regressors.

n

Number of observations.

X1names

Names of the focus regressors.

X2names

Names of the auxiliary regressors.

familyStart

The family object of class "familyNBWALS" used for the estimation of the starting values.

family

The family object of class "familyNBWALS" used later for predictions.

fitted.link

Linear link fitted to the data.

fitted.values

Estimated conditional mean for the data. Lives on the scale of the response.

References

De Luca G, Magnus JR, Peracchi F (2018). “Weighted-average least squares estimation of generalized linear models.” Journal of Econometrics, 204(1), 1–17. doi:10.1016/j.jeconom.2017.12.007.

Huynh K (2024a). “Weighted-Average Least Squares for Negative Binomial Regression.” arXiv 2404.11324, arXiv.org E-Print Archive. doi:10.48550/arXiv.2404.11324.

Huynh K (2024b). “WALS: Weighted-Average Least Squares Model Averaging in R.” University of Basel. Mimeo.

Magnus JR, De Luca G (2016). “Weighted-average least squares (WALS): A survey.” Journal of Economic Surveys, 30(1), 117-148. doi:10.1111/joes.12094.

See Also

walsNB, walsNBfitIterate.

Examples

data("NMES1988", package = "AER")
NMES1988 <- na.omit(NMES1988)
form <- (visits ~ health + chronic + age + insurance + adl + region + gender
         + married + income + school + employed)
X <- model.matrix(form, data = NMES1988)
focus <- c("(Intercept)", "healthpoor", "healthexcellent", "chronic", "age",
        "insuranceyes")
aux <- c("adllimited", "regionnortheast", "regionmidwest", "regionwest",
         "gendermale", "marriedyes", "income", "school", "employedyes")
X1 <- X[, focus]
X2 <- X[, aux]
y <- NMES1988$visits

# starting values from glm.nb() from MASS
startFit <- MASS::glm.nb(y ~ X[,-1])
betaStart <- coef(startFit)
rhoStart <- startFit$theta
k1 <- ncol(X1)
k2 <- ncol(X2)

str(walsNBfit(X1, X2, y, rhoStart, family = negbinWALS(scale = rhoStart, link = "log"),
              betaStart1 = betaStart[1:k1],
              betaStart2 = betaStart[(k1 + 1):(k1 + k2)],
              prior = weibull(), method = "fullSVD"))


[Package WALS version 0.2.5 Index]