R: Fitter function for Weighted Average Least Squares estimation...

walsNBfit {WALS}

R Documentation

Fitter function for Weighted Average Least Squares estimation of NB2 regression model

Description

Workhorse function behind walsNB and used internally in walsNBfitIterate.

Usage

walsNBfit(
  X1,
  X2,
  y,
  betaStart1,
  betaStart2,
  rhoStart,
  family,
  prior,
  method = c("fullSVD", "original"),
  svdTol = .Machine$double.eps,
  svdRtol = 1e-06,
  keepUn = FALSE,
  keepR = FALSE,
  eigenSVD = TRUE,
  postmult = TRUE,
  ...
)

Arguments

`X1`	Design matrix for focus regressors. Usually includes a constant (column full of 1s) and can be generated using `model.matrix`.
`X2`	Design matrix for auxiliary regressors. Usually does not include a constant column and can also be generated using `model.matrix`.
`y`	Count response as vector.
`betaStart1`	Starting values for coefficients of focus regressors X1.
`betaStart2`	Starting values for coefficients of auxiliary regressors X2.
`rhoStart`	Starting value for log-dispersion parameter of NB2
`family`	Object of class `"familyNBWALS"`. Currently only supports `negbinWALS`.
`prior`	Object of class `"familyPrior"`. For example `weibull` or `laplace`.
`method`	Specifies method used. Available methods are `"fullSVD"` (default) or `"original"`. See details.
`svdTol`	Tolerance for rank of matrix `\bar{Z}_{1}` and `\bar{Z}`. Only used if `method = "fullSVD"`. Checks if smallest eigenvalue in SVD of `\bar{Z}_1` and `\bar{Z}` is larger than `svdTol`, otherwise reports a rank deficiency.
`svdRtol`	Relative tolerance for rank of matrix `\bar{Z}_{1}` and `\bar{Z}`. Only used if `method = "fullSVD"`. Checks if ratio of largest to smallest eigenvalue in SVD of `\bar{Z}_1` and `\bar{Z}` is larger than `svdRtol`, otherwise reports a rank deficiency.
`keepUn`	If `TRUE`, keeps the one-step ML estimators of the unrestricted model, i.e. `\tilde{\gamma}_{u}` and `\tilde{\beta}_{u}`.
`keepR`	If `TRUE`, keeps the one-step ML estimators of the fully restricted model, i.e. `\tilde{\gamma}_{r}` and `\tilde{\beta}_{r}`.
`eigenSVD`	If `TRUE`, then `semiorthogonalize()` uses `svd()` to compute the eigendecomposition of `\bar{\Xi}` instead of `eigen()`. In this case, the tolerances of `svdTol` and `svdRtol` are used to determine whether `\bar{\Xi}` is of full rank (need it for `\bar{\Xi}^{-1/2}`).
`postmult`	If `TRUE` (default), then it computes `\bar{Z}_{2} = \bar{X}_{2} \bar{\Delta}_{2} \bar{T} \bar{\Lambda}^{-1/2} \bar{T}^{\top},` where `\bar{T}` contains the eigenvectors and `\bar{\Lambda}` the eigenvalues from the eigenvalue decomposition `\bar{\Xi} = \bar{T} \bar{\Lambda} \bar{T}^{\top},` instead of `\bar{Z}_{2} = \bar{X}_{2} \bar{\Delta}_{2} \bar{T} \bar{\Lambda}^{-1/2}.` See Huynh (2024b) for more details. The latter is used in the original MATLAB code for WALS in the linear regression model, see eq. (12) of Magnus and De Luca (2016). The first form is required in eq. (9) of De Luca et al. (2018). Thus, it is not recommended to set `postmult = FALSE`.
`...`	Arguments for internal function `computePosterior`.

Details

The method to be specified in method mainly differ in the way they compute the fully restricted and unrestricted estimators for the transformed regressors Z, i.e. \tilde{\gamma}_{1r}, and \tilde{\gamma}_{u}.

"fullSVD"

Recommended approach. First applies an SVD to \bar{Z}_{1} to compute \bar{X}_{2}^{\top} \bar{M}_{1} \bar{X}_{2}: It is used for computing the inverse of

\bar{X}_{1}^{\top}\bar{X}_{1} + \bar{g} \bar{\epsilon} X_{1}^{\top}\bar{q} \bar{q}^{\top} X_{1},

when using the Sherman-Morrison-Woodbury formula. We further leverage the SVD of \bar{Z}_1 and additionally \bar{Z} to compute the unrestricted estimator \tilde{\gamma}_{u} and the fully restricted estimator \tilde{\gamma}_{r}. For \tilde{\gamma}_{u}, we simply use the SVD of \bar{Z} to solve the full equation system derived from the one-step ML problem for more details. The SVD of \bar{Z}_1 is further used in computing the model averaged estimator for the focus regressors \hat{\gamma}_1.

Described in more detail in the appendix of Huynh (2024b).

"original"

Computes all inverses directly using solve and does not make use of the Sherman-Morrison-Woodbury formula for certain inverses. Specifically, it directly inverts the matrix \bar{Z}_{1}^{\top} \bar{Z}_{1} using solve in order to compute \bar{M}_1. Moreover, it computes the fully unrestricted estimators of the focus regressors \tilde{\gamma}_{1u} and of the auxiliary regressors \tilde{\gamma}_{2u} and the fully restricted estimator \tilde{\gamma}_{1r} by directly implementing the formulas derived in Huynh (2024a). This method should only be used as reference and for easier debugging.

All variables in the code that contain "start" in their name are computed using the starting values of the one-step ML estimators. See section "One-step ML estimator" of (Huynh 2024a) for details.

Value

A list containing

`coef`	Model averaged estimates of all coefficients.
`beta1`	Model averaged estimates of the coefficients of the focus regressors.
`beta2`	Model averaged estimates of the coefficients of the auxiliary regressors.
`rho`	Model averaged estimate of the log-dispersion parameter of the NB2 distribution.
`gamma1`	Model averaged estimates of the coefficients of the transformed focus regressors.
`gamma2`	Model averaged estimates of the coefficients of the transformed auxiliary regressors.
`condition`	Condition number of the matrix `\bar{\Xi} = \bar{\Delta}_{2} \bar{X}_{2}^{\top} \bar{M}_{1} \bar{X}_{2} \bar{\Delta}_{2}`.
`vcovBeta`	`NULL`, not implemented yet, placeholder for estimated covariance matrix of the regression coefficients.
`vcovGamma`	`NULL`, not implemented yet, placeholder for estimated covariance matrix of the coefficients of the transformed regressors.
`betaStart`	Starting values of the regression coefficients for the one-step ML estimators.
`rhoStart`	Starting values of the dispersion parameter for the one-step ML estimators.
`method`	Stores `method` used from the arguments.
`prior`	`familyPrior`. The `prior` specified in the arguments.
`betaUn1`	If `keepUn = TRUE`, contains the unrestricted one-step ML estimators of the coefficients of the focus regressors. Else `NULL`.
`betaUn2`	If `keepUn = TRUE`, contains the unrestricted one-step ML estimators of the coefficients of the auxiliary regressors. Else `NULL`.
`gammaUn1`	If `keepUn = TRUE`, contains the unrestricted one-step ML estimators of the coefficients of the transformed focus regressors. Else `NULL`.
`gammaUn2`	If `keepUn = TRUE`, contains the unrestricted one-step ML estimators of the coefficients of the transformed auxiliary regressors. Else `NULL`.
`gamma1r`	If `keepR = TRUE`, contains the fully restricted one-step ML estimator for the transformed regressors (only focus regressors). Else `NULL`.
`k1`	Number of focus regressors.
`k2`	Number of auxiliary regressors.
`n`	Number of observations.
`X1names`	Names of the focus regressors.
`X2names`	Names of the auxiliary regressors.
`familyStart`	The family object of class `"familyNBWALS"` used for the estimation of the starting values.
`family`	The family object of class `"familyNBWALS"` used later for predictions.
`fitted.link`	Linear link fitted to the data.
`fitted.values`	Estimated conditional mean for the data. Lives on the scale of the response.

References

De Luca G, Magnus JR, Peracchi F (2018). “Weighted-average least squares estimation of generalized linear models.” Journal of Econometrics, 204(1), 1–17. doi:10.1016/j.jeconom.2017.12.007.

Huynh K (2024a). “Weighted-Average Least Squares for Negative Binomial Regression.” arXiv 2404.11324, arXiv.org E-Print Archive. doi:10.48550/arXiv.2404.11324.

Huynh K (2024b). “WALS: Weighted-Average Least Squares Model Averaging in R.” University of Basel. Mimeo.

Magnus JR, De Luca G (2016). “Weighted-average least squares (WALS): A survey.” Journal of Economic Surveys, 30(1), 117-148. doi:10.1111/joes.12094.

Examples

data("NMES1988", package = "AER")
NMES1988 <- na.omit(NMES1988)
form <- (visits ~ health + chronic + age + insurance + adl + region + gender
         + married + income + school + employed)
X <- model.matrix(form, data = NMES1988)
focus <- c("(Intercept)", "healthpoor", "healthexcellent", "chronic", "age",
        "insuranceyes")
aux <- c("adllimited", "regionnortheast", "regionmidwest", "regionwest",
         "gendermale", "marriedyes", "income", "school", "employedyes")
X1 <- X[, focus]
X2 <- X[, aux]
y <- NMES1988$visits

# starting values from glm.nb() from MASS
startFit <- MASS::glm.nb(y ~ X[,-1])
betaStart <- coef(startFit)
rhoStart <- startFit$theta
k1 <- ncol(X1)
k2 <- ncol(X2)

str(walsNBfit(X1, X2, y, rhoStart, family = negbinWALS(scale = rhoStart, link = "log"),
              betaStart1 = betaStart[1:k1],
              betaStart2 = betaStart[(k1 + 1):(k1 + k2)],
              prior = weibull(), method = "fullSVD"))

[Package WALS version 0.2.5 Index]