R: Maximum Likelihood Estimation for parametric linear...

maxlogLreg {EstimationTools}

R Documentation

Maximum Likelihood Estimation for parametric linear regression models

Description

Function to compute maximum likelihood estimators (MLE) of regression parameters of any distribution implemented in R with covariates (linear predictors).

Usage

maxlogLreg(
  formulas,
  y_dist,
  support = NULL,
  data = NULL,
  subset = NULL,
  fixed = NULL,
  link = NULL,
  start = NULL,
  lower = NULL,
  upper = NULL,
  optimizer = "nlminb",
  control = NULL,
  silent = FALSE,
  StdE_method = c("optim", "numDeriv"),
  ...
)

Arguments

`formulas`	a list of formula objects. Each element must have an `~`, with the terms on the right separated by `+` operators. The response variable on the left side is optional. Linear predictor of each parameter must be specified with the name of the parameter followed by the suffix `'.fo'`. See the examples below for further illustration.
`y_dist`	a formula object that specifies the distribution of the response variable. On the left side of `~` must be the response, and in the right side must be the name o de probability density/mass function. See the section Details and the examples below for further illustration.
`support`	a list with the following entries: `interval`: a two dimensional atomic vector indicating the set of possible values of a random variable having the distribution specified in `y_dist`. `type`: character indicating if distribution has a `discrete` or a `continous` random variable.
`data`	an optional data frame containing the variables in the model. If data is not specified, the variables are taken from the environment from which `maxlogLreg` is called.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`fixed`	a list with fixed/known parameters of distribution of interest. Fixed parameters must be passed with its name and its value (known).
`link`	a list with names of parameters to be linked, and names of the link function object. For names of parameters, please visit documentation of density/mass function. There are three link functions available: `log_link`, `logit_link` and `NegInv_link`. Take into account: the order used in argument `over` corresponds to the order in argument `link`.
`start`	a numeric vector with initial values for the parameters to be estimated. Zero is the default value.
`lower`	a numeric vector with lower bounds, with the same lenght of argument `start` (for box-constrained optimization). `-Inf` is the default value.
`upper`	a numeric vector with upper bounds, with the same lenght of argument `start` (for box-constrained optimization). `Inf` is the default value.
`optimizer`	a length-one character vector with the name of optimization routine. `nlminb`, `optim` and `DEoptim` are available; `nlminb` is the default routine.
`control`	control parameters of the optimization routine. Please, visit documentation of selected optimizer for further information.
`silent`	logical. If TRUE, warnings of `maxlogL` are suppressed.
`StdE_method`	a length-one character vector with the routine for Hessian matrix computation. The This is needed for standard error estimation. The options available are `"optim"` and `"numDeriv"`. For further information, visit `optim` or `hessian`.
`...`	Further arguments to be supplied to the optimization routine.

Details

maxlogLreg computes programmatically the log-likelihood (log L) function corresponding for the following model:

y_i \stackrel{iid.}{\sim} \mathcal{D}(\theta_{i1},\theta_{i2},\dots, \theta_{ij}, \dots, \theta_{ik})

g(\boldsymbol{\theta}_{j}) = \boldsymbol{\eta}_{j} = \mathbf{X}_j^\top \boldsymbol{\beta}_j,

where,

g_k(\cdot) is the k-th link function.
\boldsymbol{\eta}_{j} is the value of the linear predictor for the $j^th$ for all the observations.
\boldsymbol{\beta}_j = (\beta_{0j}, \beta_{1j},\dots, \beta_{(p_j-1)j})^\top are the fixed effects vector, where p_j is the number of parameters in linear predictor j and \mathbf{X}_j is a known design matrix of order n\times p_j. These terms are specified in formulas argument.
\mathcal{D} is the distribution specified in argument y_dist.

Then, maxlogLreg maximizes the log L through optim, nlminb or DEoptim. maxlogLreg generates an S3 object of class maxlogL.

Estimation with censorship can be handled with Surv objects (see example 2). The output object stores the corresponding censorship matrix, defined as r_{il} = 1 if sample unit i has status l, or r_{il} = 0 in other case. i=1,2,\dots,n and l=1,2,3 (l=1: observation status, l=2: right censorship status, l=3: left censorship status).

Value

A list with class maxlogL containing the following lists:

fit

A list with output information about estimation and method used.

inputs

A list with all input arguments.

outputs

A list with additional information. The most important outputs are:

npar: number of parameters.
n: sample size
Stde_method: standard error computation method.
b_lenght: a list with the number of regression parameters.
design_matrix: a list with the \mathbf{X} matrix for each parameter, the response values (called y) and the censorship matrix (called status). See the Details section for further information.

Note

The following generic functions can be used with a maxlogL object: summary, print, logLik, AIC.
Noncentrality parameters must be named as ncp in the distribution.

Author(s)

Jaime Mosquera Gutiérrez, jmosquerag@unal.edu.co

References

Nelder JA, Mead R (1965). “A Simplex Method for Function Minimization.” The Computer Journal, 7(4), 308–313. ISSN 0010-4620, doi:10.1093/comjnl/7.4.308, https://academic.oup.com/comjnl/article-lookup/doi/10.1093/comjnl/7.4.308.

Fox PA, Hall AP, Schryer NL (1978). “The PORT Mathematical Subroutine Library.” ACM Transactions on Mathematical Software, 4(2), 104–126. ISSN 00983500, doi:10.1145/355780.355783, https://dl.acm.org/doi/10.1145/355780.355783.

Nash JC (1979). Compact Numerical Methods for Computers. Linear Algebra and Function Minimisation, 2nd Edition edition. Adam Hilger, Bristol.

Dennis JE, Gay DM, Walsh RE (1981). “An Adaptive Nonlinear Least-Squares Algorithm.” ACM Transactions on Mathematical Software, 7(3), 348–368. ISSN 00983500, doi:10.1145/355958.355965, https://dl.acm.org/doi/10.1145/355958.355965.

Examples

library(EstimationTools)

#--------------------------------------------------------------------------------
# Example 1: Estimation in simulated normal distribution
n <- 1000
x <- runif(n = n, -5, 6)
y <- rnorm(n = n, mean = -2 + 3 * x, sd = exp(1 + 0.3* x))
norm_data <- data.frame(y = y, x = x)

# It does not matter the order of distribution parameters
formulas <- list(sd.fo = ~ x, mean.fo = ~ x)
support <- list(interval = c(-Inf, Inf), type = 'continuous')

norm_mod <- maxlogLreg(formulas, y_dist = y ~ dnorm, support = support,
                       data = norm_data,
                       link = list(over = "sd", fun = "log_link"))
summary(norm_mod)


#--------------------------------------------------------------------------------
# Example 2: Fitting with censorship
# (data from https://www.itl.nist.gov/div898/handbook/apr/section4/apr413.htm)

failures <- c(55, 187, 216, 240, 244, 335, 361, 373, 375, 386)
fails <- c(failures, rep(500, 10))
status <- c(rep(1, length(failures)), rep(0, 10))
Wei_data <- data.frame(fails = fails, status = status)

# Formulas with linear predictors
formulas <- list(scale.fo=~1, shape.fo=~1)
support <- list(interval = c(0, Inf), type = 'continuous')

# Bounds for optimization. Upper bound set with default values (Inf)
start <- list(
  scale = list(Intercept = 100),
  shape = list(Intercept = 10)
)
lower <- list(
  scale = list(Intercept = 0),
  shape = list(Intercept = 0)
)

mod_weibull <- maxlogLreg(formulas, y_dist = Surv(fails, status) ~ dweibull,
                          support = c(0, Inf), start = start,
                          lower = lower, data = Wei_data)
summary(mod_weibull)


#--------------------------------------------------------------------------------

[Package EstimationTools version 4.0.0 Index]