R: Censored linear regression model with autoregressive errors

ARCensReg {ARCensReg}

R Documentation

Censored linear regression model with autoregressive errors

Description

Fits a univariate left-, right-, or interval-censored linear regression model with autoregressive errors under the normal distribution, using the SAEM algorithm. It returns estimates and standard errors of the parameters and supports missing values in the dependent variable.

Usage

ARCensReg(cc, lcl = NULL, ucl = NULL, y, x, p = 1, M = 10, 
  perc = 0.25, MaxIter = 400, pc = 0.18, tol = 1e-04, 
  show_se = TRUE, quiet = FALSE)

Arguments

`cc`	Vector of censoring indicators of length `n`, where `n` is the total number of observations. For each observation: 0 if non-censored, 1 if censored/missing.
`lcl`, `ucl`	Vectors of length `n` that represent the lower and upper bounds of the interval that contains the true value of each censored observation. Default=`NULL`, indicating non-censored data. See Details for more information.
`y`	Vector of responses of length `n`.
`x`	Matrix of covariates of dimension `n \times l`, where `l` is the number of fixed effects including the intercept, if considered (in models which include an intercept, `x` should contain a column of ones).
`p`	Order of the autoregressive process. It must be a positive integer value.
`M`	Size of the Monte Carlo sample generated in each step of the SAEM algorithm. Default=10.
`perc`	Percentage of burn-in on the Monte Carlo sample. Default=0.25.
`MaxIter`	The maximum number of iterations of the SAEM algorithm. Default=400.
`pc`	Percentage of initial iterations of the SAEM algorithm with no memory. It is recommended that `50<MaxIter*pc<100`. Default=0.18.
`tol`	The maximum error permitted for convergence. Default=1e-04.
`show_se`	`TRUE` or `FALSE`. Indicates if the standard errors should be estimated. Default=`TRUE`.
`quiet`	`TRUE` or `FALSE`. Indicates if printing information should be suppressed. Default=`FALSE`.
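
The recommendation for `pc` can be checked with plain arithmetic before fitting. The sketch below is only an illustration: the helper name `check_pc` is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of ARCensReg): number of initial
# "no memory" SAEM iterations implied by MaxIter and pc, and whether it
# lies strictly inside the recommended range 50 < MaxIter*pc < 100.
check_pc <- function(MaxIter = 400, pc = 0.18) {
  n_no_memory <- MaxIter * pc
  list(n_no_memory = n_no_memory,
       recommended = n_no_memory > 50 && n_no_memory < 100)
}

check_pc()                           # defaults: 400 * 0.18 = 72
check_pc(MaxIter = 400, pc = 0.12)   # 400 * 0.12 = 48, below the range
```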

Details

The linear regression model with autocorrelated errors, defined as a discrete-time autoregressive (AR) process of order p, at time t is given by

Y_t = x_t^T \beta + \xi_t,

\xi_t = \phi_1 \xi_{t-1} + ... + \phi_p \xi_{t-p} + \eta_t, t=1, ..., n,

where Y_t is the response variable, \beta = (\beta_1, ..., \beta_l)^T is a vector of regression parameters of dimension l, and x_t = (x_{t1}, ..., x_{tl})^T is a vector of non-stochastic regressor values; \xi_t is the AR error with Gaussian disturbance \eta_t of variance \sigma^2, \phi = (\phi_1, ..., \phi_p)^T is the vector of AR coefficients, and n is the sample size.
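
As an illustration of the model, fully observed data can be simulated with base R alone. This is a minimal sketch, assuming `stats::arima.sim` is an acceptable way to generate the AR errors; the parameter values are arbitrary and the object names are chosen only for this example.

```r
set.seed(1)
n    <- 200
beta <- c(2, 1)          # regression parameters (intercept and slope)
phi  <- c(0.48, -0.2)    # AR(2) coefficients (stationary)
sig2 <- 0.5              # variance of the Gaussian disturbances eta_t

x <- cbind(1, runif(n))  # column of ones for the intercept

# xi_t = phi_1 xi_{t-1} + phi_2 xi_{t-2} + eta_t, eta_t ~ N(0, sig2)
xi <- as.numeric(arima.sim(model = list(ar = phi), n = n, sd = sqrt(sig2)))

# Y_t = x_t^T beta + xi_t
y <- as.numeric(x %*% beta + xi)
head(y)
```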

It is assumed that Y_t is not fully observed for all t. For left-censored observations, we have lcl=-Inf and ucl=V_t, such that the true value Y_t \leq V_t. For right censoring, lcl=V_t and ucl=Inf, such that Y_t \geq V_t. For interval censoring, lcl=V_{1t} and ucl=V_{2t} must be finite values, such that V_{1t} \leq Y_t \leq V_{2t}. Missing data can be defined by setting lcl=-Inf and ucl=Inf.
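
The four encodings can be summarized in a small helper. This is a sketch only: `cens_bounds` is a hypothetical name, not a package function, and it returns the pair (lcl, ucl) for a single censored or missing observation.

```r
# Hypothetical helper (not part of ARCensReg): bounds (lcl, ucl) for one
# censored or missing observation, following the rules stated above.
cens_bounds <- function(type = c("left", "right", "interval", "missing"),
                        lower = NA_real_, upper = NA_real_) {
  type <- match.arg(type)
  switch(type,
    left = {
      stopifnot(is.finite(upper))
      c(lcl = -Inf, ucl = upper)        # Y_t <= V_t
    },
    right = {
      stopifnot(is.finite(lower))
      c(lcl = lower, ucl = Inf)         # Y_t >= V_t
    },
    interval = {
      stopifnot(is.finite(lower), is.finite(upper), lower <= upper)
      c(lcl = lower, ucl = upper)       # V_1t <= Y_t <= V_2t
    },
    missing = c(lcl = -Inf, ucl = Inf)  # no information on Y_t
  )
}

rbind(left     = cens_bounds("left", upper = 0.5),
      right    = cens_bounds("right", lower = 3),
      interval = cens_bounds("interval", lower = 1, upper = 2),
      missing  = cens_bounds("missing"))
```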

The initial values are obtained by ignoring censoring: the censored observations are replaced by their censoring limits and the parameters are estimated by maximum likelihood. To fit a regression model with autoregressive errors to non-censored data, set cc to a vector of zeros.
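
The idea behind the initial values can be sketched with base R. This is not the package's internal code: it assumes `stats::arima` with `xreg` as a stand-in maximum likelihood fit, and the detection limit `V` and all object names are hypothetical.

```r
set.seed(2)
n  <- 100
x  <- cbind(1, runif(n))
xi <- as.numeric(arima.sim(model = list(ar = 0.5), n = n, sd = sqrt(0.3)))
y  <- as.numeric(x %*% c(2, 1) + xi)

# Left-censor roughly the lowest 10% at a hypothetical limit V
V  <- as.numeric(quantile(y, 0.10))
cc <- as.numeric(y < V)              # 1 = censored, 0 = observed

# Ignore censoring: replace censored responses by their censoring limit
y_start <- ifelse(cc == 1, V, y)

# ML fit of a regression with AR(1) errors on the completed series.
# include.mean = FALSE because x already carries the column of ones.
init <- arima(y_start, order = c(1, 0, 0), xreg = x,
              include.mean = FALSE, method = "ML")
coef(init)     # AR coefficient followed by the regression coefficients
init$sigma2    # estimated innovation variance
```

Estimates obtained this way ignore the censoring mechanism, which is why they serve only as starting points.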

Value

An object of class "ARpCRM", representing the AR(p) censored regression normal fit. Generic functions such as print and summary have methods to show the results of the fit. The function plot provides convergence graphics for the parameters when at least one censored observation exists.

Specifically, the following components are returned:

`beta`	Estimate of the regression parameters.
`sigma2`	Estimated variance of the white noise process.
`phi`	Estimate of the autoregressive parameters.
`pi1`	Estimate of the first `p` partial autocorrelations.
`theta`	Vector of parameter estimates (`\beta, \sigma^2, \phi`).
`SE`	Vector of the standard errors of (`\beta, \sigma^2, \phi`).
`loglik`	Log-likelihood value.
`AIC`	Akaike information criterion.
`BIC`	Bayesian information criterion.
`AICcorr`	Corrected Akaike information criterion.
`yest`	Augmented response variable based on the fitted model.
`yyest`	Final estimate of `E(Y%*%t(Y))`.
`x`	Matrix of covariates of dimension `n \times l`.
`iter`	Number of iterations until convergence.
`criteria`	Attained criteria value.
`call`	The `ARCensReg` call that produced the object.
`tab`	Table of estimates.
`critFin`	Selection criteria.
`cens`	"left", "right", or "interval" for left, right, or interval censoring, respectively.
`nmiss`	Number of missing observations.
`ncens`	Number of censored observations.
`converge`	Logical indicating convergence of the estimation algorithm.
`MaxIter`	The maximum number of iterations used for the SAEM algorithm.
`M`	Size of the Monte Carlo sample generated in each step of the SAEM algorithm.
`pc`	Percentage of initial iterations of the SAEM algorithm with no memory.
`time`	Time elapsed in processing.
`plot`	A list containing convergence information.
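
The link between `phi` and `pi1` can be illustrated with base R. The sketch below uses `stats::ARMAacf` to obtain the first `p` partial autocorrelations implied by a set of AR coefficients; whether the package computes `pi1` in this way is an assumption, but the relation itself is standard for stationary AR processes.

```r
phi <- c(0.48, -0.2)   # AR(2) coefficients, as in Example 2

# First p partial autocorrelations implied by phi
pi1 <- as.numeric(ARMAacf(ar = phi, lag.max = length(phi), pacf = TRUE))
pi1

# Closed form for AR(2): lag 1 is phi_1 / (1 - phi_2), lag 2 is phi_2
c(phi[1] / (1 - phi[2]), phi[2])
```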

Author(s)

Fernanda L. Schumacher, Katherine L. Valeriano, Victor H. Lachos, Christian E. Galarza, and Larissa A. Matos

References

Delyon B, Lavielle M, Moulines E (1999). “Convergence of a stochastic approximation version of the EM algorithm.” Annals of Statistics, 94–128.

Schumacher FL, Lachos VH, Dey DK (2017). “Censored regression models with autoregressive errors: A likelihood-based perspective.” Canadian Journal of Statistics, 45(4), 375–392.

Examples

## Example 1: (p = l = 1)
# Generating a sample
set.seed(23451)
n = 50
x = rep(1, n)
dat = rARCens(n=n, beta=2, phi=.5, sig2=.3, x=x, cens='left', pcens=.1)

# Fitting the model (quick convergence)
fit0 = ARCensReg(dat$data$cc, dat$data$lcl, dat$data$ucl, dat$data$y, x,
                 M=5, pc=.12, tol=0.001, show_se=FALSE)
fit0

## Example 2: (p = l = 2)
# Generating a sample
n = 100
x = cbind(1, runif(n))
dat = rARCens(n=n, beta=c(2,1), phi=c(.48,-.2), sig2=.5, x=x, cens='left', 
              pcens=.05)

# Fitting the model
fit1 = ARCensReg(dat$data$cc, dat$data$lcl, dat$data$ucl, dat$data$y, x,
                 p=2, tol=0.0001)
summary(fit1)
plot(fit1)

# Plotting the augmented variable
library(ggplot2)
data.plot = data.frame(yobs=dat$data$y, yest=fit1$yest)
ggplot(data.plot) + theme_bw() +
  geom_line(aes(x=1:nrow(data.plot), y=yest), color=4, linetype="dashed") +
  geom_line(aes(x=1:nrow(data.plot), y=yobs)) + labs(x="Time", y="y")

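## Example 3: Simulating missing values
miss = sample(1:n, 3)      # positions to be treated as missing
yMISS = dat$data$y
yMISS[miss] = NA
cc = dat$data$cc
cc[miss] = 1               # missing values are flagged like censored ones
lcl = dat$data$lcl
ucl = dat$data$ucl
lcl[miss] = -Inf           # missing: lcl = -Inf and ucl = Inf (see Details)
ucl[miss] = Inf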

fit2 = ARCensReg(cc, lcl, ucl, yMISS, x, p=2)
plot(fit2)

# Imputed missing values
data.frame(yobs=dat$data$y[miss], yest=fit2$yest[miss])

[Package ARCensReg version 3.0.1 Index]