R: Estimating linear regression models with (in)equality...

restriktor {restriktor}

R Documentation

Estimating linear regression models with (in)equality restrictions

Description

Function restriktor estimates the parameters of an univariate and a multivariate linear model (lm), a robust estimation of the linear model (rlm) and a generalized linear model (glm) subject to linear equality and linear inequality restrictions. It is a convenience function. The real work horses are the conLM, conMLM, conRLM and the conGLM functions.

Usage

restriktor(object, constraints = NULL, ...)

## S3 method for class 'lm'
conLM(object, constraints = NULL, se = "standard", 
      B = 999, rhs = NULL, neq = 0L, mix_weights = "pmvnorm", 
      parallel = "no", ncpus = 1L, cl = NULL, seed = NULL, 
      control = list(), verbose = FALSE, debug = FALSE, ...)
      
## S3 method for class 'rlm'
conRLM(object, constraints = NULL, se = "standard", 
       B = 999, rhs = NULL, neq = 0L, mix_weights = "pmvnorm", 
       parallel = "no", ncpus = 1L, cl = NULL, seed = NULL, 
       control = list(), verbose = FALSE, debug = FALSE, ...)
       
## S3 method for class 'glm'
conGLM(object, constraints = NULL, se = "standard", 
       B = 999, rhs = NULL, neq = 0L, mix_weights = "pmvnorm", 
       parallel = "no", ncpus = 1L, cl = NULL, seed = NULL, 
       control = list(), verbose = FALSE, debug = FALSE, ...)

## S3 method for class 'mlm'
conMLM(object, constraints = NULL, se = "none", 
       B = 999, rhs = NULL, neq = 0L, mix_weights = "pmvnorm", 
       parallel = "no", ncpus = 1L, cl = NULL, seed = NULL, 
       control = list(), verbose = FALSE, debug = FALSE, ...)

Arguments

`object`	a fitted linear model object of class "lm", "mlm", "rlm" or "glm". For class "rlm" only the loss function `bisquare` is supported for now, otherwise the function gives an error.
`constraints`	there are two ways to constrain parameters. First, the constraint syntax consists of one or more text-based descriptions, where the syntax can be specified as a literal string enclosed by single quotes. Only the names of `coef(model)` can be used as names. See details for more information. Note that objects of class "mlm" do not (yet) support this method. Second, the constraint syntax consists of a matrix `R` (or a vector in case of one constraint) and defines the left-hand side of the constraint `R\theta \ge rhs`, where each row represents one constraint. The number of columns needs to correspond to the number of parameters estimated (`\theta`) by model. The rows should be linear independent, otherwise the function gives an error. For more information about constructing the matrix `R` and `rhs` see details.
`se`	if "`standard`" (default), conventional standard errors are computed based on inverting the observed augmented information matrix. If "const", homoskedastic standard errors are computed. If "`HC0`" or just "`HC`", heteroskedastic robust standard errors are computed (a.k.a Huber White). The options "`HC1`", "`HC2`", "`HC3`", "`HC4`", "`HC4m`", and "`HC5`" are refinements of "`HC0`". For more details about heteroskedastic robust standard errors see the sandwich package. If "`boot.standard`", bootstrapped standard errors are computed using standard bootstrapping. If "`boot.model.based`" or "`boot.residual`", bootstrapped standard errors are computed using model-based bootstrapping. If "`none`", no standard errors are computed. Note that for objects of class "mlm" no standard errors are available (yet).
`B`	integer; number of bootstrap draws for `se`. The default value is set to 999. Parallel support is available.
`rhs`	vector on the right-hand side of the constraints; `R\theta \ge rhs`. The length of this vector equals the number of rows of the constraints matrix `R` and consists of zeros by default. Note: only used if constraints input is a matrix or vector.
`neq`	integer (default = 0) treating the number of constraints rows as equality constraints instead of inequality constraints. For example, if `neq = 2`, this means that the first two rows of the constraints matrix `R` are treated as equality constraints. Note: only used if constraints input is a matrix or vector.
`mix_weights`	if `"pmvnorm"` (default), the chi-bar-square weights are computed based on the multivariate normal distribution function with additional Monte Carlo steps. If `"boot"`, the chi-bar-square weights are computed using parametric bootstrapping. If `"none"`, no chi-bar-square weights are computed. The weights are necessary in the `restriktor.summary` function for computing the GORIC. Moreover, the weights are re-used in the `iht` function for computing the p-value for the test-statistic, unless the p-value is computed directly via bootstrapping.
`parallel`	the type of parallel operation to be used (if any). If missing, the default is set "no".
`ncpus`	integer: number of processes to be used in parallel operation: typically one would chose this to the number of available CPUs.
`cl`	an optional parallel or snow cluster for use if parallel = "snow". If not supplied, a cluster on the local machine is created for the duration of the `restriktor` call.
`seed`	seed value.
`control`	a list of control arguments: `absval` tolerance criterion for convergence (default = sqrt(.Machine$double.eps)). `maxit` the maximum number of iterations for the optimizer (default = 10000). `tol` numerical tolerance value. Estimates smaller than `tol` are set to 0. Control options for calculating the chi-bar-square weights: `chunk_size` integer; the chi-bar-square weights are computed for samples of size `chunk_size = 5000L`. This process is repeated iteratively until the weights converges (see `convergenge_crit`) or the maximum is reached, i.e., `mix_weights_bootstrap_limit`. `mix_weights_bootstrap_limit` integer; maximum number of bootstrap draws. The default value is set to 1e5. `convergence_crit` the convergence criterion for the iterative bootstrap process. Default is 1e-03.
`verbose`	logical; if TRUE, information is shown at each bootstrap draw.
`debug`	if TRUE, debugging information about the constraints is printed out.

Options for calculating the chi-bar-square weights:

...

parameters passed to the truncated multivariate normal distribution. By default, restriktor (i.e. con_weights_boot function) uses no truncation points for calculating the chi-bar-square weights, which renders to the multivariate normal distribution. See the manual page of the rtmvnorm function from the rtmvnorm to see how to specify a truncated mvnorm distribution and the possible arguments.

Details

The constraint syntax can be specified in two ways. First as a literal string enclosed by single quotes as shown below:

myConstraints <- '
    # 1. inequality constraints
      x1 > 0
      x1 < x2
    # or
      0 < x1 < x2
    
    ! 2. equality constraints  
      x3 == x4; x4 == x5 
    # or 
      x3 = x4; x4 = x5 
    # or
      x3 = x4 = x5'

The variable names x1 to x5 refer to the corresponding regression coefficient. Thus, constraints are impose on regression coefficients and not on the data.

Second, the above constraints syntax can also be written in matrix/vector notation as:

(The first column refers to the intercept, the remaining five columns refer to the regression coefficients x1 to x5.)

myConstraints <-
    rbind(c(0, 0, 0,-1, 1, 0), #equality constraint x3 = x4
          c(0, 0, 0, 0,-1, 1), #equality constraint x4 = x5
          c(0, 1, 0, 0, 0, 0), #inequality constraint x1 > rhs
          c(0,-1, 1, 0, 0, 0)) #inequality constraint x1 < x2
  
# the length of rhs is equal to the number of myConstraints rows.      
myRhs <- c(0,0,0,0) 
    
# the first two rows should be considered as equality constraints
myNeq <- 2

Blank lines and comments can be used in between the constraints, and constraints can be split over multiple lines. Both the hashtag (#) and the exclamation (!) characters can be used to start a comment. Multiple constraints can be placed on a single line if they are separated by a semicolon (;), a comma (,) or the "&" sign.

In addition compound constraints can be stated via one or more longer equality or inequality sentences e.g., 'x1 > x2 > x3; x3 < 4 < x4' or 'x1 == x2 == x3 & x4 = 1'. Alternatively, the constrains can be specifies as '(x1, x2) > (x3, x4)' which is equivalent to 'x1 > x3; x1 > x4; x2 > x3; x2 > x4'.

There can be three types of text-based descriptions in the constraints syntax:

Equality constraints: The "==" or "=" operator can be used to define equality constraints (e.g., x1 = 1 or x1 = x2).
Inequality constraints: The "<" or ">" operator can be used to define inequality constraints (e.g., x1 > 1 or x1 < x2).
Newly defined parameters: The ":=" operator can be used to define new parameters, which take on values that are an arbitrary function of the original model parameters. The function must be specified in terms of the parameter names in coef(model) (e.g., new := x1 + 2*x2). By default, the standard errors for these defined parameters are computed by using the so-called Delta method.

Variable names of interaction effects in objects of class lm, rlm and glm contain a semi-colon (:) between the variables. To impose constraints on parameters of interaction effects, the semi-colon must be replaced by a dot (.) (e.g., x3:x4 becomes x3.x4). In addition, the intercept variable names is shown as "(Intercept)". To impose restrictions on the intercept both parentheses must be replaced by a dot ".Intercept." (e.g.,.Intercept. > 10). Note: in most practical situations we do not impose restrictions on the intercept because we do not have prior knowledge about the intercept. Moreover, the sign of the intercept can be changed arbitrarily by shifting the response variable y.

Each element can be modified using arithmetic operators. For example, if x2 is expected to be twice as large as x1, then "2*x2 = x1".

If constraints = NULL, the unrestricted model is fitted.

### Note on not full row-rank ###

If the restriction matrix is not of full row-rank, this means one of the following:

There is at least one redundant restriction. Then, either
- [a] Leave the redundant one out
- [b] Use another (more time-consuming) way of obtaining the level probabilities for the penalty term (goric function does this by default): Bootstrapping, as discussed above.
There is at least one range restriction (e.g., -2 < group1 < 2). Such a restriction can be evaluated but there is a sensitivity (of a scaling factor in the covariance matrix, like with a prior in a Bayes factor) which currently cannot be checked for.
There is at least one conflicting restriction (e.g., 2 < group1 < -2).

Such a restriction can evidently never hold and is thus impossible to evaluate. To prevent this type of error delete the one that is incorrect.

Value

An object of class restriktor, for which a print and a summary method are available. More specifically, it is a list with the following items:

`CON`	a list with useful information about the restrictions.
`call`	the matched call.
`timing`	how much time several tasks take.
`parTable`	a parameter table with information about the observed variables in the model and the imposed restrictions.
`b.unrestr`	unrestricted regression coefficients.
`b.restr`	restricted regression coefficients.
`residuals`	restricted residuals.
`wresid`	a working residual, weighted for "inv.var" weights only (rlm only)
`fitted`	restricted fitted mean values.
`weights`	(only for weighted fits) the specified weights.
`wgt`	the weights used in the IWLS process (rlm only).
`scale`	the robust scale estimate used (rlm only).
`stddev`	a scale estimate used for the standard errors.
`R2.org`	unrestricted R-squared.
`R2.reduced`	restricted R-squared.
`df.residual`	the residual degrees of freedom
`s2.unrestr`	mean squared error of unrestricted model.
`s2.restr`	mean squared error of restricted model.
`loglik`	restricted log-likelihood.
`Sigma`	variance-covariance matrix of unrestricted model.
`constraints`	matrix with restrictions.
`rhs`	vector of right-hand side elements.
`neq`	number of equality restrictions.
`wt.bar`	chi-bar-square mixing weights or a.k.a. level probabilities.
`iact`	active restrictions.
`converged`	did the IWLS converge (rlm only)?
`iter`	number of iteration needed for convergence (rlm only).
`bootout`	object of class boot. Only available if bootstrapped standard errors are requested, else bootout = NULL.
`control`	list with control options.
`model.org`	original model.
`se`	as input. This information is needed in the summary function.
`information`	observed information matrix with the inverted information matrix and the augmented information matrix as attributes.

Author(s)

Leonard Vanbrabant and Yves Rosseel

References

Schoenberg, R. (1997). Constrained Maximum Likelihood. Computational Economics, 10, 251–266.

Shapiro, A. (1988). Towards a unified theory of inequality-constrained testing in multivariate analysis. International Statistical Review 56, 49–62.

Silvapulle, M.J. and Sen, P.K. (2005). Constrained Statistical Inference. Wiley, New York

Examples

## lm
## unrestricted linear model for ages (in months) at which an 
## infant starts to walk alone.

# prepare data
DATA1 <- subset(ZelazoKolb1972, Group != "Control")

# fit unrestricted linear model
fit1.lm <- lm(Age ~ -1 + Group, data = DATA1)

# the variable names can be used to impose restrictions on
# the corresponding regression parameters.
coef(fit1.lm)

# restricted linear model with restrictions that the walking 
# exercises would not have a negative effect of increasing the 
# mean age at which a child starts to walk. 
fit1.con <- restriktor(fit1.lm, constraints = ' GroupActive  < GroupPassive < GroupNo ')
summary(fit1.con)

# Or in matrix notation.
myConstraints1 <- rbind(c(-1, 1, 0),
                        c( 0,-1, 1))
myRhs1 <- rep(0L, nrow(myConstraints1)) 
myNeq1 <- 0

fit1.con <- restriktor(fit1.lm, constraints = myConstraints1,
                       rhs = myRhs1, neq = myNeq1)
summary(fit1.con)

#########################
## Artificial examples ##
#########################
library(MASS)

## mlm
# generate data
n <- 30
mu <- rep(0, 4)
Sigma <- matrix(5,4,4)
  diag(Sigma) <- c(10,10,10,10)
# 4 Y's.
Y <- mvrnorm(n, mu, Sigma)

# fit unrestricted multivariate linear model
fit.mlm <- lm(Y ~ 1)

# constraints
myConstraints2 <- diag(0,4)
  diag(myConstraints2) <- 1

# fit restricted multivariate linear model
fit2.con <- restriktor(fit.mlm, constraints = myConstraints2)

summary(fit2.con)


## rlm
# generate data
n <- 10
means <- c(1,2,1,3)
nm <- length(means)
group <- as.factor(rep(1:nm, each = n))
y <- rnorm(n * nm, rep(means, each = n))
DATA2 <- data.frame(y, group)

# fit unrestricted robust linear model
fit3.rlm <- rlm(y ~ -1 + group, data = DATA2, method = "MM")
coef(fit3.rlm)

## increasing means
myConstraints3 <- ' group1 < group2 < group3 < group4 '

# fit restricted robust linear model and compute 
# Huber-White (robust) standard errors.
fit3.con <- restriktor(fit3.rlm, constraints = myConstraints3, 
                       se = "HC0")
summary(fit3.con)


## increasing means in matrix notation.
myConstraints3 <- rbind(c(-1, 1, 0, 0),
                        c( 0,-1, 1, 0),
                        c( 0, 0,-1, 1))
myRhs3 <- rep(0L, nrow(myConstraints3)) 
myNeq3 <- 0

fit3.con <- restriktor(fit3.rlm, constraints = myConstraints3,
                       rhs = myRhs3, neq = myNeq3, se = "HC0")
summary(fit3.con)

## equality restrictions only.
myConstraints4 <- ' group1 = group2 = group3 = group4 '
                    
fit4.con <- restriktor(fit3.rlm, constraints = myConstraints4)
summary(fit4.con)


## combination of equality and inequality restrictions.
myConstraints5 <- ' group1  = group2
                    group3  < group4 '

# fit restricted model and compute model-based bootstrapped 
# standard errors. We only generate 9 bootstrap samples in this 
# example; in practice you may wish to use a much higher number.
fit5.con <- restriktor(fit3.rlm, constraints = myConstraints4, 
                       se = "boot.model.based", B = 9)
# an error is probably thrown, due to a too low number of bootstrap draws.
summary(fit5.con)

# restriktor can also be used to define effects using the := operator 
# and impose restrictions on them. For example, compute the average 
# effect (AVE) and impose the restriction AVE > 0.
# generate data
n <- 30
b0 <- 10; b1 = 0.5; b2 = 1; b3 = 1.5
X <- c(rep(c(0), n/2), rep(c(1), n/2))
set.seed(90) 
Z <- rnorm(n, 16, 5)
y <- b0 + b1*X + b2*Z + b3*X*Z + rnorm(n, 0, sd = 10) 
DATA3 = data.frame(cbind(y, X, Z))

# fit linear model with interaction
fit6.lm <- lm(y ~ X*Z, data = DATA3)

fit6.con <- restriktor(fit6.lm, constraints = ' AVE := X + 16.86137*X.Z; 
                                                AVE > 0 ')
summary(fit6.con)

[Package restriktor version 0.5-80 Index]