R: Regression for compositions following a SGB distribution

regSGB {SGB}

R Documentation

Regression for compositions following a SGB distribution

Description

Explanatory variables may influence the scale vector through a linear model applied to a log-ratio transform of the compositions. The shape parameters do not depend on explanatory variables. The overall shape parameter shape1 is common to all parts, whereas the Dirichlet shape parameters vector shape2 are specific to each part, i.e. shape2[j] is the Dirichlet parameter for u[i,j], i=1,...,n, (n=number of compositions in the dataset u).

Usage

regSGB(d, ...)

## Default S3 method:
regSGB(d, u, V, weight=rep(1,dim(d)[1]), 
    shape10 = 1, bound = 2.1, ind = NULL, shape1 = NULL, Mean2 = TRUE, 
    control.optim = list(trace=0,fnscale=-1),
    control.outer = list(itmax=1000,ilack.max=200,trace=TRUE, kkt2.check =TRUE,
    method = "BFGS"),...)
       
## S3 method for class 'formula'
regSGB(Formula, data= list(), weight=rep(1,dim(d)[1]), 
    shape10 = 1, bound = 2.1, ind = NULL, shape1 = 1,  Mean2=TRUE,
    control.optim = list(trace=0,fnscale=-1),
    control.outer = list(itmax=1000,ilack.max=200,trace=TRUE,kkt2.check =TRUE, 
    method = "BFGS"),...)
         
## S3 method for class 'regSGB'
print(x, ...)

## S3 method for class 'regSGB'
summary(object, digits=3,...)

Arguments

`Formula`	formula of class Formula, see `Formula`.
`d`	data matrix of explanatory variables (without constant vector) `(n \times m)`; `n`: sample size, `m`: number of auxiliary variables.
`u`	data matrix of compositions (independent variables) `(n \times D)`; `D`: number of parts.
`V`	log-ratio transformation matrix `(D \times (D-1))`.
`data`	a list with 3 components `d`, `u` and `V`.
`weight`	vector of length `n`; positive observation weights, default rep(1,n). Should be scaled to sum to `n`.
`shape10`	positive number, initial value of the overall shape parameter, default 1.
`bound`	inequality constraints on the estimates of shapes: `shape1*shape2[i] > bound, i=1,...,D.` By default `bound = 2.1`, see `InequalityConstr`.
`ind`	vector of length equal to the number of fixed parameters; see `index` in `EqualityConstr`. Default `ind` = NULL (no fixed parameters).
`shape1`	fixed value of the overall shape parameter if `min(ind)=1`. Default is 1.
`Mean2`	logical, if TRUE (default), the computed `shape2` parameters are each replaced by their average. See `initpar.SGB`.
`control.optim`	list of control parameters for optim, see `optim`. Default is from `auglag`, except `list(fnscale = -1)`. Always specify `fnscale = -1`.
`control.outer`	list of control parameters to be used by the outer loop in `constrOptim.nl`, see `auglag`. Default is from `auglag`, except `list(itmax = 1000, ilack.max = 200`.
`object`	an object of class "regSGB".
`digits`	number of decimal places for print, default 3.
`x`	an object of class "regSGB".
`...`	not used.

Details

It is advisable to use the formula to specify the model for easy comparison between models. Without formula, the d matrix of explanatory variables must contain exactly the variables used in the model, whereas with formula other variables can be included as well. Variable transformations can be utilized within the formula, see Example 4 below with the indicator I and the log.
Constraints on parameters can be introduced, see example 5 and EqualityConstr for more details.
Use weight for pseudo-likelihood estimation. weight is scaled to n, the sample size.
A design based covariance matrix of the parameters can be obtained by linearization as the covariance matrix of the scores.

Value

A list of class 'regSGB' with the following components:
The first 13 form the output from auglag.

`par`	Vector of length `npar`. Parameters that optimize the nonlinear objective function, satisfying constraints, if convergence is successful.
`value`	The value of the objective function at termination.
`counts`	A vector of length 2 denoting the number of times the objective and its gradient were evaluated, respectively.
`convergence`	An integer code indicating the type of convergence. 0 indicates successful convergence. Positive integer codes indicate failure to converge.
`message`	A character string giving any additional information on convergence returned by `optim`, or NULL.
`outer.iteration`	Number of outer iterations.
`lambda`	Values of the Lagrangian parameter. This is a vector of the same length as the total number of inequalities and equalities. It must be zero for inactive inequalities; non-negative for active inequalities; and can have any sign for equalities.
`sigma`	Value of augmented penalty parameter for the quadratic term.
`gradient`	Gradient of the augmented Lagrangian function at convergence. It should be small.
`hessian`	Hessian of the augmented Lagrangian function at convergence. It should be negative definite for maximization.
`ineq`	Values of inequality constraints at convergence. All of them must be non-negative.
`equal`	Values of equality constraints at convergence. All of them must be close to zero.
`kkt1`	A logical variable indicating whether or not the first-order KKT conditions were satisfied (printed 1 if conditions satisfied and 0 otherwise).
`kkt2`	A logical variable indicating whether or not the second-order KKT conditions were satisfied (printed 1 if conditions satisfied and 0 otherwise).
`scale`	`n \times D` matrix, the estimated scale compositions, see `bval`.
`meanA`	Aitchison expectation at estimated parameters.
`fitted.values`	`(n \times (D-1))` matrix, estimated log-ratio transforms.
`residuals`	Observed minus estimated log-ratio transforms.
`scores`	matrix `n \times npar`. Each row contains the (unweighted) derivatives of the log-density at a data point w.r.t the parameters.
`Rsquare`	ratio of total variation of `meanA` and total variation of compositions `u`.
`vcov`	The robust covariance matrix of parameters estimates, see `covest.SGB`.
`StdErr1`	Ordinary asymptotic standard errors of parameters.
`StdErr`	Robust asymptotic standard errors of parameters.
`fixed.par`	Indices of the fixed parameters.
`summary`	The summary from `covest.SGB`.
`AIC`	AIC criterion.
`V`	log-ratio transformation matrix (same as corresponding input parameter `V`)
`call`	Arguments for calling `regSGB`.
`Formula`	Expression for formula.

References

Graf, M. (2017). A distribution on the simplex of the Generalized Beta type. In J. A. Martin-Fernandez (Ed.), Proceedings CoDaWork 2017, University of Girona (Spain), 71-90.

Hijazi, R. H. and R. W. Jernigan (2009). Modelling compositional data using Dirichlet regression models. Journal of Applied Probability and Statistics, 4 (1), 77-91.

Kotz, S., N. Balakrishnan, and N. L. Johnson (2000). Continuous Multivariate Distributions, Volume 1, Models and Applications. John Wiley & Sons.

Madsen, K., H. Nielsen, and O. Tingleff (2004). Optimization With Constraints. Informatics and Mathematical Modelling, Technical University of Denmark.

Monti, G. S., G. Mateu-Figueras, and V. Pawlowsky-Glahn (2011). Notes on the scaled Dirichlet distribution. In V. Pawlowsky-Glahn and A. Buccianti (Eds.), Compositional data analysis. Theory and applications. Wiley.

Varadhan, R. (2015). alabama: Constrained Nonlinear Optimization. R package version 2015.3-1.

Wicker, N., J. Muller, R. K. R. Kalathur, and O. Poch (2008). A maximum likelihood approximation method for Dirichlet parameter estimation. Computational Statistics & Data Analysis 52 (3), 1315-1322.

Zeileis, A. and Y. Croissant (2010). Extended model formulas in R: Multiple parts and multiple responses. Journal of Statistical Software 34 (1), 1-13.

Examples

## Regression for car segment shares
## ---------------------------------
data(carseg)
## Extract the compositions
uc <- as.matrix(carseg[,(1:5)])

## Extract the explanatory variables
attach(carseg)

## Example 1: without formula
## --------------------------
## Change some variables
dc <- data.frame(l.exp1=log(expend)*PAC,l.exp0=log(expend)*(1-PAC), l.sent=log(sent),
l.FBCF=log(FBCF), l.price=log(price), rates)

## Define the log-ratio transformation matrix
Vc <- matrix(c( 1,0,0,0,
               -1,1,0,0,
               0,-1,1,0,
               0,0,-1,1,
               0,0,0,-1),ncol=4,byrow=TRUE)
colnames(Vc) <- c("AB","BC","CD","DE")
rownames(Vc) <- colnames(uc)
Vc

# 2 next rows  only necessary when calling regSGB without a formula.
dc1 <- cbind("(Intercept)"= 1 , dc)
dc1 <- as.matrix(dc1)   

object10 <- regSGB(dc1,uc, Vc,shape10=4.4)
summary(object10)

## Example 2: same with formula
## ----------------------------
## Define the formula
Form <- Formula(AB | BC | CD | DE ~  l.exp1 + l.exp0 + l.sent + l.FBCF + l.price +  rates)

## Regression with formula
object1 <- regSGB(Form, data= list(dc, uc, Vc),shape10=4.4)

summary(object1)

## Example 3: Usage of I()
## -----------------------
Form2 <- Formula(AB | BC | CD | DE ~  I(l.exp1 + l.exp0) + l.exp1 +l.sent + 
                 l.FBCF + l.price + rates )
object2 <- regSGB(Form2,data= list(dc, uc, Vc),shape10=4.4)
object2

## Example 4: Usage of variable transformations on the original file
## -----------------------------------------------------------------
Form3 <- Formula(AB | BC | CD | DE ~  log(expend) + I(PAC*log(expend)) + log(sent) + log(FBCF) + 
                 log(price) + rates)
object3 <- regSGB(Form3, data=list(carseg, uc, Vc),shape10=4.4)
object3
object2[["par"]]-object3[["par"]]    # same results

## Example 5: Fixing parameter values
## ----------------------------------
## 1. In the following regression we condition on shape1 = 2.36.
object4 <- regSGB(Form3,data=list(carseg, uc, Vc), 
                  shape10 = 4.4,  bound = 2.0, ind = 1, shape1 = 2.36)
summary(object4)

## 2. In the following regression we condition on shape1 = 2.36 and the  coefficient of 
## log(FBCF).BC = 0.  Notice that it is the 19th parameter.
object5 <- regSGB(Form3,data=list(carseg, uc, Vc),
                  shape10 = 4.4, bound = 2.0, ind = c(1,19) , shape1 = 2.36)
summary(object5)

object3[["AIC"]]
object4[["AIC"]]  # largest AIC
object5[["AIC"]]

[Package SGB version 1.0.1.1 Index]