regSGB {SGB}R Documentation

Regression for compositions following an SGB distribution

Description

Explanatory variables may influence the scale vector through a linear model applied to a log-ratio transform of the compositions. The shape parameters do not depend on the explanatory variables. The overall shape parameter shape1 is common to all parts, whereas the Dirichlet shape parameters, the components of the vector shape2, are specific to each part, i.e. shape2[j] is the Dirichlet parameter for u[i,j], i = 1, ..., n, where n is the number of compositions in the dataset u.
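As a minimal sketch, assuming the log-ratio transform of an n \times D composition matrix u is computed as log(u) %*% V (with V built as in the Examples), the base-R code below applies the transform to two toy compositions. It does not call regSGB. Because each column of this V sums to zero, rescaling a row of u leaves the result unchanged, so only the ratios between parts enter the model.

```r
## Sketch, assuming the log-ratio transform is log(u) %*% V (not taken from SGB)
V <- matrix(c( 1, 0, 0, 0,
              -1, 1, 0, 0,
               0,-1, 1, 0,
               0, 0,-1, 1,
               0, 0, 0,-1), ncol = 4, byrow = TRUE)
colnames(V) <- c("AB", "BC", "CD", "DE")

## Two toy compositions with D = 5 parts (each row sums to 1)
u <- rbind(c(0.10, 0.20, 0.30, 0.20, 0.20),
           c(0.25, 0.25, 0.20, 0.20, 0.10))

z <- log(u) %*% V   # n x (D-1); column AB holds log(u[, 1] / u[, 2])
z
```

Rescaling the compositions, e.g. log(3 * u) %*% V, returns the same matrix z.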

Usage

regSGB(d, ...)

## Default S3 method:
regSGB(d, u, V, weight=rep(1,dim(d)[1]), 
    shape10 = 1, bound = 2.1, ind = NULL, shape1 = NULL, Mean2 = TRUE, 
    control.optim = list(trace=0,fnscale=-1),
    control.outer = list(itmax=1000,ilack.max=200,trace=TRUE, kkt2.check =TRUE,
    method = "BFGS"),...)
       
## S3 method for class 'formula'
regSGB(Formula, data= list(), weight=rep(1,dim(d)[1]), 
    shape10 = 1, bound = 2.1, ind = NULL, shape1 = 1,  Mean2=TRUE,
    control.optim = list(trace=0,fnscale=-1),
    control.outer = list(itmax=1000,ilack.max=200,trace=TRUE,kkt2.check =TRUE, 
    method = "BFGS"),...)
         
## S3 method for class 'regSGB'
print(x, ...)

## S3 method for class 'regSGB'
summary(object, digits=3,...)

Arguments

Formula

formula of class Formula, see Formula.

d

data matrix of explanatory variables (without constant vector) (n \times m); n: sample size, m: number of auxiliary variables.

u

data matrix of compositions, the response variables of the regression (n \times D); D: number of parts.

V

log-ratio transformation matrix (D \times (D-1)).

data

a list with 3 components d, u and V.

weight

vector of length n; positive observation weights, default rep(1,n). Should be scaled to sum to n.

shape10

positive number, initial value of the overall shape parameter, default 1.

bound

lower bound in the inequality constraints on the shape estimates:
shape1*shape2[i] > bound, i = 1, ..., D.
Default bound = 2.1; see InequalityConstr.

ind

vector of the indices of the fixed parameters (its length equals the number of fixed parameters); see index in EqualityConstr. Default ind = NULL (no fixed parameters).

shape1

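fixed value of the overall shape parameter, used when ind contains 1 (i.e. min(ind) = 1), shape1 being the first parameter. Default is 1 in the formula method and NULL in the default method.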

Mean2

logical; if TRUE (default), the shape2 values computed by initpar.SGB are each replaced by their average. See initpar.SGB.

control.optim

list of control parameters for optim; see optim. Defaults are those of auglag, except list(trace = 0, fnscale = -1). Always specify fnscale = -1, so that the log-likelihood is maximized rather than minimized.

control.outer

list of control parameters for the outer loop of auglag; see auglag. Defaults are those of auglag, except
list(itmax = 1000, ilack.max = 200, trace = TRUE, kkt2.check = TRUE, method = "BFGS").

object

an object of class "regSGB".

digits

number of decimal places for print, default 3.

x

an object of class "regSGB".

...

not used.
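Two small base-R sketches related to the arguments above; neither calls regSGB. The first shows that optim, to which control.optim is passed, maximizes its objective when fnscale = -1, here for a normal log-likelihood. The second checks the inequality constraints shape1*shape2[j] > bound for a set of starting values; constraint_slack is a hypothetical helper and the shape2 values are made up for illustration, not estimates from carseg.

```r
## 1. optim() maximizes when control = list(fnscale = -1)
set.seed(1)
x <- rnorm(50, mean = 3, sd = 2)
loglik <- function(p) sum(dnorm(x, mean = p[1], sd = exp(p[2]), log = TRUE))
fit <- optim(c(0, 0), loglik, method = "BFGS",
             control = list(trace = 0, fnscale = -1))
fit$par[1]        # close to mean(x), the maximum likelihood estimate
exp(fit$par[2])   # close to sqrt(mean((x - mean(x))^2))

## 2. Hypothetical helper: slack of the constraints shape1 * shape2[j] > bound
constraint_slack <- function(shape1, shape2, bound = 2.1) shape1 * shape2 - bound
slack <- constraint_slack(2.36, c(1.2, 0.9, 1.5, 1.0, 1.1), bound = 2.0)
all(slack > 0)    # TRUE: these illustrative values are feasible
```

Without fnscale = -1, optim would minimize the log-likelihood and drift away from the estimate.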

Details

Specifying the model with a formula is advisable because it makes comparisons between models easier. Without a formula, the matrix d of explanatory variables must contain exactly the variables used in the model, whereas with a formula other variables can be included as well. Variable transformations can be used within the formula; see Examples 3 and 4 below, which use I() and log().
Constraints on parameters can be introduced; see Example 5 and EqualityConstr for more details.
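To fix a coefficient through ind, its position in par must be known. Example 5 states that the coefficient of log(FBCF).BC in Form3 is the 19th parameter. That position is consistent with the ordering assumed below, which is inferred from the example rather than documented: parameter 1 is shape1, followed by the coefficients grouped by term (intercept first), each group running over the D-1 log-ratios. coef_index is a hypothetical helper; compare its result with the output of summary() before fixing a parameter.

```r
## Hypothetical helper; the ordering is an assumption inferred from Example 5:
## par = (shape1, coefficients of term 1 for all log-ratios, term 2, ...)
coef_index <- function(term_pos, ratio_pos, n_ratios) {
  1 + (term_pos - 1) * n_ratios + ratio_pos
}

## Terms of Form3: (Intercept), log(expend), I(PAC*log(expend)), log(sent),
## log(FBCF), log(price), rates; log-ratios: AB, BC, CD, DE (n_ratios = 4)
coef_index(term_pos = 5, ratio_pos = 2, n_ratios = 4)   # log(FBCF).BC -> 19
```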
Use the argument weight for pseudo-likelihood estimation; the weights are scaled to sum to n, the sample size.
A design-based covariance matrix of the parameters can be obtained by linearization, as the covariance matrix of the scores.
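Linearization approximates the estimation error by -H^{-1} times the weighted sum of the scores, where H is the Hessian of the weighted log-likelihood, so the covariance takes the sandwich form H^{-1} J H^{-1}, with J the covariance of the weighted scores. The base-R sketch below implements this generic estimator under those assumptions; it is not the code of covest.SGB, and sandwich_cov is a hypothetical name. For the mean of a sample with unit variance, each score is x[i] - mean(x) and H = -n, so the result reduces to sum((x - mean(x))^2) / n^2.

```r
## Generic linearization (sandwich) covariance; an assumption, not covest.SGB
## S: n x npar matrix of unweighted scores, w: observation weights,
## H: npar x npar Hessian of the weighted log-likelihood at the estimate
sandwich_cov <- function(S, w, H) {
  S <- as.matrix(S)
  w <- w * length(w) / sum(w)         # scale the weights to sum to n
  WS <- S * w                         # row i of S multiplied by w[i]
  WS <- sweep(WS, 2, colMeans(WS))    # centre the weighted scores
  J <- crossprod(WS)                  # t(WS) %*% WS, sum of outer products
  Hinv <- solve(H)
  Hinv %*% J %*% Hinv
}

## Check on the sample mean: scores x[i] - mean(x), Hessian -n
x <- c(1, 2, 4, 7)
S <- matrix(x - mean(x), ncol = 1)
sandwich_cov(S, rep(1, 4), matrix(-4))   # 21 / 16 = 1.3125
```

Because the weights are rescaled to sum to n, multiplying all of them by a constant leaves the result unchanged.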

Value

A list of class 'regSGB' with the following components.
The components par through kkt2 form the output from auglag.

par

Vector of length npar, the number of parameters: the parameter values that optimize the nonlinear objective function while satisfying the constraints, if convergence is successful.

value

The value of the objective function at termination.

counts

A vector of length 2 denoting the number of times the objective and its gradient were evaluated, respectively.

convergence

An integer code indicating the type of convergence. 0 indicates successful convergence. Positive integer codes indicate failure to converge.

message

A character string giving any additional information on convergence returned by optim, or NULL.

outer.iteration

Number of outer iterations.

lambda

Values of the Lagrange multipliers. This is a vector of the same length as the total number of inequalities and equalities. It must be zero for inactive inequalities, non-negative for active inequalities, and can have any sign for equalities.

sigma

Value of augmented penalty parameter for the quadratic term.

gradient

Gradient of the augmented Lagrangian function at convergence. It should be small.

hessian

Hessian of the augmented Lagrangian function at convergence. It should be negative definite for maximization.

ineq

Values of inequality constraints at convergence. All of them must be non-negative.

equal

Values of equality constraints at convergence. All of them must be close to zero.

kkt1

A logical variable indicating whether or not the first-order KKT conditions were satisfied (printed 1 if conditions satisfied and 0 otherwise).

kkt2

A logical variable indicating whether or not the second-order KKT conditions were satisfied (printed 1 if conditions satisfied and 0 otherwise).

scale

n \times D matrix, the estimated scale compositions, see bval.

meanA

Aitchison expectation at estimated parameters.

fitted.values

(n \times (D-1)) matrix, estimated log-ratio transforms.

residuals

Observed minus estimated log-ratio transforms.

scores

matrix n \times npar. Each row contains the (unweighted) derivatives of the log-density at a data point with respect to the parameters.

Rsquare

ratio of the total variation of meanA to the total variation of the compositions u.

vcov

The robust covariance matrix of parameters estimates, see covest.SGB.

StdErr1

Ordinary asymptotic standard errors of parameters.

StdErr

Robust asymptotic standard errors of parameters.

fixed.par

Indices of the fixed parameters.

summary

The summary from covest.SGB.

AIC

Akaike information criterion (AIC).

V

log-ratio transformation matrix (same as the input argument V).

call

Arguments for calling regSGB.

Formula

Expression for formula.
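Rsquare compares the total variation of meanA with that of u. Assuming that total variation here means Aitchison's total variance, i.e. the sum of the variances of the centred log-ratio (clr) coordinates, it can be sketched in base R as follows. The function names are hypothetical, and the pairwise form (1/(2D)) times the sum over i, j of var(log(x_i/x_j)) is included as a check that the two definitions agree.

```r
## Assumption: "total variation" = Aitchison total variance, i.e. the sum of
## the variances of the centred log-ratio (clr) coordinates
clr <- function(x) {
  lx <- log(x)
  lx - rowMeans(lx)              # subtract each row's mean log from that row
}
total_variation <- function(x) sum(apply(clr(x), 2, var))

## Equivalent pairwise form: (1 / (2D)) * sum over i, j of var(log(x_i / x_j))
total_variation_pairwise <- function(x) {
  lx <- log(x)
  D <- ncol(x)
  s <- 0
  for (i in seq_len(D)) for (j in seq_len(D)) s <- s + var(lx[, i] - lx[, j])
  s / (2 * D)
}

## Hypothetical Rsquare-type ratio: fitted (e.g. meanA) over observed compositions
rsquare_tv <- function(fitted, observed) total_variation(fitted) / total_variation(observed)

set.seed(2)
u <- matrix(runif(40, 0.05, 1), ncol = 4)
u <- u / rowSums(u)              # close each row to sum 1
total_variation(u)
total_variation_pairwise(u)      # same value
```

Since clr coordinates ignore the row totals, the total variation is the same whether or not the rows are closed to sum 1.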

References

Graf, M. (2017). A distribution on the simplex of the Generalized Beta type. In J. A. Martin-Fernandez (Ed.), Proceedings CoDaWork 2017, University of Girona (Spain), 71-90.

Hijazi, R. H. and R. W. Jernigan (2009). Modelling compositional data using Dirichlet regression models. Journal of Applied Probability and Statistics, 4 (1), 77-91.

Kotz, S., N. Balakrishnan, and N. L. Johnson (2000). Continuous Multivariate Distributions, Volume 1, Models and Applications. John Wiley & Sons.

Madsen, K., H. Nielsen, and O. Tingleff (2004). Optimization With Constraints. Informatics and Mathematical Modelling, Technical University of Denmark.

Monti, G. S., G. Mateu-Figueras, and V. Pawlowsky-Glahn (2011). Notes on the scaled Dirichlet distribution. In V. Pawlowsky-Glahn and A. Buccianti (Eds.), Compositional data analysis. Theory and applications. Wiley.

Varadhan, R. (2015). alabama: Constrained Nonlinear Optimization. R package version 2015.3-1.

Wicker, N., J. Muller, R. K. R. Kalathur, and O. Poch (2008). A maximum likelihood approximation method for Dirichlet parameter estimation. Computational Statistics & Data Analysis 52 (3), 1315-1322.

Zeileis, A. and Y. Croissant (2010). Extended model formulas in R: Multiple parts and multiple responses. Journal of Statistical Software 34 (1), 1-13.

See Also

stepSGB, for an experimental stepwise descending regression; initpar.SGB, for the computation of initial parameters. This function uses Formula and auglag.

Examples

## Regression for car segment shares
## ---------------------------------
data(carseg)
## Extract the compositions
uc <- as.matrix(carseg[,(1:5)])

## Extract the explanatory variables
attach(carseg)

## Example 1: without formula
## --------------------------
## Transform some variables
dc <- data.frame(l.exp1 = log(expend) * PAC, l.exp0 = log(expend) * (1 - PAC),
                 l.sent = log(sent), l.FBCF = log(FBCF), l.price = log(price), rates)

## Define the log-ratio transformation matrix
Vc <- matrix(c( 1,0,0,0,
               -1,1,0,0,
               0,-1,1,0,
               0,0,-1,1,
               0,0,0,-1),ncol=4,byrow=TRUE)
colnames(Vc) <- c("AB","BC","CD","DE")
rownames(Vc) <- colnames(uc)
Vc

# The next 2 lines (adding an intercept column) are only necessary when calling regSGB without a formula.
dc1 <- cbind("(Intercept)"= 1 , dc)
dc1 <- as.matrix(dc1)

object10 <- regSGB(dc1,uc, Vc,shape10=4.4)
summary(object10)

## Example 2: same with formula
## ----------------------------
## Define the formula
Form <- Formula(AB | BC | CD | DE ~  l.exp1 + l.exp0 + l.sent + l.FBCF + l.price +  rates)

## Regression with formula
object1 <- regSGB(Form, data= list(dc, uc, Vc),shape10=4.4)

summary(object1)

## Example 3: Usage of I()
## -----------------------
Form2 <- Formula(AB | BC | CD | DE ~  I(l.exp1 + l.exp0) + l.exp1 +l.sent + 
                 l.FBCF + l.price + rates )
object2 <- regSGB(Form2,data= list(dc, uc, Vc),shape10=4.4)
object2

## Example 4: Usage of variable transformations on the original data frame
## -----------------------------------------------------------------------
Form3 <- Formula(AB | BC | CD | DE ~  log(expend) + I(PAC*log(expend)) + log(sent) + log(FBCF) + 
                 log(price) + rates)
object3 <- regSGB(Form3, data=list(carseg, uc, Vc),shape10=4.4)
object3
object2[["par"]]-object3[["par"]]    # same results

## Example 5: Fixing parameter values
## ----------------------------------
## 1. In the following regression we condition on shape1 = 2.36.
object4 <- regSGB(Form3,data=list(carseg, uc, Vc), 
                  shape10 = 4.4,  bound = 2.0, ind = 1, shape1 = 2.36)
summary(object4)

## 2. In the following regression we condition on shape1 = 2.36 and on the coefficient of
## log(FBCF).BC = 0. Notice that it is the 19th parameter.
object5 <- regSGB(Form3,data=list(carseg, uc, Vc),
                  shape10 = 4.4, bound = 2.0, ind = c(1,19) , shape1 = 2.36)
summary(object5)

object3[["AIC"]]
object4[["AIC"]]  # largest AIC
object5[["AIC"]]

[Package SGB version 1.0.1.1 Index]