dsfa {dsfa}R Documentation

dsfa: Distributional Stochastic Frontier Analysis


The dsfa package implements the specification, estimation and prediction of distributional stochastic frontier models via mgcv. The basic distributional stochastic frontier model is given by:

Y_n = \eta^\mu(\boldsymbol{x}_n^\mu) + V_n + s \cdot U_n

where n \in \{1,2,...,N\}. V_n and U_n are the noise and (in)efficiency respectively.

Consequently, Y_n follows a composed-error distribution. For an overview see dcomper().

Let \theta_n be a parameter of the distribution of Y_n, e.g. \theta_n \in \{\mu_n, \sigma_{Un}, \sigma_{Vn}\}. Further, let g^{-1}_{\theta}(\cdot) be the monotonic response function, which links the additive predictor \eta(\boldsymbol{x}_n^\theta) to the parameter space for the parameter \theta_n via the additive model:

g^{-1}_{\theta}(\theta_n)=\eta(\boldsymbol{x}_n^\theta)=\beta^\theta_0 + \sum_{j^\theta=1}^{J^\theta} h^\theta_{j^\theta}(x^\theta_{nj^\theta})

Thus, the additive predictor \eta(\boldsymbol{x}_n^\theta) is made up by the intercept \beta^\theta_0 and J^\theta smooths terms. The mgcv packages provides a framework for fitting distributional regression models. For more information see comper(). The additive predictors can be defined via formulae in gam(). Within the formulae for the parameter \theta_n, the smooth function for the variable x^\theta_{nj^\theta} can be specified via the function s(), which is h^\theta_{j^\theta}(\cdot) in the notation above. The smooth functions may be:

An overview is provided at smooth.terms(). The functions gam(), predict.gam() and plot.gam(), are alike to the basic S functions. A number of other functions such as summary.gam(), residuals.gam and anova.gam are also provided, for extracting information from a fitted gamOject.

The main functions are:

Further useful functions are:

These are written in C++ for fast and accurate evaluation including derivatives. They may be helpful for other researchers, who want to avoid the tedious implementation. Additionally:




### First example with simulated data
#Set seed, sample size and type of function
N=500 #Sample size
s=-1 #Set to production function

#Generate covariates
x1<-runif(N,-1,1); x2<-runif(N,-1,1); x3<-runif(N,-1,1)
x4<-runif(N,-1,1); x5<-runif(N,-1,1)

#Set parameters of the distribution
mu=2+0.75*x1+0.4*x2+0.6*x2^2+6*log(x3+2)^(1/4) #production function parameter
sigma_v=exp(-1.5+0.75*x4) #noise parameter
sigma_u=exp(-1+sin(2*pi*x5)) #inefficiency parameter

#Simulate responses and create dataset
y<-rcomper(n=N, mu=mu, sigma_v=sigma_v, sigma_u=sigma_u, s=s, distr="normhnorm")
dat<-data.frame(y, x1, x2, x3, x4, x5)

#Write formulae for parameters
mu_formula<-y~x1+x2+I(x2^2)+s(x3, bs="ps")
sigma_u_formula<-~1+s(x5, bs="ps")

#Fit model
model<-mgcv::gam(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
                 data=dat, family=comper(s=s, distr="normhnorm"), optimizer = c("efs"))

#Model summary

#Smooth effects
#Effect of x3 on the predictor of the production function
plot(model, select=1) #Estimated function
lines(x3[order(x3)], 6*log(x3[order(x3)]+2)^(1/4)-
        mean(6*log(x3[order(x3)]+2)^(1/4)), col=2) #True effect

#Effect of x5 on the predictor of the inefficiency
plot(model, select=2) #Estimated function
lines(x5[order(x5)], -1+sin(2*pi*x5)[order(x5)]-
        mean(-1+sin(2*pi*x5)),col=2) #True effect

### Second example with real data of production function

data("RiceFarms", package = "plm") #load data
RiceFarms[,c("goutput","size","seed", "totlabor", "urea")]<-
  log(RiceFarms[,c("goutput","size","seed", "totlabor", "urea")]) #log outputs and inputs
RiceFarms$id<-factor(RiceFarms$id) #id as factor

#Set to production function

#Write formulae for parameters
mu_formula<-goutput ~  s(size, bs="ps") + s(seed, bs="ps") + #non-linear effects
  s(totlabor, bs="ps") + s(urea, bs="ps") + #non-linear effects
  varieties + #factor
  s(id, bs="re") #random effect

#Fit model with normhnorm dstribution
model_normhnorm<-mgcv::gam(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
data=RiceFarms, family=comper(s=s, distr="normhnorm"), optimizer = "efs")

#Summary of model

#Plot smooths

### Third example with real data of cost function

data("electricity", package = "sfaR") #load data

#Log inputs and outputs as in Greene 1990 eq. 46

#Set to cost function

#Write formulae for parameters
mu_formula<-lcof ~ s(lo, bs="ps") + s(llf, bs="ps") + s(lcf, bs="ps") #non-linear effects
sigma_u_formula<-~s(lo, bs="ps") + s(lshare, bs="ps") + s(cshare, bs="ps")

#Fit model with normhnorm dstribution
model_normhnorm<-mgcv::gam(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
                           data=electricity, family=comper(s=s, distr="normhnorm"),
                           optimizer = "efs")

#Summary of model

#Plot smooths

[Package dsfa version 2.0.1 Index]