dsfa {dsfa}R Documentation

dsfa-package: Distributional Stochastic Frontier Analysis


The dsfa package implements the specification, estimation and prediction of distributional stochastic frontier models via mgcv. The basic distributional stochastic frontier model is given by:

Y_n = \eta^\mu(\boldsymbol{x}_n^\mu) + V_n + s \cdot U_n

where n \in \{1,2,...,N\}. V_n and U_n are the noise and (in)efficiency respectively.

Consequently, Y_n follows a composed-error distribution. For an overview see dcomper.

Let \theta_n be a parameter of the distribution of Y_n, e.g. \theta_n \in \{\mu_n, \sigma_{Un}, \sigma_{Vn}\}. Further, let g^{-1}_{\theta}(\cdot) be the monotonic response function, which links the additive predictor \eta(\boldsymbol{x}_n^\theta) to the parameter space for the parameter \theta_n via the additive model:

g^{-1}_{\theta}(\theta_n)=\eta(\boldsymbol{x}_n^\theta)=\beta^\theta_0 + \sum_{j^\theta=1}^{J^\theta} h^\theta_{j^\theta}(x^\theta_{nj^\theta})

Thus, the additive predictor \eta(\boldsymbol{x}_n^\theta) is made up by the intercept \beta^\theta_0 and J^\theta smooths terms. The mgcv packages provides a framework for fitting distributional regression models. For more information see comper. The additive predictors can be defined via formulae in gam. Within the formulae for the parameter \theta_n, the smooth function for the variable x^\theta_{nj^\theta} can be specified via the function s, which is h^\theta_{j^\theta}(\cdot) in the notation above. The smooth functions may be:

An overview is provided at smooth.terms. The functions gam, predict.gam and plot.gam, are alike to the basic S functions. A number of other functions such as summary.gam, residuals.gam and anova.gam are also provided, for extracting information from a fitted gamOject.

The main functions are:

Further useful functions are:

These are written in C++ for fast and accurate evaluation including derivatives. They may be helpful for other researchers, who want to avoid the tedious implementation. Additionally:


  family = comper(link = list("identity", "logshift", "logshift"), s = -1, distr =
  data = list(),
  optimizer = "efs",



A list of formulas specifying the additive predictors. See formula.gam and gam.models for more details.


The family object specifies the (multivariate) composed-error distribution and link of the model. See comper and comper_mv for more details.


A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which dsfa is called.


An array specifying the numerical optimization method to use to optimize the smoothing parameter estimation criterion (given by method). "outer" for the more stable direct approach. "outer" can use several alternative optimizers, specified in the second element of optimizer: "newton" (default), "bfgs", "optim", "nlm" and "nlm.fd" (the latter is based entirely on finite differenced derivatives and is very slow). "efs" for the extended Fellner Schall method of Wood and Fasiolo (2017).


other parameters of gam


This function is a wrapper for gam.


Returns a gam object.




### First example with simulated data
#Set seed, sample size and type of function
N=500 #Sample size
s=-1 #Set to production function

#Generate covariates
x1<-runif(N,-1,1); x2<-runif(N,-1,1); x3<-runif(N,-1,1)
x4<-runif(N,-1,1); x5<-runif(N,-1,1)

#Set parameters of the distribution
mu=2+0.75*x1+0.4*x2+0.6*x2^2+6*log(x3+2)^(1/4) #production function parameter
sigma_v=exp(-1.5+0.75*x4) #noise parameter
sigma_u=exp(-1+sin(2*pi*x5)) #inefficiency parameter

#Simulate responses and create dataset
y<-rcomper(n=N, mu=mu, sigma_v=sigma_v, sigma_u=sigma_u, s=s, distr="normhnorm")
dat<-data.frame(y, x1, x2, x3, x4, x5)

#Write formulae for parameters
mu_formula<-y~x1+x2+I(x2^2)+s(x3, bs="ps")
sigma_u_formula<-~1+s(x5, bs="ps")

#Fit model
model<-dsfa(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
                 data=dat, family=comper(s=s, distr="normhnorm"), optimizer = c("efs"))

#Model summary

#Smooth effects
#Effect of x3 on the predictor of the production function
plot(model, select=1) #Estimated function
lines(x3[order(x3)], 6*log(x3[order(x3)]+2)^(1/4)-
        mean(6*log(x3[order(x3)]+2)^(1/4)), col=2) #True effect

#Effect of x5 on the predictor of the inefficiency
plot(model, select=2) #Estimated function
lines(x5[order(x5)], -1+sin(2*pi*x5)[order(x5)]-
        mean(-1+sin(2*pi*x5)),col=2) #True effect

### Second example with real data of production function

data("RiceFarms", package = "plm") #load data
RiceFarms[,c("goutput","size","seed", "totlabor", "urea")]<-
  log(RiceFarms[,c("goutput","size","seed", "totlabor", "urea")]) #log outputs and inputs
RiceFarms$id<-factor(RiceFarms$id) #id as factor

#Set to production function

#Write formulae for parameters
mu_formula<-goutput ~  s(size, bs="ps") + s(seed, bs="ps") + #non-linear effects
  s(totlabor, bs="ps") + s(urea, bs="ps") + #non-linear effects
  varieties + #factor
  s(id, bs="re") #random effect

#Fit model with normhnorm dstribution
model<-dsfa(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
data=RiceFarms, family=comper(s=s, distr="normhnorm"), optimizer = "efs")

#Summary of model

#Plot smooths

### Third example with real data of cost function

data("electricity", package = "sfaR") #load data

#Log inputs and outputs as in Greene 1990 eq. 46

#Set to cost function

#Write formulae for parameters
mu_formula<-lcof ~ s(lo, bs="ps") + s(llf, bs="ps") + s(lcf, bs="ps") #non-linear effects
sigma_u_formula<-~s(lo, bs="ps") + s(lshare, bs="ps") + s(cshare, bs="ps")

#Fit model with normhnorm dstribution
model<-dsfa(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
                           data=electricity, family=comper(s=s, distr="normhnorm"),
                           optimizer = "efs")

#Summary of model

#Plot smooths

[Package dsfa version 2.0.2 Index]