dsfa {dsfa} | R Documentation
The dsfa package implements the specification, estimation and prediction of distributional stochastic frontier models via mgcv.
The basic distributional stochastic frontier model is given by:
Y_n = \eta^\mu(\boldsymbol{x}_n^\mu) + V_n + s \cdot U_n
where n \in \{1,2,...,N\}. V_n and U_n are the noise and (in)efficiency, respectively.
For s=-1, \eta^\mu(\cdot) is the production function and \boldsymbol{x}_n^\mu are the log inputs. Alternatively, if s=1, \eta^\mu(\cdot) is the cost function and \boldsymbol{x}_n^\mu are the log cost. The vector \boldsymbol{x}_n^\mu may also contain other variables.
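The composed-error structure above can be sketched in base R. This is only an illustration under stated assumptions: the helper name `simulate_frontier`, the linear frontier `mu` and the constant scales `sigma_v` and `sigma_u` are hypothetical and not part of dsfa, and a half-normal U_n is assumed.

```r
# Hypothetical helper (not part of dsfa): simulate Y = mu + V + s * U
simulate_frontier <- function(mu, sigma_v, sigma_u, s = -1) {
  n <- length(mu)
  v <- rnorm(n, mean = 0, sd = sigma_v)       # symmetric noise V_n
  u <- abs(rnorm(n, mean = 0, sd = sigma_u))  # half-normal (in)efficiency U_n >= 0
  mu + v + s * u
}

set.seed(1)
x  <- runif(2000, -1, 1)
mu <- 1 + 0.5 * x  # assumed linear frontier, for illustration only

y_prod <- simulate_frontier(mu, sigma_v = 0.2, sigma_u = 0.5, s = -1)
y_cost <- simulate_frontier(mu, sigma_v = 0.2, sigma_u = 0.5, s = 1)

# Average deviation from the frontier: negative for production, positive for cost
mean(y_prod - mu)
mean(y_cost - mu)
```

With s=-1 the inefficiency pulls observations below the frontier, with s=1 it pushes them above it, which is why the same family can serve both production and cost models.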
The noise is represented as V_n \sim N(0,\sigma_{Vn}^2), where \sigma_{Vn}=\exp(\eta^{\sigma_{V}}(\boldsymbol{x}_n^{\sigma_{V}})). Here, \boldsymbol{x}_n^{\sigma_{V}} are the observed covariates which influence the parameter of the noise.
The inefficiency can be represented in two ways.
If the inefficiency is half-normal, U_n \sim HN(\sigma_{Un}^2), where \sigma_{Un}=\exp(\eta^{\sigma_{U}}(\boldsymbol{x}_n^{\sigma_{U}})). Here, \boldsymbol{x}_n^{\sigma_{U}} are the observed covariates which influence the parameter of the (in)efficiency. Consequently:
Y_n \sim normhnorm(\mu_n=\eta^\mu(\boldsymbol{x}_n^\mu), \sigma_{Vn}=\exp(\eta^{\sigma_{V}}(\boldsymbol{x}_n^{\sigma_{V}})), \sigma_{Un}=\exp(\eta^{\sigma_{U}}(\boldsymbol{x}_n^{\sigma_{U}})), s=s)
For more details see dnormhnorm().
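For intuition, the normal-half-normal composed-error density can be written down in base R. The sketch below uses the standard skew-normal form of this density; `dens_normhnorm` is a hypothetical helper, not the package's dnormhnorm(), which remains the reference implementation.

```r
# Hypothetical helper (not dnormhnorm()): density of Y = mu + V + s * U
# with V ~ N(0, sigma_v^2) and U ~ HN(sigma_u^2)
dens_normhnorm <- function(y, mu, sigma_v, sigma_u, s = -1) {
  sigma <- sqrt(sigma_v^2 + sigma_u^2)  # scale of the composed error
  ratio <- sigma_u / sigma_v            # inefficiency-to-noise ratio
  eps   <- y - mu
  2 / sigma * dnorm(eps / sigma) * pnorm(s * ratio * eps / sigma)
}

# Numerical checks: total mass and mean of the density for s = -1
total_hn <- integrate(dens_normhnorm, -Inf, Inf,
                      mu = 1, sigma_v = 0.3, sigma_u = 0.6, s = -1)$value
mean_hn  <- integrate(function(y) y * dens_normhnorm(y, 1, 0.3, 0.6, s = -1),
                      -Inf, Inf)$value
total_hn
mean_hn
```

The mean of this density is mu + s * sigma_u * sqrt(2/pi), so for a production function (s=-1) the expected output lies below the frontier.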
If the inefficiency is exponential, U_n \sim Exp(\lambda_{n}), where \lambda_{n}=\exp(\eta^{\lambda}(\boldsymbol{x}_n^{\lambda})). Here, \boldsymbol{x}_n^{\lambda} are the observed covariates which influence the parameter of the (in)efficiency. Consequently:
Y_n \sim normexp(\mu_n=\eta^\mu(\boldsymbol{x}_n^\mu), \sigma_{Vn}=\exp(\eta^{\sigma_{V}}(\boldsymbol{x}_n^{\sigma_{V}})), \lambda_{n}=\exp(\eta^{\lambda}(\boldsymbol{x}_n^{\lambda})), s=s)
For more details see dnormexp().
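The normal-exponential case can be sketched in the same way. The code below assumes that \lambda_n is the rate of the exponential distribution, so that E[U_n] = 1/\lambda_n; this parameterisation is an assumption here and should be checked against dnormexp(). `dens_normexp` is a hypothetical helper, not a dsfa function.

```r
# Hypothetical helper (not dnormexp()): density of Y = mu + V + s * U
# with V ~ N(0, sigma_v^2) and U ~ Exp(lambda), lambda assumed to be a rate
dens_normexp <- function(y, mu, sigma_v, lambda, s = -1) {
  eps <- y - mu
  # Work on the log scale to avoid overflow in the tails
  log_dens <- log(lambda) - s * lambda * eps + lambda^2 * sigma_v^2 / 2 +
    pnorm(s * eps / sigma_v - lambda * sigma_v, log.p = TRUE)
  exp(log_dens)
}

# Numerical checks: total mass and mean of the density for s = -1
total_exp <- integrate(dens_normexp, -Inf, Inf,
                       mu = 1, sigma_v = 0.3, lambda = 2, s = -1)$value
mean_exp  <- integrate(function(y) y * dens_normexp(y, 1, 0.3, 2, s = -1),
                       -Inf, Inf)$value
total_exp
mean_exp
```

Under the rate assumption the mean is mu + s / lambda, which gives a quick sanity check on fitted values.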
Let \theta_n be a parameter of the distribution of Y_n.
Further, let g^{-1}_{\theta}(\cdot) be the monotonic response function, which maps the additive predictor \eta(\boldsymbol{x}_n^\theta) to the parameter space of \theta_n, i.e. \theta_n=g^{-1}_{\theta}(\eta(\boldsymbol{x}_n^\theta)). Equivalently, the link function g_{\theta}(\cdot) relates the parameter to the additive model:
g_{\theta}(\theta_n)=\eta(\boldsymbol{x}_n^\theta)=\beta^\theta_0 + \sum_{j^\theta=1}^{J^\theta} h^\theta_{j^\theta}(x^\theta_{nj^\theta})
Thus, the additive predictor \eta(\boldsymbol{x}_n^\theta) is made up of the intercept \beta^\theta_0 and J^\theta smooth terms.
The mgcv package provides a framework for fitting distributional regression models.
The additive predictors can be defined via formulae in gam(). Within the formula for the parameter \theta_n, the smooth function of the variable x^\theta_{nj^\theta} can be specified via the function s(), which corresponds to h^\theta_{j^\theta}(\cdot) in the notation above.
The smooth functions may be:
linear effects,
non-linear effects, which can be modeled via penalized regression splines, e.g. p.spline() and tprs(),
random effects, see random.effects(),
spatial effects, which can be modeled via mrf().
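The effect types listed above map onto the `bs` argument of s(). The following sketch fits an ordinary Gaussian additive model with mgcv on simulated data to show the syntax; the data, variable names and effect shapes are made up for illustration, and the spatial mrf() case is omitted because it needs a neighbourhood structure.

```r
library(mgcv)

set.seed(2)
n   <- 400
dat <- data.frame(
  x1 = runif(n), x2 = runif(n), x3 = runif(n),
  g  = factor(sample(letters[1:8], n, replace = TRUE))  # grouping factor
)
b_g   <- rnorm(8, sd = 0.5)  # simulated group-level random intercepts
dat$y <- 1 + 2 * dat$x1 + sin(2 * pi * dat$x2) + exp(dat$x3) +
  b_g[as.integer(dat$g)] + rnorm(n, sd = 0.3)

fit <- gam(
  y ~ x1 +               # linear effect
    s(x2, bs = "ps") +   # non-linear effect via P-splines
    s(x3, bs = "tp") +   # non-linear effect via thin plate regression splines
    s(g,  bs = "re"),    # random intercept per group
  data = dat, method = "REML"
)
summary(fit)
```

The same smooth specifications can be used inside each formula of a distributional model, one formula per parameter.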
An overview is provided at smooth.terms(). The functions gam(), predict.gam() and plot.gam() behave much like the basic S functions.
A number of other functions, such as summary.gam(), residuals.gam() and anova.gam(), are also provided for extracting information from a fitted gamObject.
The main functions are:
normhnorm()
Object which can be used to fit a normal-halfnormal stochastic frontier model with the mgcv package.
normexp()
Object which can be used to fit a normal-exponential stochastic frontier model with the mgcv package.
comperr_mv()
Object which can be used to fit a multivariate stochastic frontier model with the mgcv package.
elasticity()
Calculates and plots the elasticity of a smooth function.
efficiency()
Calculates the expected technical (in)efficiency index E[u|\epsilon] or E[\exp(-u)|\epsilon].
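The two efficiency indices can be illustrated for the normal-half-normal case with the standard closed-form conditional expectations. `cond_efficiency` below is a hypothetical base R helper, not the package's efficiency() function; it takes a residual \epsilon = V + s \cdot U and known scales rather than a fitted model, and its `type` labels simply mirror the "jondrow" and "battese" options used in the example further down.

```r
# Hypothetical helper (not efficiency()): conditional efficiency measures
# for eps = V + s * U with V ~ N(0, sigma_v^2) and U ~ HN(sigma_u^2)
cond_efficiency <- function(eps, sigma_v, sigma_u, s = -1,
                            type = c("jondrow", "battese")) {
  type <- match.arg(type)
  sigma2     <- sigma_v^2 + sigma_u^2
  mu_star    <- s * eps * sigma_u^2 / sigma2      # location of U given eps
  sigma_star <- sigma_u * sigma_v / sqrt(sigma2)  # scale of U given eps
  z <- mu_star / sigma_star
  if (type == "jondrow") {
    # E[u | eps]: mean of a normal truncated at zero
    mu_star + sigma_star * dnorm(z) / pnorm(z)
  } else {
    # E[exp(-u) | eps]
    exp(-mu_star + sigma_star^2 / 2) * pnorm(z - sigma_star) / pnorm(z)
  }
}

eps <- c(-0.5, 0, 0.5)
cond_efficiency(eps, sigma_v = 0.3, sigma_u = 0.6, type = "jondrow")
cond_efficiency(eps, sigma_v = 0.3, sigma_u = 0.6, type = "battese")
```

For a production function, more negative residuals imply a larger expected inefficiency E[u|\epsilon] and a smaller expected efficiency E[\exp(-u)|\epsilon].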
Rouven Schmidt rouven.schmidt@tu-clausthal.de
Schmidt R, Kneib T (2022). “Multivariate Distributional Stochastic Frontier Models.” arXiv preprint arXiv:2208.10294.
Wood SN, Fasiolo M (2017). “A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models.” Biometrics, 73(4), 1071–1081.
Kumbhakar SC, Wang H, Horncastle AP (2015). A practitioner's guide to stochastic frontier analysis using Stata. Cambridge University Press.
Schmidt R, Kneib T (2020). “Analytic expressions for the Cumulative Distribution Function of the Composed Error Term in Stochastic Frontier Analysis with Truncated Normal and Exponential Inefficiencies.” arXiv preprint arXiv:2006.03459.
#Load the package (required when running the example outside of the help page)
library(dsfa)
#Set seed, sample size and type of function
set.seed(1337)
N<-500 #Sample size
s<--1 #Set to production function
#Generate covariates
x1<-runif(N,-1,1); x2<-runif(N,-1,1); x3<-runif(N,-1,1)
x4<-runif(N,-1,1); x5<-runif(N,-1,1)
#Set parameters of the distribution
mu<-2+0.75*x1+0.4*x2+0.6*x2^2+6*log(x3+2)^(1/4) #production function parameter
sigma_v<-exp(-1.5+0.75*x4) #noise parameter
sigma_u<-exp(-1+sin(2*pi*x5)) #inefficiency parameter
#Simulate responses and create dataset
y<-rnormhnorm(n=N, mu=mu, sigma_v=sigma_v, sigma_u=sigma_u, s=s)
dat<-data.frame(y, x1, x2, x3, x4, x5)
#Write formulae for parameters
mu_formula<-y~x1+x2+I(x2^2)+s(x3, bs="ps")
sigma_v_formula<-~1+x4
sigma_u_formula<-~1+s(x5, bs="ps")
#Fit model
model<-mgcv::gam(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
                 data=dat, family=normhnorm(s=s), optimizer="efs")
#Model summary
summary(model)
#Smooth effects
#Effect of x3 on the predictor of the production function
plot(model, select=1) #Estimated function
#True effect, centered because smooths are subject to sum-to-zero constraints
lines(x3[order(x3)], 6*log(x3[order(x3)]+2)^(1/4)-
      mean(6*log(x3[order(x3)]+2)^(1/4)), col=2)
#Effect of x5 on the predictor of the inefficiency
plot(model, select=2) #Estimated function
#True effect, centered in the same way
lines(x5[order(x5)], -1+sin(2*pi*x5)[order(x5)]-
      mean(-1+sin(2*pi*x5)), col=2)
#Estimate efficiency
efficiency(model, type="jondrow")
efficiency(model, type="battese")
#Get elasticities
elasticity(model)