dsfa {dsfa} | R Documentation |
The dsfa
package implements the specification, estimation and prediction of distributional stochastic frontier models via mgcv
.
The basic distributional stochastic frontier model is given by:
Y_n = \eta^\mu(\boldsymbol{x}_n^\mu) + V_n + s \cdot U_n
where n \in \{1,2,...,N\}
.
V_n
and U_n
are the noise and (in)efficiency respectively.
For s=-1
, \eta^\mu(\cdot)
is the production function and \boldsymbol{x}_n^\mu
are the log inputs.
Alternatively, if s=1
, \eta^\mu(\cdot)
is the cost function and \boldsymbol{x}_n^\mu
are the log cost.
The vector \boldsymbol{x}_n^\mu
may also contain other variables.
The noise is represented as V_n \sim N(0,\sigma_{Vn}^2)
, where \sigma_{Vn}=\exp(\eta^{\sigma_{V}}(\boldsymbol{x}_n^{\sigma_{V}}))
.
Here, \boldsymbol{x}_n^{\sigma_{V}}
are the observed covariates which influence the parameter of the noise.
The (in)efficiency can be represented in two ways.
If U_n \sim HN(\sigma_{Un}^2)
, where \sigma_{Un}=\exp(\eta^{\sigma_{Un}}(\boldsymbol{x}_n^{\sigma_{U}}))
.
Here, \boldsymbol{x}_n^{\sigma_{U}}
are the observed covariates which influence the parameter of the (in)efficiency.
Consequently:
Y_n \sim normhnorm(\mu_n=\eta^\mu(\boldsymbol{x}_n^\mu), \sigma_{Vn}=\exp(\eta^{\sigma_{V}}(\boldsymbol{x}_n^{\sigma_{V}})), \sigma_{Un}=\exp(\eta^{\sigma_{U}}(\boldsymbol{x}_n^{\sigma_{U}})), s=s)
.
For more details see dnormhnorm()
.
If U_n \sim Exp(\sigma_{Un})
, where \sigma_{Un}=\exp(\eta^{\sigma_{Un}}(\boldsymbol{x}_n^{\sigma_{U}}))
.
Here, \boldsymbol{x}_n^{\sigma_{U}}
are the observed covariates which influence the parameter of the (in)efficiency.
Consequently:
Y_n \sim normexp(\mu_n=\eta^\mu(\boldsymbol{x}_n^\mu), \sigma_{Vn}=\exp(\eta^{\sigma_{V}}(\boldsymbol{x}_n^{\sigma_{V}})), \sigma_{Un}=\exp(\eta^{\sigma_{U}}(\boldsymbol{x}_n^{\sigma_{U}})), s=s)
.
For more details see dnormexp()
.
Consequently, Y_n
follows a composed-error distribution. For an overview see dcomper()
.
Let \theta_n
be a parameter of the distribution of Y_n
, e.g. \theta_n \in \{\mu_n, \sigma_{Un}, \sigma_{Vn}\}
.
Further, let g^{-1}_{\theta}(\cdot)
be the monotonic response function, which links the additive predictor \eta(\boldsymbol{x}_n^\theta)
to the parameter space for the parameter \theta_n
via the additive model:
g^{-1}_{\theta}(\theta_n)=\eta(\boldsymbol{x}_n^\theta)=\beta^\theta_0 + \sum_{j^\theta=1}^{J^\theta} h^\theta_{j^\theta}(x^\theta_{nj^\theta})
Thus, the additive predictor \eta(\boldsymbol{x}_n^\theta)
is made up by the intercept \beta^\theta_0
and J^\theta
smooths terms.
The mgcv
packages provides a framework for fitting distributional regression models. For more information see comper()
.
The additive predictors can be defined via formulae in gam()
. Within the formulae for the parameter \theta_n
, the smooth function for the variable x^\theta_{nj^\theta}
can be specified via the function s()
, which is h^\theta_{j^\theta}(\cdot)
in the notation above.
The smooth functions may be:
linear effects, may include polynomials or regression splines.
non-linear effects, which can be modeled via penalized regression splines, e.g. p.spline()
, tprs()
.
random effects, random.effects()
.
spatial effects, which can be modeled via mrf()
.
An overview is provided at smooth.terms()
.
The functions gam()
, predict.gam()
and plot.gam()
,
are alike to the basic S
functions.
A number of other functions such as summary.gam()
, residuals.gam
and
anova.gam
are also provided, for extracting information from a fitted gamOject
.
The main functions are:
comper()
Object which can be used to fit a composed-error stochastic frontier model with the mgcv
package.
comper_mv()
Object which can be used to fit a multivariate composed-error stochastic frontier model with the mgcv
package.
elasticity()
Calculates and plots the elasticity of a smooth function.
efficiency()
Calculates the expected technical (in)efficiency index E[u|\epsilon]
or E[\exp(-u)|\epsilon]
.
Further useful functions are:
dcomper()
Probablitiy density function, distribution, quantile function and random number generation for the composed-error distribution.
dcomper_mv()
Probablitiy density function, distribution, quantile function and random number generation for the multivariate composed-error distribution.
dcop()
Probablitiy density function, distribution and random number generation for copulas.
These are written in C++
for fast and accurate evaluation including derivatives. They may be helpful for other researchers, who want to avoid the tedious implementation.
Additionally:
cop()
Object which can be used to fit a copula with the mgcv
package.
Rouven Schmidt rouven.schmidt@tu-clausthal.de
Schmidt R, Kneib T (2022). “Multivariate Distributional Stochastic Frontier Models.” arXiv preprint arXiv:2208.10294.
Wood SN, Fasiolo M (2017). “A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models.” Biometrics, 73(4), 1071–1081.
Kumbhakar SC, Wang H, Horncastle AP (2015). A practitioner's guide to stochastic frontier analysis using Stata. Cambridge University Press.
Schmidt R, Kneib T (2020). “Analytic expressions for the Cumulative Distribution Function of the Composed Error Term in Stochastic Frontier Analysis with Truncated Normal and Exponential Inefficiencies.” arXiv preprint arXiv:2006.03459.
### First example with simulated data
#Set seed, sample size and type of function
set.seed(1337)
N=500 #Sample size
s=-1 #Set to production function
#Generate covariates
x1<-runif(N,-1,1); x2<-runif(N,-1,1); x3<-runif(N,-1,1)
x4<-runif(N,-1,1); x5<-runif(N,-1,1)
#Set parameters of the distribution
mu=2+0.75*x1+0.4*x2+0.6*x2^2+6*log(x3+2)^(1/4) #production function parameter
sigma_v=exp(-1.5+0.75*x4) #noise parameter
sigma_u=exp(-1+sin(2*pi*x5)) #inefficiency parameter
#Simulate responses and create dataset
y<-rcomper(n=N, mu=mu, sigma_v=sigma_v, sigma_u=sigma_u, s=s, distr="normhnorm")
dat<-data.frame(y, x1, x2, x3, x4, x5)
#Write formulae for parameters
mu_formula<-y~x1+x2+I(x2^2)+s(x3, bs="ps")
sigma_v_formula<-~1+x4
sigma_u_formula<-~1+s(x5, bs="ps")
#Fit model
model<-mgcv::gam(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
data=dat, family=comper(s=s, distr="normhnorm"), optimizer = c("efs"))
#Model summary
summary(model)
#Smooth effects
#Effect of x3 on the predictor of the production function
plot(model, select=1) #Estimated function
lines(x3[order(x3)], 6*log(x3[order(x3)]+2)^(1/4)-
mean(6*log(x3[order(x3)]+2)^(1/4)), col=2) #True effect
#Effect of x5 on the predictor of the inefficiency
plot(model, select=2) #Estimated function
lines(x5[order(x5)], -1+sin(2*pi*x5)[order(x5)]-
mean(-1+sin(2*pi*x5)),col=2) #True effect
### Second example with real data of production function
data("RiceFarms", package = "plm") #load data
RiceFarms[,c("goutput","size","seed", "totlabor", "urea")]<-
log(RiceFarms[,c("goutput","size","seed", "totlabor", "urea")]) #log outputs and inputs
RiceFarms$id<-factor(RiceFarms$id) #id as factor
#Set to production function
s=-1
#Write formulae for parameters
mu_formula<-goutput ~ s(size, bs="ps") + s(seed, bs="ps") + #non-linear effects
s(totlabor, bs="ps") + s(urea, bs="ps") + #non-linear effects
varieties + #factor
s(id, bs="re") #random effect
sigma_v_formula<-~1
sigma_u_formula<-~bimas
#Fit model with normhnorm dstribution
model_normhnorm<-mgcv::gam(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
data=RiceFarms, family=comper(s=s, distr="normhnorm"), optimizer = "efs")
#Summary of model
summary(model_normhnorm)
#Plot smooths
plot(model_normhnorm)
### Third example with real data of cost function
data("electricity", package = "sfaR") #load data
#Log inputs and outputs as in Greene 1990 eq. 46
electricity$lcof<-log(electricity$cost/electricity$fprice)
electricity$lo<-log(electricity$output)
electricity$llf<-log(electricity$lprice/electricity$fprice)
electricity$lcf<-log(electricity$cprice/electricity$fprice)
#Set to cost function
s=1
#Write formulae for parameters
mu_formula<-lcof ~ s(lo, bs="ps") + s(llf, bs="ps") + s(lcf, bs="ps") #non-linear effects
sigma_v_formula<-~1
sigma_u_formula<-~s(lo, bs="ps") + s(lshare, bs="ps") + s(cshare, bs="ps")
#Fit model with normhnorm dstribution
model_normhnorm<-mgcv::gam(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
data=electricity, family=comper(s=s, distr="normhnorm"),
optimizer = "efs")
#Summary of model
summary(model_normhnorm)
#Plot smooths
plot(model_normhnorm)