dsfa {dsfa} | R Documentation |
The dsfa package implements the specification, estimation and prediction of distributional stochastic frontier models via mgcv. The basic distributional stochastic frontier model is given by:
Y_n = \eta^\mu(\boldsymbol{x}_n^\mu) + V_n + s \cdot U_n
where n \in \{1,2,...,N\}
.
V_n
and U_n
are the noise and (in)efficiency respectively.
For s=-1
, \eta^\mu(\cdot)
is the production function and \boldsymbol{x}_n^\mu
are the log inputs.
Alternatively, if s=1
, \eta^\mu(\cdot)
is the cost function and \boldsymbol{x}_n^\mu
are the log cost.
The vector \boldsymbol{x}_n^\mu
may also contain other variables.
The noise is represented as V_n \sim N(0,\sigma_{Vn}^2)
, where \sigma_{Vn}=\exp(\eta^{\sigma_{V}}(\boldsymbol{x}_n^{\sigma_{V}}))
.
Here, \boldsymbol{x}_n^{\sigma_{V}}
are the observed covariates which influence the parameter of the noise.
The (in)efficiency can be represented in two ways.
If U_n \sim HN(\sigma_{Un}^2)
, where \sigma_{Un}=\exp(\eta^{\sigma_{Un}}(\boldsymbol{x}_n^{\sigma_{U}}))
.
Here, \boldsymbol{x}_n^{\sigma_{U}}
are the observed covariates which influence the parameter of the (in)efficiency.
Consequently:
Y_n \sim normhnorm(\mu_n=\eta^\mu(\boldsymbol{x}_n^\mu), \sigma_{Vn}=\exp(\eta^{\sigma_{V}}(\boldsymbol{x}_n^{\sigma_{V}})), \sigma_{Un}=\exp(\eta^{\sigma_{U}}(\boldsymbol{x}_n^{\sigma_{U}})), s=s)
.
For more details see dnormhnorm
.
If U_n \sim Exp(\sigma_{Un})
, where \sigma_{Un}=\exp(\eta^{\sigma_{Un}}(\boldsymbol{x}_n^{\sigma_{U}}))
.
Here, \boldsymbol{x}_n^{\sigma_{U}}
are the observed covariates which influence the parameter of the (in)efficiency.
Consequently:
Y_n \sim normexp(\mu_n=\eta^\mu(\boldsymbol{x}_n^\mu), \sigma_{Vn}=\exp(\eta^{\sigma_{V}}(\boldsymbol{x}_n^{\sigma_{V}})), \sigma_{Un}=\exp(\eta^{\sigma_{U}}(\boldsymbol{x}_n^{\sigma_{U}})), s=s)
.
For more details see dnormexp
.
Consequently, Y_n
follows a composed-error distribution. For an overview see dcomper
.
Let \theta_n
be a parameter of the distribution of Y_n
, e.g. \theta_n \in \{\mu_n, \sigma_{Un}, \sigma_{Vn}\}
.
Further, let g^{-1}_{\theta}(\cdot)
be the monotonic response function, which links the additive predictor \eta(\boldsymbol{x}_n^\theta)
to the parameter space for the parameter \theta_n
via the additive model:
g^{-1}_{\theta}(\theta_n)=\eta(\boldsymbol{x}_n^\theta)=\beta^\theta_0 + \sum_{j^\theta=1}^{J^\theta} h^\theta_{j^\theta}(x^\theta_{nj^\theta})
Thus, the additive predictor \eta(\boldsymbol{x}_n^\theta)
is made up by the intercept \beta^\theta_0
and J^\theta
smooths terms.
The mgcv packages provides a framework for fitting distributional regression models. For more information see comper
.
The additive predictors can be defined via formulae in gam
. Within the formulae for the parameter \theta_n
, the smooth function for the variable x^\theta_{nj^\theta}
can be specified via the function s
, which is h^\theta_{j^\theta}(\cdot)
in the notation above.
The smooth functions may be:
linear effects, may include polynomials or regression splines.
non-linear effects, which can be modeled via penalized regression splines, e.g. p.spline
, tprs
.
random effects, random.effects
.
spatial effects, which can be modeled via mrf
.
An overview is provided at smooth.terms
.
The functions gam
, predict.gam
and plot.gam
,
are alike to the basic S
functions.
A number of other functions such as summary.gam
, residuals.gam
and
anova.gam
are also provided, for extracting information from a fitted gamOject
.
The main functions are:
comper
Object which can be used to fit a composed-error stochastic frontier model with the mgcv
package.
comper_mv
Object which can be used to fit a multivariate composed-error stochastic frontier model with the mgcv
package.
elasticity
Calculates and plots the elasticity of a smooth function.
efficiency
Calculates the expected technical (in)efficiency index E[U|\mathcal{E}]
or E[\exp(-U)|\mathcal{E}]
.
Further useful functions are:
dcomper
Probablitiy density function, distribution, quantile function and random number generation for the composed-error distribution.
dcomper_mv
Probablitiy density function, distribution, quantile function and random number generation for the multivariate composed-error distribution.
dcop
Probablitiy density function, distribution and random number generation for copulas.
These are written in C++
for fast and accurate evaluation including derivatives. They may be helpful for other researchers, who want to avoid the tedious implementation.
Additionally:
cop
Object which can be used to fit a copula with the mgcv package.
dsfa(
formula,
family = comper(link = list("identity", "logshift", "logshift"), s = -1, distr =
"normhnorm"),
data = list(),
optimizer = "efs",
...
)
formula |
A list of formulas specifying the additive predictors. See |
family |
The family object specifies the (multivariate) composed-error distribution and link of the model. See |
data |
A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which |
optimizer |
An array specifying the numerical optimization method to use to optimize the smoothing parameter estimation criterion (given by method). "outer" for the more stable direct approach. "outer" can use several alternative optimizers, specified in the second element of optimizer: "newton" (default), "bfgs", "optim", "nlm" and "nlm.fd" (the latter is based entirely on finite differenced derivatives and is very slow). "efs" for the extended Fellner Schall method of Wood and Fasiolo (2017). |
... |
other parameters of |
This function is a wrapper for gam
.
Returns a gam
object.
Rouven Schmidt rouven.schmidt@tu-clausthal.de
Schmidt R, Kneib T (2023). “Multivariate distributional stochastic frontier models.” Computational Statistics & Data Analysis, 107796.
Wood SN, Fasiolo M (2017). “A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models.” Biometrics, 73(4), 1071–1081.
Kumbhakar SC, Wang H, Horncastle AP (2015). A practitioner's guide to stochastic frontier analysis using Stata. Cambridge University Press.
Schmidt R, Kneib T (2020). “Analytic expressions for the Cumulative Distribution Function of the Composed Error Term in Stochastic Frontier Analysis with Truncated Normal and Exponential Inefficiencies.” arXiv preprint arXiv:2006.03459.
### First example with simulated data
#Set seed, sample size and type of function
set.seed(1337)
N=500 #Sample size
s=-1 #Set to production function
#Generate covariates
x1<-runif(N,-1,1); x2<-runif(N,-1,1); x3<-runif(N,-1,1)
x4<-runif(N,-1,1); x5<-runif(N,-1,1)
#Set parameters of the distribution
mu=2+0.75*x1+0.4*x2+0.6*x2^2+6*log(x3+2)^(1/4) #production function parameter
sigma_v=exp(-1.5+0.75*x4) #noise parameter
sigma_u=exp(-1+sin(2*pi*x5)) #inefficiency parameter
#Simulate responses and create dataset
y<-rcomper(n=N, mu=mu, sigma_v=sigma_v, sigma_u=sigma_u, s=s, distr="normhnorm")
dat<-data.frame(y, x1, x2, x3, x4, x5)
#Write formulae for parameters
mu_formula<-y~x1+x2+I(x2^2)+s(x3, bs="ps")
sigma_v_formula<-~1+x4
sigma_u_formula<-~1+s(x5, bs="ps")
#Fit model
model<-dsfa(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
data=dat, family=comper(s=s, distr="normhnorm"), optimizer = c("efs"))
#Model summary
summary(model)
#Smooth effects
#Effect of x3 on the predictor of the production function
plot(model, select=1) #Estimated function
lines(x3[order(x3)], 6*log(x3[order(x3)]+2)^(1/4)-
mean(6*log(x3[order(x3)]+2)^(1/4)), col=2) #True effect
#Effect of x5 on the predictor of the inefficiency
plot(model, select=2) #Estimated function
lines(x5[order(x5)], -1+sin(2*pi*x5)[order(x5)]-
mean(-1+sin(2*pi*x5)),col=2) #True effect
### Second example with real data of production function
data("RiceFarms", package = "plm") #load data
RiceFarms[,c("goutput","size","seed", "totlabor", "urea")]<-
log(RiceFarms[,c("goutput","size","seed", "totlabor", "urea")]) #log outputs and inputs
RiceFarms$id<-factor(RiceFarms$id) #id as factor
#Set to production function
s=-1
#Write formulae for parameters
mu_formula<-goutput ~ s(size, bs="ps") + s(seed, bs="ps") + #non-linear effects
s(totlabor, bs="ps") + s(urea, bs="ps") + #non-linear effects
varieties + #factor
s(id, bs="re") #random effect
sigma_v_formula<-~1
sigma_u_formula<-~bimas
#Fit model with normhnorm dstribution
model<-dsfa(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
data=RiceFarms, family=comper(s=s, distr="normhnorm"), optimizer = "efs")
#Summary of model
summary(model)
#Plot smooths
plot(model)
### Third example with real data of cost function
data("electricity", package = "sfaR") #load data
#Log inputs and outputs as in Greene 1990 eq. 46
electricity$lcof<-log(electricity$cost/electricity$fprice)
electricity$lo<-log(electricity$output)
electricity$llf<-log(electricity$lprice/electricity$fprice)
electricity$lcf<-log(electricity$cprice/electricity$fprice)
#Set to cost function
s=1
#Write formulae for parameters
mu_formula<-lcof ~ s(lo, bs="ps") + s(llf, bs="ps") + s(lcf, bs="ps") #non-linear effects
sigma_v_formula<-~1
sigma_u_formula<-~s(lo, bs="ps") + s(lshare, bs="ps") + s(cshare, bs="ps")
#Fit model with normhnorm dstribution
model<-dsfa(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
data=electricity, family=comper(s=s, distr="normhnorm"),
optimizer = "efs")
#Summary of model
summary(model)
#Plot smooths
plot(model)