dsfa {dsfa} R Documentation

## dsfa-package: Distributional Stochastic Frontier Analysis

### Description

The dsfa package implements the specification, estimation and prediction of distributional stochastic frontier models via mgcv. The basic distributional stochastic frontier model is given by:

Y_n = \eta^\mu(\boldsymbol{x}_n^\mu) + V_n + s \cdot U_n

where n \in \{1,2,...,N\}. V_n and U_n are the noise and (in)efficiency respectively.

• For s=-1, \eta^\mu(\cdot) is the production function and \boldsymbol{x}_n^\mu are the log inputs. Alternatively, if s=1, \eta^\mu(\cdot) is the cost function and \boldsymbol{x}_n^\mu are the log cost. The vector \boldsymbol{x}_n^\mu may also contain other variables.

• The noise is represented as V_n \sim N(0,\sigma_{Vn}^2), where \sigma_{Vn}=\exp(\eta^{\sigma_{V}}(\boldsymbol{x}_n^{\sigma_{V}})). Here, \boldsymbol{x}_n^{\sigma_{V}} are the observed covariates which influence the parameter of the noise.

• The (in)efficiency can be represented in two ways.

• If U_n \sim HN(\sigma_{Un}^2), where \sigma_{Un}=\exp(\eta^{\sigma_{Un}}(\boldsymbol{x}_n^{\sigma_{U}})). Here, \boldsymbol{x}_n^{\sigma_{U}} are the observed covariates which influence the parameter of the (in)efficiency. Consequently:

Y_n \sim normhnorm(\mu_n=\eta^\mu(\boldsymbol{x}_n^\mu), \sigma_{Vn}=\exp(\eta^{\sigma_{V}}(\boldsymbol{x}_n^{\sigma_{V}})), \sigma_{Un}=\exp(\eta^{\sigma_{U}}(\boldsymbol{x}_n^{\sigma_{U}})), s=s)

. For more details see dnormhnorm.

• If U_n \sim Exp(\sigma_{Un}), where \sigma_{Un}=\exp(\eta^{\sigma_{Un}}(\boldsymbol{x}_n^{\sigma_{U}})). Here, \boldsymbol{x}_n^{\sigma_{U}} are the observed covariates which influence the parameter of the (in)efficiency. Consequently:

Y_n \sim normexp(\mu_n=\eta^\mu(\boldsymbol{x}_n^\mu), \sigma_{Vn}=\exp(\eta^{\sigma_{V}}(\boldsymbol{x}_n^{\sigma_{V}})), \sigma_{Un}=\exp(\eta^{\sigma_{U}}(\boldsymbol{x}_n^{\sigma_{U}})), s=s)

. For more details see dnormexp.

Consequently, Y_n follows a composed-error distribution. For an overview see dcomper.

Let \theta_n be a parameter of the distribution of Y_n, e.g. \theta_n \in \{\mu_n, \sigma_{Un}, \sigma_{Vn}\}. Further, let g^{-1}_{\theta}(\cdot) be the monotonic response function, which links the additive predictor \eta(\boldsymbol{x}_n^\theta) to the parameter space for the parameter \theta_n via the additive model:

g^{-1}_{\theta}(\theta_n)=\eta(\boldsymbol{x}_n^\theta)=\beta^\theta_0 + \sum_{j^\theta=1}^{J^\theta} h^\theta_{j^\theta}(x^\theta_{nj^\theta})

Thus, the additive predictor \eta(\boldsymbol{x}_n^\theta) is made up by the intercept \beta^\theta_0 and J^\theta smooths terms. The mgcv packages provides a framework for fitting distributional regression models. For more information see comper. The additive predictors can be defined via formulae in gam. Within the formulae for the parameter \theta_n, the smooth function for the variable x^\theta_{nj^\theta} can be specified via the function s, which is h^\theta_{j^\theta}(\cdot) in the notation above. The smooth functions may be:

• linear effects, may include polynomials or regression splines.

• non-linear effects, which can be modeled via penalized regression splines, e.g. p.spline, tprs.

• random effects, random.effects.

• spatial effects, which can be modeled via mrf.

An overview is provided at smooth.terms. The functions gam, predict.gam and plot.gam, are alike to the basic S functions. A number of other functions such as summary.gam, residuals.gam and anova.gam are also provided, for extracting information from a fitted gamOject.

The main functions are:

• comper Object which can be used to fit a composed-error stochastic frontier model with the mgcv package.

• comper_mv Object which can be used to fit a multivariate composed-error stochastic frontier model with the mgcv package.

• elasticity Calculates and plots the elasticity of a smooth function.

• efficiency Calculates the expected technical (in)efficiency index E[U|\mathcal{E}] or E[\exp(-U)|\mathcal{E}].

Further useful functions are:

• dcomper Probablitiy density function, distribution, quantile function and random number generation for the composed-error distribution.

• dcomper_mv Probablitiy density function, distribution, quantile function and random number generation for the multivariate composed-error distribution.

• dcop Probablitiy density function, distribution and random number generation for copulas.

These are written in C++ for fast and accurate evaluation including derivatives. They may be helpful for other researchers, who want to avoid the tedious implementation. Additionally:

• cop Object which can be used to fit a copula with the mgcv package.

### Usage

dsfa(
formula,
family = comper(link = list("identity", "logshift", "logshift"), s = -1, distr =
"normhnorm"),
data = list(),
optimizer = "efs",
...
)


### Arguments

 formula A list of formulas specifying the additive predictors. See formula.gam and gam.models for more details. family The family object specifies the (multivariate) composed-error distribution and link of the model. See comper and comper_mv for more details. data A data frame or list containing the model response variable and covariates required by the formula. By default the variables are taken from environment(formula): typically the environment from which dsfa is called. optimizer An array specifying the numerical optimization method to use to optimize the smoothing parameter estimation criterion (given by method). "outer" for the more stable direct approach. "outer" can use several alternative optimizers, specified in the second element of optimizer: "newton" (default), "bfgs", "optim", "nlm" and "nlm.fd" (the latter is based entirely on finite differenced derivatives and is very slow). "efs" for the extended Fellner Schall method of Wood and Fasiolo (2017). ... other parameters of gam

### Details

This function is a wrapper for gam.

### Value

Returns a gam object.

### References

• Schmidt R, Kneib T (2023). “Multivariate distributional stochastic frontier models.” Computational Statistics & Data Analysis, 107796.

• Wood SN, Fasiolo M (2017). “A generalized Fellner-Schall method for smoothing parameter optimization with application to Tweedie location, scale and shape models.” Biometrics, 73(4), 1071–1081.

• Kumbhakar SC, Wang H, Horncastle AP (2015). A practitioner's guide to stochastic frontier analysis using Stata. Cambridge University Press.

• Schmidt R, Kneib T (2020). “Analytic expressions for the Cumulative Distribution Function of the Composed Error Term in Stochastic Frontier Analysis with Truncated Normal and Exponential Inefficiencies.” arXiv preprint arXiv:2006.03459.

### Examples


### First example with simulated data
#Set seed, sample size and type of function
set.seed(1337)
N=500 #Sample size
s=-1 #Set to production function

#Generate covariates
x1<-runif(N,-1,1); x2<-runif(N,-1,1); x3<-runif(N,-1,1)
x4<-runif(N,-1,1); x5<-runif(N,-1,1)

#Set parameters of the distribution
mu=2+0.75*x1+0.4*x2+0.6*x2^2+6*log(x3+2)^(1/4) #production function parameter
sigma_v=exp(-1.5+0.75*x4) #noise parameter
sigma_u=exp(-1+sin(2*pi*x5)) #inefficiency parameter

#Simulate responses and create dataset
y<-rcomper(n=N, mu=mu, sigma_v=sigma_v, sigma_u=sigma_u, s=s, distr="normhnorm")
dat<-data.frame(y, x1, x2, x3, x4, x5)

#Write formulae for parameters
mu_formula<-y~x1+x2+I(x2^2)+s(x3, bs="ps")
sigma_v_formula<-~1+x4
sigma_u_formula<-~1+s(x5, bs="ps")

#Fit model
model<-dsfa(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
data=dat, family=comper(s=s, distr="normhnorm"), optimizer = c("efs"))

#Model summary
summary(model)

#Smooth effects
#Effect of x3 on the predictor of the production function
plot(model, select=1) #Estimated function
lines(x3[order(x3)], 6*log(x3[order(x3)]+2)^(1/4)-
mean(6*log(x3[order(x3)]+2)^(1/4)), col=2) #True effect

#Effect of x5 on the predictor of the inefficiency
plot(model, select=2) #Estimated function
lines(x5[order(x5)], -1+sin(2*pi*x5)[order(x5)]-
mean(-1+sin(2*pi*x5)),col=2) #True effect

### Second example with real data of production function

data("RiceFarms", package = "plm") #load data
RiceFarms[,c("goutput","size","seed", "totlabor", "urea")]<-
log(RiceFarms[,c("goutput","size","seed", "totlabor", "urea")]) #log outputs and inputs
RiceFarms$id<-factor(RiceFarms$id) #id as factor

#Set to production function
s=-1

#Write formulae for parameters
mu_formula<-goutput ~  s(size, bs="ps") + s(seed, bs="ps") + #non-linear effects
s(totlabor, bs="ps") + s(urea, bs="ps") + #non-linear effects
varieties + #factor
s(id, bs="re") #random effect
sigma_v_formula<-~1
sigma_u_formula<-~bimas

#Fit model with normhnorm dstribution
model<-dsfa(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
data=RiceFarms, family=comper(s=s, distr="normhnorm"), optimizer = "efs")

#Summary of model
summary(model)

#Plot smooths
plot(model)

### Third example with real data of cost function

data("electricity", package = "sfaR") #load data

electricity$lcof<-log(electricity$cost/electricity$fprice) electricity$lo<-log(electricity$output) electricity$llf<-log(electricity$lprice/electricity$fprice)
electricity$lcf<-log(electricity$cprice/electricity\$fprice)

#Set to cost function
s=1

#Write formulae for parameters
mu_formula<-lcof ~ s(lo, bs="ps") + s(llf, bs="ps") + s(lcf, bs="ps") #non-linear effects
sigma_v_formula<-~1
sigma_u_formula<-~s(lo, bs="ps") + s(lshare, bs="ps") + s(cshare, bs="ps")

#Fit model with normhnorm dstribution
model<-dsfa(formula=list(mu_formula, sigma_v_formula, sigma_u_formula),
data=electricity, family=comper(s=s, distr="normhnorm"),
optimizer = "efs")

#Summary of model
summary(model)

#Plot smooths
plot(model)



[Package dsfa version 2.0.2 Index]