cpd {pricelevels}

R Documentation

CPD and NLCPD methods

Description

The function cpd() estimates regional price levels using the Country-Product-Dummy (CPD) method, originally developed by Summers (1973). Auer and Weinand (2022) proposed a generalization, the nonlinear CPD (NLCPD) method, which is implemented in the function nlcpd().

Usage

cpd(p, r, n, q=NULL, w=NULL, base=NULL, simplify=TRUE, settings=list())

nlcpd(p, r, n, q=NULL, w=NULL, base=NULL, simplify=TRUE, settings=list(), ...)

Arguments

`p`	A numeric vector of prices.
`r`, `n`	A character vector or factor of regional entities `r` and products `n`, respectively.
`q`, `w`	A numeric vector of non-negative quantities `q` or weights `w`. By default, no weights are used in the regression (`q=NULL` and `w=NULL`). While `w` can be any weights considered appropriate for weighted regression, `q` results in an expenditure share weighted regression (see Details). If both `q` and `w` are provided, `q` is used.
`base`	A character string specifying the base region relative to which the estimated logarithmic regional price levels are expressed. When `NULL`, they refer to the (unweighted) regional average, similar to `contr.sum`.
`simplify`	A logical indicating whether the full regression object should be returned (`FALSE`) or a named vector of estimated regional price levels (`TRUE`).
`settings`	A list of control settings to be used. The following settings are supported:
	`chatty` : A logical specifying whether warnings and info messages should be printed. The default is `getOption("pricelevels.chatty")`.
	`connect` : A logical specifying whether the data should be checked for connectedness. The default is `getOption("pricelevels.connect")`. If the data are not connected, price levels are computed within the biggest block of connected regions or the block of regions to which the `base` region belongs. See also `connect()`.
	`norm.weights` : A logical specifying whether the weights `w` should be renormalized such that they add up to 1 for each region `r`. The default is `TRUE`.
	`plot` : A logical specifying whether the calculated price levels should be plotted. If `TRUE`, the price ratios of each region are displayed as boxplots and the price levels are added as colored points. The default is `getOption("pricelevels.plot")`.
	`self.start` : The strategy by which `nlcpd()` internally derives parameter start values; used only if `par=NULL`. Currently, the values `s1`, `s2` and `s3` are allowed. For `s1`, simple price averages across products and regions are used as start values; for `s2` and `s3`, the start values are derived by the CPD method. Start values for `delta` are set to 1, except for `s3`, where they are derived from their first-order condition. By default, `self.start='s1'`.
	`use.jac` : A logical indicating whether the Jacobian matrix should be used by `nlcpd()` for the nonlinear optimization. The default is `FALSE`.
	`w.delta` : A named vector of weights for the `delta` parameter (see Details). The vector length must equal the number of products, and the names must match the product names. If not supplied, the weights for \delta_i are derived internally by `nlcpd()` from the weights `w`.
`...`	Further arguments passed to `nls.lm`, typically arguments `control`, `par`, `upper`, and `lower`. For `par`, `upper`, and `lower`, vectors must have names for each parameter separated by a dot, e.g., `lnP.1`, `pi.2`, or `delta.3`.
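
As a rough illustration of the naming convention for `par`, `upper`, and `lower`, the base R sketch below shows that calling `unlist()` on a nested named list yields names of the form `lnP.1`, `pi.2`, or `delta.3`. The object names and all values are hypothetical and chosen only for illustration.

```r
# Hypothetical dimensions: 3 regions and 2 products.
n.regions  <- 3
n.products <- 2

# Nested list of start values, one named vector per parameter group.
par.init <- list("lnP"   = setNames(rep(0, n.regions), 1:n.regions),
                 "pi"    = setNames(rep(2, n.products), 1:n.products),
                 "delta" = setNames(rep(1, n.products), 1:n.products))

# unlist() joins the list name and the element name with a dot.
par.vec <- unlist(par.init)
names(par.vec)

# Bounds built from the same vector inherit the same names.
lower <- par.vec - 0.1
upper <- par.vec + 0.1
```

Deriving the bounds from the flattened start vector keeps their names aligned with the parameter names.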

Details

The CPD method is a linear regression model that explains the logarithmic price of product i in region r, \ln p_i^r, by the general product price, \ln \pi_i, and the overall price level, \ln P^r:

\ln p_i^r = \ln \pi_i + \ln P^r + u_i^r

The NLCPD method extends the CPD model with product-specific elasticities \delta_i:

\ln p_i^r = \ln \pi_i + \delta_i \ln P^r + u_i^r

Note that both the CPD and the NLCPD method require a normalization of the estimated price levels \widehat{\ln P^r} to avoid multicollinearity. If base=NULL, the normalization \sum_{r=1}^{R} \widehat{\ln P^r}=0 is used in both functions; otherwise, the logarithmic price level of the base region is set to 0. The NLCPD method additionally imposes the restriction \sum_{i=1}^{N} w_i \widehat{\delta_i}=1, where the weights w_i can be defined by settings$w.delta. In nlcpd(), it is always the parameter \widehat{\delta_1} that is derived residually from this restriction.
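
To make the CPD regression and its normalization concrete, here is a minimal base R sketch that fits the unweighted model with `lm()` and sum-to-zero contrasts on the region dummies. It is not the internal code of cpd(); the toy data, the object names, and the use of `contr.sum` through the `contrasts` argument are assumptions made for illustration.

```r
# Toy data (made up): 3 regions, 4 products, complete price matrix.
set.seed(1)
regions  <- factor(rep(1:3, each = 4))
products <- factor(rep(1:4, times = 3))
lnP_true  <- c(-0.1, 0.0, 0.1)        # true log price levels, sum to zero
lnpi_true <- log(c(2, 5, 1, 3))       # true log product prices
price <- exp(lnpi_true[as.integer(products)] +
             lnP_true[as.integer(regions)] +
             rnorm(12, sd = 0.01))

# CPD regression: product dummies plus sum-to-zero coded region effects.
fit <- lm(log(price) ~ 0 + products + regions,
          contrasts = list(regions = "contr.sum"))

# The last region effect follows from the sum-to-zero normalization.
b <- coef(fit)[grep("^regions", names(coef(fit)))]
lnP_hat <- c(b, -sum(b))
names(lnP_hat) <- levels(regions)

# Price levels relative to the regional average (in logs) ...
P_avg <- exp(lnP_hat)
# ... and re-expressed relative to base region "1".
P_base1 <- exp(lnP_hat - lnP_hat["1"])
```

Switching the base only shifts all logarithmic price levels by a constant, so the ratios between regions are unaffected by the chosen normalization.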

Before calculations start, missing values are excluded and duplicated observations for r and n are aggregated: duplicated prices p and weights w are averaged, and duplicated quantities q are summed.
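
The aggregation step can be sketched in base R as follows. This is only an approximation of the described behavior, not the package's code: the data frame `d` is hypothetical, and the arithmetic mean is assumed for averaging duplicated prices.

```r
# Hypothetical data: region "a" reports product "x" twice,
# and one observation has a missing price.
d <- data.frame(r = c("a", "a", "b", "b"),
                n = c("x", "x", "x", "y"),
                p = c(2, 4, 3, NA),
                q = c(1, 3, 5, 2))

# Drop observations with missing values.
d <- d[complete.cases(d), ]

# Average duplicated prices and sum duplicated quantities per (r, n) pair.
agg_p <- aggregate(p ~ r + n, data = d, FUN = mean)
agg_q <- aggregate(q ~ r + n, data = d, FUN = sum)
agg <- merge(agg_p, agg_q, by = c("r", "n"))
agg
```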

If q is provided, expenditure shares are derived as w_i^r = p_i^r q_i^r / \sum_{j=1}^{N} p_j^r q_j^r and used as weights in the regression. If only w is provided, the weights w are (re-)normalized by default. If the weights w do not represent expenditure shares, the (re-)normalization can be turned off by settings=list(norm.weights=FALSE).
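
The expenditure share formula can be checked with a few lines of base R. The vectors below are made-up values, and the object names are hypothetical; the sketch only reproduces the formula for w_i^r and is not taken from the package.

```r
# Hypothetical prices, quantities and regions (one entry per observation).
p <- c(2.0, 5.0, 1.0, 2.2, 5.5)
q <- c(10, 4, 20, 8, 5)
r <- c("a", "a", "a", "b", "b")

# Expenditure of each observation and its share within its region.
expenditure <- p * q
w <- expenditure / ave(expenditure, r, FUN = sum)

# By construction, the shares add up to 1 within each region.
tapply(w, r, sum)
```

Weights obtained this way are already normalized per region, which matches the default renormalization applied to user-supplied weights w.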

Value

For simplify=TRUE, a named vector of (unlogged) regional price levels. Otherwise, for cpd(), an lm object containing the full regression output, and for nlcpd(), the full output of nls.lm() plus the element w.delta.

Author(s)

Sebastian Weinand

References

Auer, L. v. and Weinand, S. (2022). A Nonlinear Generalization of the Country-Product-Dummy Method. Discussion Paper 2022/45, Deutsche Bundesbank.

Summers, R. (1973). International Price Comparisons based upon Incomplete Data. Review of Income and Wealth, 19 (1), 1-16.

Examples

# sample complete price data:
set.seed(123)
R <- 3 # number of regions
B <- 1 # number of product groups
N <- 5 # number of products
dt1 <- rdata(R=R, B=B, N=N)

# compute expenditure share weighted cpd and nlcpd index:
dt1[, cpd(p=price, r=region, n=product, q=quantity)]
dt1[, nlcpd(p=price, r=region, n=product, q=quantity)]

# set individual start values in nlcpd():
par.init <- list("lnP"=setNames(rep(0, R), 1:R),
                 "pi"=setNames(rep(2, N), 1:N),
                 "delta"=setNames(rep(1, N), 1:N))
dt1[, nlcpd(p=price, r=region, n=product, q=quantity, par=par.init)]

# use lower and upper bounds on parameters:
dt1[, nlcpd(p=price, r=region, n=product, q=quantity,
            lower=unlist(par.init)-0.1, upper=unlist(par.init)+0.1)]

# change internal calculation of start values:
dt1[, nlcpd(p=price, r=region, n=product, q=quantity, settings=list(self.start="s2"))]

# add price data:
dt2 <- rdata(R=4, B=1, N=4)
dt2[, "region":=factor(region, labels=4:7)]
dt2[, "product":=factor(product, labels=6:9)]
dt <- rbind(dt1, dt2)
dt[, is.connected(r=region, n=product)] # non-connected now

# compute expenditure share weighted cpd and nlcpd index:
dt[, cpd(p=price, r=region, n=product, q=quantity, base="1")]
dt[, nlcpd(p=price, r=region, n=product, q=quantity, base="1")]

# compare with toernqvist index:
dt[, toernqvist(p=price, r=region, n=product, q=quantity, base="1")]


# computational speed in nlcpd() usually increases if use.jac=TRUE:
set.seed(123)
dt3 <- rdata(R=20, B=1, N=30)
system.time(m1 <- dt3[, nlcpd(p=price, r=region, n=product, q=quantity,
                              settings=list(use.jac=FALSE), simplify=FALSE,
                              control=minpack.lm::nls.lm.control("maxiter"=200))])
system.time(m2 <- dt3[, nlcpd(p=price, r=region, n=product, q=quantity,
                              settings=list(use.jac=TRUE), simplify=FALSE,
                              control=minpack.lm::nls.lm.control("maxiter"=200))])
all.equal(m1$par, m2$par, tol=1e-05)

[Package pricelevels version 1.3.0]