rdata {pricelevels} | R Documentation |
Simulate random price and quantity data
Description
Simulate random price and quantity data for a specified number of regions (r=1,\ldots,R), product groups (b=1,\ldots,B), and individual products (n=1,\ldots,N_{b}) using function rdata().
The sampling of prices relies on the NLCPD model (see nlcpd()), while expenditure weights for product groups are sampled using function rweights(). Purchased quantities are assigned to individual products. Moreover, random sales and gaps (using function rgaps()) can be introduced in the sampled data.
Usage
rgaps(r, n, amount=0, prob=NULL, pairs=FALSE, exclude=NULL)
rweights(r, b, type=~1)
rdata(R, B, N, gaps=0, weights=~b+r, sales=0, settings=list())
Arguments
r , n , b |
A character vector or factor of regional entities (r), individual products (n), and product groups (b), respectively. |
R , B , N |
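A single integer specifying the number of regions (R) and the number of product groups (B), respectively. N specifies the number of individual products per product group, either as a single integer or as a vector of length B. |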
weights , type |
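A formula specifying the sampling of expenditure weights for product groups. In the examples below, type=~1 gives constant weights, type=~b product-specific weights, and type=~b+r product-region-specific weights. |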
gaps , sales , amount |
Percentage amount of gaps and sales (between 0 and 1), respectively, to be introduced in the data. |
prob |
A vector of probability weights, see also |
pairs |
A logical indicating if gaps should be introduced such that there are always at least two observations per product available (pairs=TRUE). The default is pairs=FALSE. |
exclude |
Data.frame of two (character) variables r and n specifying the regions and products that should be excluded from exhibiting gaps. |
settings |
A list of control settings to be used. The following settings are supported:
|
Details
Function rgaps() ensures that gaps do not lead to non-connected price data (see is.connected()). Therefore, the amount of gaps specified in rgaps() may be met only approximately, in particular when certain regions and/or products are additionally excluded from exhibiting gaps by exclude.
If rgaps(pairs=FALSE), the minimum number of observations for a connected data set is R+N-1. Otherwise, for rgaps(pairs=TRUE), this number is defined by 2N+\max(0, R-N-1).
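The two minimum-observation formulas above can be evaluated for a concrete case. The following is a minimal sketch in base R. The helper min_obs() is hypothetical and not part of pricelevels, and N is assumed to be the total number of distinct products. The resulting upper bound on the share of gaps further assumes that the complete data contain R*N observations.

```r
# Hypothetical helper (not part of pricelevels): minimum number of
# observations for a connected data set, per the two formulas above.
min_obs <- function(R, N, pairs = FALSE) {
  if (pairs) {
    2 * N + max(0, R - N - 1)
  } else {
    R + N - 1
  }
}

R <- 4   # number of regions
N <- 10  # total number of products (assumption, see lead-in)

n_single <- min_obs(R, N)                # R+N-1 = 13
n_pairs  <- min_obs(R, N, pairs = TRUE)  # 2N+max(0, R-N-1) = 20

# Assumed upper bound on the share of gaps, given R*N complete observations
max_gaps_single <- 1 - n_single / (R * N)  # 0.675
max_gaps_pairs  <- 1 - n_pairs / (R * N)   # 0.5
```

Whenever R > N+1, the second formula reduces to R+N-1, so both settings then require the same minimum number of observations.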
Note that setting sales>0 in function rdata() distorts the initial price generating process. Consequently, parameter estimates may deviate more strongly from their true values. Note also that the sampled expenditure weights (variable weight) represent the relevance of product groups as (often) derived from national accounts and other data sources. Therefore, they cannot be derived from the sampled prices and quantities in the data, which would represent the expenditure shares of available products.
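To illustrate what expenditure shares of available products means, the sketch below computes such shares from prices and quantities in base R. The toy data are hypothetical and not sampled by rdata(). The calculation follows the generic definition of an expenditure share within a region and is not necessarily how rdata() derives its share variable.

```r
# Hypothetical toy data: two regions with two products each
df <- data.frame(
  region   = c("r1", "r1", "r2", "r2"),
  product  = c("n1", "n2", "n1", "n2"),
  price    = c(2, 4, 3, 5),
  quantity = c(10, 5, 4, 4)
)

# Expenditure per observation
df$expenditure <- df$price * df$quantity

# Share of each product in the total expenditure of its region
df$share <- df$expenditure / ave(df$expenditure, df$region, FUN = sum)

# The shares add up to 1 within each region
share_sums <- tapply(df$share, df$region, sum)
```

Such shares change as soon as products drop out of a region, whereas externally given product group weights do not.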
Value
Function rgaps() returns a logical vector of the same length as r where TRUEs indicate gaps and FALSEs no gaps.
Function rweights() returns a numeric vector of (non-negative) expenditure share weights of the same length as r.
Function rdata() returns a data.table with the following variables:
group | product group identifier (factor) | |
weight | expenditure weight of product groups (numeric) | |
region | region identifier (factor) | |
product | product identifier (factor) | |
sale | are prices and quantities affected by sales (logical) | |
price | sampled price (numeric) | |
quantity | consumed quantity (numeric) | |
share | expenditure share weights (numeric) | |
or a list with the sampled data and its underlying parameter values, if settings=list(par.add=TRUE).
Author(s)
Sebastian Weinand
Examples
# sample price data for ten regions and five product groups
# containing three individual products each:
set.seed(1)
dt <- rdata(R=10, B=5, N=3)
boxplot(price~paste(group, product, sep=":"), data=dt)
# sample price data for ten regions and five product groups
# containing one to five individual products:
set.seed(1)
dt <- rdata(R=10, B=5, N=c(1,2,3,4,5))
boxplot(price~paste(group, product, sep=":"), data=dt)
# sample price data for three product groups (with one product each) in four regions:
dt <- rdata(R=4, B=3, N=1)
# add expenditure share weights:
dt[, "w1" := rweights(r=region, b=group, type=~1)] # constant
dt[, "w2" := rweights(r=region, b=group, type=~b)] # product-specific
dt[, "w3" := rweights(r=region, b=group, type=~b+r)] # product-region-specific
# weights add up to 1:
dt[, list("w1"=sum(w1),"w2"=sum(w2),"w3"=sum(w3)), by="region"]
# introduce 25% random gaps:
dt.gaps <- dt[!rgaps(r=region, n=product, amount=0.25), ]
# weights no longer add up to 1 in each region:
dt.gaps[, list("w1"=sum(w1),"w2"=sum(w2),"w3"=sum(w3)), by="region"]
# approx. 25% random gaps, but keep observation for product "n2"
# in region "r1" and all observations in region "r2":
no_gaps <- data.frame(r=c("r1","r2"), n=c("n2",NA))
# apply to data:
dt[!rgaps(r=region, n=product, amount=0.25, exclude=no_gaps), ]
# or, directly, in one step:
dt <- rdata(R=4, B=3, N=1, gaps=0.25, settings=list("gaps.exclude"=no_gaps))
# introduce systematic gaps:
dt <- rdata(R=15, B=1, N=10)
dt[, "prob" := data.table::rleidv(product)] # probability for gaps increases per product
dt.gaps <- dt[!rgaps(r=region, n=product, amount=0.25, prob=prob), ]
plot(table(dt.gaps$product), type="l")