data_reducing {PriceIndices} | R Documentation |
Reducing products
Description
The function returns a reduced data set, i.e. a data set containing sufficiently numerous matched products in the indicated groups. The input data set (data frame) must allow products to be matched over time: it must contain either the prodID column (as numeric, factor or character) or the description column (as character).
Usage
data_reducing(
data,
start,
end,
type = "prodID",
minN = 2,
outlets = FALSE,
by = c(),
interval = FALSE
)
Arguments
data |
The user's data frame with information about sold products. It must contain columns: |
start |
The base period (as character) limited to the year and month, e.g. "2020-03". |
end |
The research period (as character) limited to the year and month, e.g. "2020-04". |
type |
This parameter indicates whether group counts are determined by different matched prodIDs (in which case the parameter has the value 'prodID') or different matched descriptions (in which case the parameter has the value 'description'). |
minN |
This parameter determines the minimum number of matched products that a group must contain. |
outlets |
This parameter determines whether grouping is to be done for each outlet separately. If so (if it is |
by |
This parameter specifies the name of the grouping column (as character). |
interval |
A logical value indicating whether the reducing process concerns only two periods defined by |
Value
The function returns a reduced data set, i.e. a data set containing sufficiently numerous matched products in the indicated groups. For each product group created and for the selected periods, the procedure checks that the count of identical prodIDs (or identical product descriptions, which does not necessarily mean the same thing) is at least equal to minN. If it is not, such products are eliminated from the data set. The function performs the check either only for the base (start) and research (end) periods, in which case the interval parameter is FALSE, or also for all intermediate months, in which case the interval parameter is TRUE. If the user wants to perform this check for each outlet separately, the outlets parameter should be set to TRUE.
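The check described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper reduce_sketch, the toy data frame and its group column are hypothetical, and the sketch covers only the two-period case (interval = FALSE) without outlets. It also assumes that a "matched" product is one sold in both the start and the end period, and that a group which fails the check is dropped as a whole.

```r
# Toy data: column names follow those mentioned on this page (time, prodID);
# the "group" column and all values are invented for illustration.
toy <- data.frame(
  time   = as.Date(c("2020-03-01", "2020-03-01", "2020-03-01", "2020-03-01",
                     "2020-04-01", "2020-04-01", "2020-04-01", "2020-04-01")),
  prodID = c(1, 2, 3, 4, 1, 2, 3, 5),
  group  = c("A", "A", "B", "B", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT the data_reducing() source code.
reduce_sketch <- function(data, start, end, by, minN = 2) {
  # Reduce each date to "year-month", the format used by start and end
  period <- format(data$time, "%Y-%m")

  # Assumption: a product is "matched" if it is sold in both periods
  in_start <- unique(data$prodID[period == start])
  in_end   <- unique(data$prodID[period == end])
  matched  <- intersect(in_start, in_end)

  # Count distinct matched products in each group
  matched_rows <- data[data$prodID %in% matched, , drop = FALSE]
  counts <- tapply(matched_rows$prodID, matched_rows[[by]],
                   function(ids) length(unique(ids)))

  # Keep only the groups with at least minN matched products
  keep_groups <- names(counts)[counts >= minN]
  data[data[[by]] %in% keep_groups, , drop = FALSE]
}

reduced <- reduce_sketch(toy, start = "2020-03", end = "2020-04",
                         by = "group", minN = 2)
```

In this toy data, products 1, 2 and 3 appear in both months. Group A holds two of them and group B only one, so with minN = 2 only the rows of group A remain in reduced.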
Examples
data_reducing(sugar, start = "2018-12", end = "2019-12", by = "description", minN = 5)