data_reducing {PriceIndices}    R Documentation

Reducing products

Description

The function returns a reduced data set, i.e. a data set in which every indicated group contains a sufficient number of matched products. The input data frame must allow products to be matched over time, so it must contain either the prodID column (as numeric, factor or character) or the description column with product descriptions (as character).

Usage

data_reducing(
  data,
  start,
  end,
  type = "prodID",
  minN = 2,
  outlets = FALSE,
  by = c(),
  interval = FALSE
)

Arguments

data

The user's data frame with information about sold products. It must contain the column time (as Date in year-month-day format, e.g. '2020-12-01') and, depending on the values of the type and outlets parameters, the columns prodID or description, and retID.

start

The base period (as character) limited to the year and month, e.g. "2020-03".

end

The research period (as character) limited to the year and month, e.g. "2020-04".

type

This parameter indicates whether group counts are determined by different matched prodIDs (in which case the parameter has the value 'prodID') or different matched descriptions (in which case the parameter has the value 'description').

minN

This parameter determines the minimum number of matched products that a group must contain for its products to be kept (2 by default).

outlets

This parameter determines whether grouping is to be done for each outlet separately. If so (if it is TRUE), the data set must contain a column identifying the outlets (retID).

by

This parameter specifies the name of the grouping column (as character).

interval

A logical value indicating whether the reduction concerns only the two periods defined by the start and end parameters (interval = FALSE) or all products sold during the whole time interval <start, end> (interval = TRUE).
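The effect of the interval parameter on the set of checked periods can be illustrated with a short base R sketch. The helper periods_to_check is hypothetical and is not part of PriceIndices; it only assumes that periods are calendar months written as "year-month" strings, as for start and end.

```r
# Hypothetical helper (not a PriceIndices function): list the year-month
# periods that would be checked for given start, end and interval values.
periods_to_check <- function(start, end, interval = FALSE) {
  first <- as.Date(paste0(start, "-01"))
  last <- as.Date(paste0(end, "-01"))
  if (interval) {
    # All months from start to end, both included.
    format(seq(first, last, by = "month"), "%Y-%m")
  } else {
    # Only the base period and the research period.
    unique(c(start, end))
  }
}

periods_to_check("2018-12", "2019-03")
periods_to_check("2018-12", "2019-03", interval = TRUE)
```

With interval = FALSE the sketch yields only "2018-12" and "2019-03"; with interval = TRUE it also yields the intermediate months "2019-01" and "2019-02".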

Value

The function returns a reduced data set, i.e. a data set in which every indicated group contains a sufficient number of matched products. For each product group and for the selected periods, the procedure checks whether the number of matched prodIDs (or matched product descriptions, which is not necessarily the same thing) is at least minN. If it is not, such products are eliminated from the data set. The check is performed either only for the base and research periods (interval = FALSE) or also for all intermediate months (interval = TRUE). To perform the check for each outlet separately, set the outlets parameter to TRUE.
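The check described above can be sketched in base R. This is only an illustration under stated assumptions, not the PriceIndices implementation: the function reduce_sketch, the toy data frame and the group column are hypothetical, a product is treated as matched when its identifier occurs in every checked period, and the sketch returns only rows from the checked periods. The per-outlet variant (outlets = TRUE) is not covered.

```r
# Hypothetical toy data; all values are invented for illustration.
toy <- data.frame(
  time = as.Date(c(rep("2020-03-01", 4), rep("2020-04-01", 3))),
  prodID = c(1, 2, 3, 4, 1, 2, 4),
  group = c("sugar", "sugar", "sugar", "salt", "sugar", "sugar", "salt"),
  stringsAsFactors = FALSE
)

# Hypothetical sketch of the reduction (not the package's own code).
reduce_sketch <- function(data, periods, by, id = "prodID", minN = 2) {
  # Restrict the data to the checked year-month periods.
  ym <- format(data$time, "%Y-%m")
  in_scope <- ym %in% periods
  d <- data[in_scope, , drop = FALSE]
  ym <- ym[in_scope]

  # Assumption: a product is matched if its ID occurs in every period.
  ids_by_period <- lapply(periods, function(p) unique(d[[id]][ym == p]))
  matched <- Reduce(intersect, ids_by_period)
  m <- d[d[[id]] %in% matched, , drop = FALSE]

  # Count the distinct matched IDs in each group.
  grp <- as.character(m[[by]])
  groups <- unique(grp)
  n_matched <- vapply(
    groups,
    function(g) length(unique(m[[id]][grp == g])),
    integer(1)
  )

  # Keep matched products only from groups with at least minN of them.
  keep <- groups[n_matched >= minN]
  m[grp %in% keep, , drop = FALSE]
}

reduced <- reduce_sketch(toy, c("2020-03", "2020-04"), by = "group", minN = 2)
reduced
```

In the toy data, product 3 is dropped because it is not sold in "2020-04", and the "salt" group is dropped because it contains only one matched product, so only products 1 and 2 from the "sugar" group remain. Passing id = "description" would count matched descriptions instead, in the spirit of type = 'description'.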

Examples

data_reducing(sugar, start = "2018-12", end = "2019-12", by = "description", minN = 5)


[Package PriceIndices version 0.1.9 Index]