R: For each market, find the best matching control market

best_matches {MarketMatching}

R Documentation

For each market, find the best matching control market

Description

best_matches finds the best matching control markets for each market in the dataset using dynamic time warping (dtw package). The algorithm loops through all viable candidates for each market (in parallel when parallel is TRUE) and then ranks them by distance and/or correlation.
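
The ranking step can be illustrated with a small base R sketch. This is not the package's implementation: the helper rank_candidates and the toy distance and correlation values are hypothetical, and blending the two metrics as a weighted average of ranks is an assumption made for illustration. The emphasis argument plays the role of dtw_emphasis documented under Arguments.

```r
# Hypothetical helper (not part of MarketMatching): blend a distance rank
# and a correlation rank. emphasis = 1 ranks by distance only,
# emphasis = 0 ranks by correlation only.
rank_candidates <- function(candidates, emphasis = 1) {
  stopifnot(emphasis >= 0, emphasis <= 1)
  dist_rank <- rank(candidates$distance)      # smaller distance is better
  corr_rank <- rank(-candidates$correlation)  # larger correlation is better
  candidates$score <- emphasis * dist_rank + (1 - emphasis) * corr_rank
  candidates[order(candidates$score), , drop = FALSE]
}

# Toy candidate controls for a single test market (made-up numbers)
candidates <- data.frame(
  market      = c("A", "B", "C"),
  distance    = c(0.10, 0.25, 0.05),
  correlation = c(0.95, 0.99, 0.80),
  stringsAsFactors = FALSE
)

by_distance    <- rank_candidates(candidates, emphasis = 1)
by_correlation <- rank_candidates(candidates, emphasis = 0)
by_distance$market     # "C" "A" "B"
by_correlation$market  # "B" "A" "C"
```

The two orderings differ because market C is closest in distance but least correlated, which is the trade-off the emphasis weight controls.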

Usage

best_matches(data=NULL,
             markets_to_be_matched=NULL,
             id_variable=NULL,
             date_variable=NULL,
             matching_variable=NULL,
             parallel=TRUE,
             warping_limit=1,
             start_match_period=NULL,
             end_match_period=NULL,
             matches=NULL,
             dtw_emphasis=1, 
             suggest_market_splits=FALSE,
             splitbins=10,
             log_for_splitting=FALSE)

Arguments

`data`	input data.frame for analysis. The dataset should be structured as "stacked" time series (i.e., a panel dataset). In other words, markets are rows and not columns – we have a unique row for each area/time combination.
`markets_to_be_matched`	Use this parameter if you only want to get control matches for a subset of markets or a single market. The default is NULL, which means that all markets will be paired with matching markets.
`id_variable`	the name of the variable that identifies the markets
`date_variable`	the time stamp variable
`matching_variable`	the variable (metric) used to match the markets. For example, this could be sales or new customers
`parallel`	set to TRUE for parallel processing. Default is TRUE
`warping_limit`	the warping limit used for matching. Default is 1, which means that a single query value can be mapped to at most 2 reference values.
`start_match_period`	the start date of the matching period (pre period). Must be a character of format "YYYY-MM-DD" – e.g., "2015-01-01"
`end_match_period`	the end date of the matching period (pre period). Must be a character of format "YYYY-MM-DD" – e.g., "2015-10-01"
`matches`	Number of matching markets to keep in the output (to use fewer markets for inference, use the control_matches parameter when calling inference). Default is to keep all matches.
`dtw_emphasis`	Number from 0 to 1. The amount of emphasis placed on dtw distances, versus correlation, when ranking markets. Default is 1 (all emphasis on dtw). If emphasis is set to 0, all emphasis would be put on correlation, which is recommended when optimal splits are requested. An emphasis of 0.5 would yield equal weighting.
`suggest_market_splits`	if set to TRUE, best_matches will return suggested test/control splits based on correlation and market sizes. Default is FALSE. For this option to be invoked, markets_to_be_matched must be NULL (i.e., you must run a full match). Note that the algorithm will force test and control to have the same number of markets. So if the total number of markets is odd, one market will be left out.
`splitbins`	Number of size-based bins used to stratify when splitting markets into test and control. Only markets inside the same bin can be matched. More bins means more emphasis on market size when splitting. Fewer bins means more emphasis on correlation. Default is 10.
`log_for_splitting`	This parameter determines if optimal splitting is based on correlations of the raw matching metric values or the correlations of log(matching metric). Only relevant if suggest_market_splits is TRUE. Default is FALSE.
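
As an illustration of the "stacked" layout expected in data, the following base R sketch reshapes a wide table (one column per market) into one row per market/date combination and selects a matching period from "YYYY-MM-DD" strings. The table, the column names Area, Date and Sales, and the values are hypothetical. Treating the period bounds as inclusive is an assumption, not something this page states.

```r
# Hypothetical wide table: one column per market (made-up values)
wide <- data.frame(
  Date = as.Date(c("2015-01-01", "2015-01-02", "2015-01-03")),
  CPH  = c(10, 12, 11),
  SFO  = c(20, 21, 19)
)

# Stack the market columns into one row per market/date combination
markets <- setdiff(names(wide), "Date")
stacked <- data.frame(
  Area  = rep(markets, each = nrow(wide)),
  Date  = rep(wide$Date, times = length(markets)),
  Sales = unlist(wide[markets], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Each market/date pair should appear exactly once
stopifnot(!any(duplicated(stacked[c("Area", "Date")])))

# Matching-period bounds are given as "YYYY-MM-DD" character strings.
# Inclusive bounds are assumed here for illustration.
start_match_period <- "2015-01-01"
end_match_period   <- "2015-01-02"
in_period <- stacked$Date >= as.Date(start_match_period) &
  stacked$Date <= as.Date(end_match_period)
pre_period <- stacked[in_period, ]
```

A data.frame shaped like stacked would be passed as data, with id_variable="Area", date_variable="Date" and matching_variable="Sales".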

Value

Returns an object of type market_matching. The object has the following elements:

`BestMatches`	A data.frame that contains the best matches for each market. All stats reflect data after the market pairs have been joined by date. Thus SUMTEST and SUMCNTL can have smaller values than what you see in the Bins output table.
`Data`	The raw data used to do the matching
`MarketID`	The name of the market identifier
`MatchingMetric`	The name of the matching variable
`DateVariable`	The name of the date variable
`SuggestedTestControlSplits`	Suggested test/control splits. SUMTEST and SUMCNTL are the total market volumes, not volume after joining with other markets. They are greater than or equal to the values in the BestMatches data.frame.
`Bins`	Bins used for splitting and corresponding volumes

Examples

## Not run: 
##-----------------------------------------------------------------------
## Find the best matches for the CPH and SFO airport time series
##-----------------------------------------------------------------------
library(MarketMatching)
data(weather, package="MarketMatching")
mm <- best_matches(data=weather, 
                   id_variable="Area",
                   markets_to_be_matched=c("CPH", "SFO"),
                   date_variable="Date",
                   matching_variable="Mean_TemperatureF",
                   parallel=FALSE,
                   start_match_period="2014-01-01",
                   end_match_period="2014-10-01")
head(mm$BestMatches)

## End(Not run)
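
A further sketch shows how suggested test/control splits might be requested. It assumes the MarketMatching package and its bundled weather data are installed, and is skipped otherwise. Following the argument descriptions above, markets_to_be_matched is left at NULL and dtw_emphasis is set to 0. The object name mm_splits is arbitrary.

```r
## Only runs when MarketMatching is available
if (requireNamespace("MarketMatching", quietly = TRUE)) {
  library(MarketMatching)
  data(weather, package = "MarketMatching")

  ## Full match (markets_to_be_matched stays NULL), ranking on correlation only
  mm_splits <- best_matches(data = weather,
                            id_variable = "Area",
                            date_variable = "Date",
                            matching_variable = "Mean_TemperatureF",
                            parallel = FALSE,
                            dtw_emphasis = 0,
                            suggest_market_splits = TRUE,
                            start_match_period = "2014-01-01",
                            end_match_period = "2014-10-01")

  ## Suggested pairs and the size bins used for stratification
  print(head(mm_splits$SuggestedTestControlSplits))
  print(head(mm_splits$Bins))
}
```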

[Package MarketMatching version 1.2.1 Index]