dc.BuildCBSFromCBTAndDates {BTYD}    R Documentation

Build CBS matrix from CBT matrix


Given a customer-by-time matrix, yields the resulting customer-by-sufficient-statistic matrix.


dc.BuildCBSFromCBTAndDates(cbt, dates, per, cbt.is.during.cal.period = TRUE)



cbt: customer-by-time matrix, with one row per customer and one column per time period. Each cell should hold numeric information about that customer's transactions in that time period: the number of transactions (frequency), a 1 to indicate that at least one transaction occurred (reach), or the average or total amount spent.


dates: if cbt.is.during.cal.period is TRUE, dates is a data frame with three columns: (1) the dates when customers made their first purchases, (2) the dates when customers made their last purchases, and (3) the date of the end of the calibration period. If cbt.is.during.cal.period is FALSE, dates is a vector with two elements: (1) the date of the beginning of the holdout period and (2) the date of the end of the holdout period.


per: interval of time for the customer-by-sufficient-statistic matrix. May be "day", "week", "month", "quarter", or "year".


cbt.is.during.cal.period: if TRUE, the customer-by-time matrix is from the calibration period. If FALSE, the customer-by-time matrix is from the holdout period.


The customer-by-sufficient-statistic matrix will contain the sum of the statistic included in the customer-by-time matrix (see the cbt parameter), the customer's last transaction date, and the total time period for which the customer was observed.


Customer-by-sufficient-statistic matrix, with three columns: frequency ("x"), recency ("t.x"), and total time observed ("T.cal"). See Details. Frequency is total transactions, not repeat transactions.
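The three columns can be illustrated on a toy matrix without calling the package. The following is a minimal base-R sketch, not the package's implementation: the data, the object names (toy.cbt, cal.end, toy.cbs), the choice to measure recency from each customer's first purchase, and the 7-days-per-week conversion are all assumptions made for illustration.

```r
# Hypothetical customer-by-time matrix: 3 customers, 4 weekly columns.
toy.cbt <- matrix(c(1, 0, 2, 0,
                    0, 1, 0, 1,
                    0, 0, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("A", "B", "C"),
                                  c("1997-01-01", "1997-01-08",
                                    "1997-01-15", "1997-01-22")))
period.dates <- as.Date(colnames(toy.cbt))
cal.end <- as.Date("1997-01-29")  # assumed end of the calibration period

# Index of the first and last column with activity for each customer.
first.idx <- apply(toy.cbt > 0, 1, function(r) min(which(r)))
last.idx <- apply(toy.cbt > 0, 1, function(r) max(which(r)))
first.dates <- period.dates[first.idx]
last.dates <- period.dates[last.idx]

days.per.week <- 7  # assumed conversion when the time unit is weeks

toy.cbs <- cbind(
  # Frequency: sum of the statistic over all periods (total, not repeat).
  x = rowSums(toy.cbt),
  # Recency: time from first to last purchase, in weeks (assumption).
  t.x = as.numeric(difftime(last.dates, first.dates, units = "days")) / days.per.week,
  # Total time observed: first purchase to end of calibration, in weeks.
  T.cal = as.numeric(difftime(cal.end, first.dates, units = "days")) / days.per.week
)
print(toy.cbs)
```

Under these assumptions, customer A has x = 3, t.x = 2, and T.cal = 4. Subtracting 1 from x would give repeat transactions rather than total transactions.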


elog <- dc.ReadLines(system.file("data/cdnowElog.csv", package="BTYD"),2,3,5)
elog[,"date"] <- as.Date(elog[,"date"], "%Y%m%d")

# Transaction-flow models are about interpurchase times. Since we
# only know purchase times to the day, we merge all transactions on
# the same day. This example uses dc.MergeTransactionsOnSameDate
# to illustrate this; however, we could have simply used dc.CreateReachCBT
# instead of dc.CreateFreqCBT to obtain the same result.
merged.elog <- dc.MergeTransactionsOnSameDate(elog)
cutoff.date <- as.Date("1997-09-30")
freq.cbt <- dc.CreateFreqCBT(merged.elog)
cal.freq.cbt <- freq.cbt[,as.Date(colnames(freq.cbt)) <= cutoff.date]
holdout.freq.cbt <- freq.cbt[,as.Date(colnames(freq.cbt)) > cutoff.date]

cal.start.dates.indices <- dc.GetFirstPurchasePeriodsFromCBT(cal.freq.cbt)
cal.start.dates <- as.Date(colnames(cal.freq.cbt)[cal.start.dates.indices])
cal.end.dates.indices <- dc.GetLastPurchasePeriodsFromCBT(cal.freq.cbt)
cal.end.dates <- as.Date(colnames(cal.freq.cbt)[cal.end.dates.indices])
T.cal.total <- rep(cutoff.date, nrow(cal.freq.cbt))
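# Columns in the order the dates argument expects: first purchase,
# last purchase, end of calibration period.
cal.dates <- data.frame(cal.start.dates,
                        cal.end.dates,
                        T.cal.total)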

# Create calibration period customer-by-sufficient-statistic data frame,
# using weeks as the unit of time.
cal.cbs <- dc.BuildCBSFromCBTAndDates(cal.freq.cbt,
                                      cal.dates,
                                      per = "week",
                                      cbt.is.during.cal.period = TRUE)

# Force the calibration period customer-by-sufficient-statistic to only contain
# repeat transactions (required by BG/BB and Pareto/NBD models)
cal.cbs[,"x"] <- cal.cbs[,"x"] - 1

holdout.start <- cutoff.date+1
holdout.end <- as.Date(colnames(holdout.freq.cbt)[ncol(holdout.freq.cbt)])
holdout.dates <- c(holdout.start, holdout.end)

# Create holdout period customer-by-sufficient-statistic data frame, using weeks
# as the unit of time.
holdout.cbs <- dc.BuildCBSFromCBTAndDates(holdout.freq.cbt,
                                          holdout.dates,
                                          per = "week",
                                          cbt.is.during.cal.period = FALSE)

[Package BTYD version 2.4.3]