clvdata {CLVTools}R Documentation

Create an object for transactional data required to estimate CLV


Creates a data object that contains the prepared transaction data and that is used as input for model fitting. The transaction data may be split in an estimation and holdout sample if desired. The model then will only be fit on the estimation sample.

If covariates should be used when fitting a model, covariate data can be added to an object returned from this function.


  estimation.split = NULL, = "Id", = "Date",
  name.price = "Price"



Transaction data as data.frame or data.table. See details.


Character string that indicates the format of the date variable in the data used. See details.


What time unit defines a period. May be abbreviated, capitalization is ignored. See details.


Indicates the length of the estimation period. See details.

Column name of the customer id in data.transactions.

Column name of the transaction date in data.transactions.


Column name of price in data.transactions. NULL if no spending data is present.


data.transactions A data.frame or data.table with customers' purchase history. Every transaction record consists of a purchase date and a customer id. Optionally, the price of the transaction may be included to also allow for prediction of future customer spending.

time.unit The definition of a single period. Currently available are "hours", "days", "weeks", and "years". May be abbreviated.

date.format A single format to use when parsing any date that is given as character input. This includes the dates given in data.transaction, estimation.split, or as an input to any other function at a later point, such as prediction.end in predict. The function parse_date_time of package lubridate is used to parse inputs and hence all formats it accepts in argument orders can be used. For example, a date of format "year-month-day" (i.e., "2010-06-17") is indicated with "ymd". Other combinations such as "dmy", "dym", "ymd HMS", or "HMS dmy" are possible as well.

estimation.split May be specified as either the number of periods since the first transaction or the timepoint (either as character, Date, or POSIXct) at which the estimation period ends. The indicated timepoint itself will be part of the estimation sample. If no value is provided or set to NULL, the whole dataset will used for fitting the model (no holdout sample).

Aggregation of Transactions

Multiple transactions by the same customer that occur on the minimally representable temporal resolution are aggregated to a single transaction with their spending summed. For time units days and any other coarser Date-based time units (i.e. weeks, years), this means that transactions on the same day are combined. When using finer time units such as hours which are based on POSIXct, transactions on the same second are aggregated.

For the definition of repeat-purchases, combined transactions are viewed as a single transaction. Hence, repeat-transactions are determined from the aggregated transactions.


An object of class See the class definition for more details about the returned object.

The function summary can be used to obtain and print a summary of the data. The generic accessor function nobs is available to read out the number of customers.

See Also

SetStaticCovariates to add static covariates

SetDynamicCovariates for how to add dynamic covariates

plot to plot the repeat transactions

summary to summarize the transaction data

pnbd to fit Pareto/NBD models on a object



# create clv data object with weekly periods
#    and no splitting <- clvdata(data.transactions = cdnow,
                          time.unit = "weeks")

# same but split after 37 periods <- clvdata(data.transactions = cdnow,
                          time.unit = "w",
                          estimation.split = 37)

# same but estimation end on the 15th Oct 1997 <- clvdata(data.transactions = cdnow,
                          time.unit = "w",
                          estimation.split = "1997-10-15")

# summary of the transaction data

# plot the total number of transactions per period

# create data with the weekly periods defined to
#   start on Mondays

## Not run: 
# set start of week to Monday
oldopts <- options("lubridate.week.start"=1)

# create while Monday is the beginning of the week <- clvdata(data.transactions = cdnow,
                          time.unit = "weeks")

# Dynamic covariates now have to be supplied for every Monday

# set week start to what it was before

## End(Not run)

[Package CLVTools version 0.9.0 Index]