R: Pareto/NBD models

pnbd {CLVTools}

R Documentation

Pareto/NBD models

Description

Fits Pareto/NBD models on transactional data with and without covariates.

Usage

## S4 method for signature 'clv.data'
pnbd(
  clv.data,
  start.params.model = c(),
  use.cor = FALSE,
  start.param.cor = c(),
  optimx.args = list(),
  verbose = TRUE,
  ...
)

## S4 method for signature 'clv.data.static.covariates'
pnbd(
  clv.data,
  start.params.model = c(),
  use.cor = FALSE,
  start.param.cor = c(),
  optimx.args = list(),
  verbose = TRUE,
  names.cov.life = c(),
  names.cov.trans = c(),
  start.params.life = c(),
  start.params.trans = c(),
  names.cov.constr = c(),
  start.params.constr = c(),
  reg.lambdas = c(),
  ...
)

## S4 method for signature 'clv.data.dynamic.covariates'
pnbd(
  clv.data,
  start.params.model = c(),
  use.cor = FALSE,
  start.param.cor = c(),
  optimx.args = list(),
  verbose = TRUE,
  names.cov.life = c(),
  names.cov.trans = c(),
  start.params.life = c(),
  start.params.trans = c(),
  names.cov.constr = c(),
  start.params.constr = c(),
  reg.lambdas = c(),
  ...
)

Arguments

`clv.data`	The data object on which the model is fitted.
`start.params.model`	Named start parameters containing the optimization start parameters for the model without covariates.
`use.cor`	Whether the correlation between the transaction and lifetime process should be estimated.
`start.param.cor`	Start parameter for the optimization of the correlation.
`optimx.args`	Additional arguments to control the optimization which are forwarded to `optimx::optimx`. If multiple optimization methods are specified, only the result of the last method is further processed.
`verbose`	Show details about the running of the function.
`...`	Ignored
`names.cov.life`	Which of the set Lifetime covariates should be used. Missing parameter indicates all covariates shall be used.
`names.cov.trans`	Which of the set Transaction covariates should be used. Missing parameter indicates all covariates shall be used.
`start.params.life`	Named start parameters containing the optimization start parameters for all lifetime covariates.
`start.params.trans`	Named start parameters containing the optimization start parameters for all transaction covariates.
`names.cov.constr`	Which covariates should be forced to use the same parameters for the lifetime and transaction process. The covariates need to be present as both, lifetime and transaction covariates.
`start.params.constr`	Named start parameters containing the optimization start parameters for the constraint covariates.
`reg.lambdas`	Named lambda parameters used for the L2 regularization of the lifetime and the transaction covariate parameters. Lambdas have to be >= 0.

Details

Model parameters for the Pareto/NBD model are alpha, r, beta, and s.
s: shape parameter of the Gamma distribution for the lifetime process. The smaller s, the stronger the heterogeneity of customer lifetimes.
beta: rate parameter for the Gamma distribution for the lifetime process.
r: shape parameter of the Gamma distribution of the purchase process. The smaller r, the stronger the heterogeneity of the purchase process.
alpha: rate parameter of the Gamma distribution of the purchase process.

Based on these parameters, the average purchase rate while customers are active is r/alpha and the average dropout rate is s/beta.

Ideally, the starting parameters for r and s represent your best guess concerning the heterogeneity of customers in their buy and die rate. If covariates are included into the model additionally parameters for the covariates affecting the attrition and the purchase process are part of the model.

If no start parameters are given, 1.0 is used for all model parameters and 0.1 for covariate parameters. The model start parameters are required to be > 0.

The Pareto/NBD model

The Pareto/NBD is the first model addressing the issue of modeling customer purchases and attrition simultaneously for non-contractual settings. The model uses a Pareto distribution, a combination of an Exponential and a Gamma distribution, to explicitly model customers' (unobserved) attrition behavior in addition to customers' purchase process.
In general, the Pareto/NBD model consist of two parts. A first process models the purchase behavior of customers as long as the customers are active. A second process models customers' attrition. Customers live (and buy) for a certain unknown time until they become inactive and "die". Customer attrition is unobserved. Inactive customers may not be reactivated. For technical details we refer to the original paper by Schmittlein, Morrison and Colombo (1987) and the detailed technical note of Fader and Hardie (2005).

Pareto/NBD model with static covariates

The standard Pareto/NBD model captures heterogeneity was solely using Gamma distributions. However, often exogenous knowledge, such as for example customer demographics, is available. The supplementary knowledge may explain part of the heterogeneity among the customers and therefore increase the predictive accuracy of the model. In addition, we can rely on these parameter estimates for inference, i.e. identify and quantify effects of contextual factors on the two underlying purchase and attrition processes. For technical details we refer to the technical note by Fader and Hardie (2007).

Pareto/NBD model with dynamic covariates

In many real-world applications customer purchase and attrition behavior may be influenced by covariates that vary over time. In consequence, the timing of a purchase and the corresponding value of at covariate a that time becomes relevant. Time-varying covariates can affect customer on aggregated level as well as on an individual level: In the first case, all customers are affected simultaneously, in the latter case a covariate is only relevant for a particular customer. For technical details we refer to the paper by Bachmann, Meierer and Näf (2020).

Value

Depending on the data object on which the model was fit, pnbd returns either an object of class clv.pnbd, clv.pnbd.static.cov, or clv.pnbd.dynamic.cov.

The function summary can be used to obtain and print a summary of the results. The generic accessor functions coefficients, vcov, fitted, logLik, AIC, BIC, and nobs are available.

Note

The Pareto/NBD model with dynamic covariates can currently not be fit with data that has a temporal resolution of less than one day (data that was built with time unit hours).

References

Schmittlein DC, Morrison DG, Colombo R (1987). “Counting Your Customers: Who-Are They and What Will They Do Next?” Management Science, 33(1), 1-24.

Bachmann P, Meierer M, Naef, J (2021). “The Role of Time-Varying Contextual Factors in Latent Attrition Models for Customer Base Analysis” Marketing Science 40(4). 783-809.

Fader PS, Hardie BGS (2005). “A Note on Deriving the Pareto/NBD Model and Related Expressions.” URL http://www.brucehardie.com/notes/009/pareto_nbd_derivations_2005-11-05.pdf.

Fader PS, Hardie BGS (2007). “Incorporating time-invariant covariates into the Pareto/NBD and BG/NBD models.” URL http://www.brucehardie.com/notes/019/time_invariant_covariates.pdf.

Fader PS, Hardie BGS (2020). “Deriving an Expression for P(X(t)=x) Under the Pareto/NBD Model.” URL https://www.brucehardie.com/notes/012/pareto_NBD_pmf_derivation_rev.pdf

Examples


data("apparelTrans")
clv.data.apparel <- clvdata(apparelTrans, date.format = "ymd",
                            time.unit = "w", estimation.split = 40)

# Fit standard pnbd model
pnbd(clv.data.apparel)

# Give initial guesses for the model parameters
pnbd(clv.data.apparel,
     start.params.model = c(r=0.5, alpha=15, s=0.5, beta=10))


# pass additional parameters to the optimizer (optimx)
#    Use Nelder-Mead as optimization method and print
#    detailed information about the optimization process
apparel.pnbd <- pnbd(clv.data.apparel,
                     optimx.args = list(method="Nelder-Mead",
                                        control=list(trace=6)))

# estimated coefs
coef(apparel.pnbd)

# summary of the fitted model
summary(apparel.pnbd)

# predict CLV etc for holdout period
predict(apparel.pnbd)

# predict CLV etc for the next 15 periods
predict(apparel.pnbd, prediction.end = 15)


# Estimate correlation as well
pnbd(clv.data.apparel, use.cor = TRUE)


# To estimate the pnbd model with static covariates,
#   add static covariates to the data
data("apparelStaticCov")
clv.data.static.cov <-
 SetStaticCovariates(clv.data.apparel,
                     data.cov.life = apparelStaticCov,
                     names.cov.life = c("Gender", "Channel"),
                     data.cov.trans = apparelStaticCov,
                     names.cov.trans = c("Gender", "Channel"))

# Fit pnbd with static covariates
pnbd(clv.data.static.cov)

# Give initial guesses for both covariate parameters
pnbd(clv.data.static.cov, start.params.trans = c(Gender=0.75, Channel=0.7),
                   start.params.life  = c(Gender=0.5, Channel=0.5))

# Use regularization
pnbd(clv.data.static.cov, reg.lambdas = c(trans = 5, life=5))

# Force the same coefficient to be used for both covariates
pnbd(clv.data.static.cov, names.cov.constr = "Gender",
                   start.params.constr = c(Gender=0.5))

# Fit model only with the Channel covariate for life but
# keep all trans covariates as is
pnbd(clv.data.static.cov, names.cov.life = c("Channel"))

# Add dynamic covariates data to the data object
#   add dynamic covariates to the data

## Not run: 
data("apparelDynCov")
clv.data.dyn.cov <-
  SetDynamicCovariates(clv.data = clv.data.apparel,
                       data.cov.life = apparelDynCov,
                       data.cov.trans = apparelDynCov,
                       names.cov.life = c("Marketing", "Gender", "Channel"),
                       names.cov.trans = c("Marketing", "Gender", "Channel"),
                       name.date = "Cov.Date")


# Fit PNBD with dynamic covariates
pnbd(clv.data.dyn.cov)

# The same fitting options as for the
#  static covariate are available
pnbd(clv.data.dyn.cov, reg.lambdas = c(trans=10, life=2))

## End(Not run)

[Package CLVTools version 0.10.0 Index]