cdnet {CDatanet}R Documentation

Estimating count data models with social interactions under rational expectations using the NPL method


cdnet estimates count data models with social interactions under rational expectations using the NPL algorithm (see Houndetoungan, 2024).


  starting = list(lambda = NULL, Gamma = NULL, delta = NULL),
  Ey0 = NULL,
  ubslambda = 1L,
  optimizer = "fastlbfgs",
  npl.ctr = list(),
  opt.ctr = list(),
  cov = TRUE,



a class object formula: a symbolic description of the model. formula must be as, for example, y ~ x1 + x2 + gx1 + gx2 where y is the endogenous vector and x1, x2, gx1 and gx2 are control variables, which can include contextual variables, i.e. averages among the peers. Peer averages can be computed using the function peer.avg.


adjacency matrix. For networks consisting of multiple subnets, Glist can be a list of subnets with the m-th element being an n_s\times n_s-adjacency matrix, where n_s is the number of nodes in the m-th subnet. For heterogeneous peer effects (length(unique(group)) = h > 1), the m-th element must be a list of h^2 n_s\times n_s-adjacency matrices corresponding to the different network specifications (see Houndetoungan, 2024). For heterogeneous peer effects in the case of a single large network, Glist must be a one-item list. This item must be a list of h^2 network specifications. The order in which the networks in are specified are important and must match sort(unique(group)) (see examples).


the vector indicating the individual groups. The default assumes a common group. For 2 groups; that is, length(unique(group)) = 2, (e.g., A and B), four types of peer effects are defined: peer effects of A on A, of A on B, of B on A, and of B on B.


an integer indicating the theoretical upper bound of y. (see the model specification in details).


an L-vector, where L is the number of groups. For large Rmax the cost function is assumed to be semi-parametric (i.e., nonparametric from 0 to \bar{R} and quadratic beyond \bar{R}).


(optional) a starting value for \theta = (\lambda, \Gamma', \delta')', where \lambda, \Gamma, and \delta are the parameters to be estimated (see details).


(optional) a starting value for E(y).


a positive value indicating the upper bound of \sum_{s = 1}^S \lambda_s > 0.


is either fastlbfgs (L-BFGS optimization method of the package RcppNumerical), nlm (referring to the function nlm), or optim (referring to the function optim). Arguments for these functions such as, control and method can be set via the argument opt.ctr.


a list of controls for the NPL method (see details).


a list of arguments to be passed in optim_lbfgs of the package RcppNumerical, nlm or optim (the solver set in optimizer), such as maxit, eps_f, eps_g, control, method, etc.


a Boolean indicating if the covariance should be computed.


an optional data frame, list or environment (or object coercible by to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which cdnet is called.



The count variable y_i take the value r with probability.

P_{ir} = F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma - a_{h(i),r}) - F(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma - a_{h(i),r + 1}).

In this equation, \mathbf{z}_i is a vector of control variables; F is the distribution function of the standard normal distribution; \bar{y}_i^{e,s} is the average of E(y) among peers using the s-th network definition; a_{h(i),r} is the r-th cut-point in the cost group h(i).

The following identification conditions have been introduced: \sum_{s = 1}^S \lambda_s > 0, a_{h(i),0} = -\infty, a_{h(i),1} = 0, and a_{h(i),r} = \infty for any r \geq R_{\text{max}} + 1. The last condition implies that P_{ir} = 0 for any r \geq R_{\text{max}} + 1. For any r \geq 1, the distance between two cut-points is a_{h(i),r+1} - a_{h(i),r} = \delta_{h(i),r} + \sum_{s = 1}^S \lambda_s As the number of cut-point can be large, a quadratic cost function is considered for r \geq \bar{R}_{h(i)}, where \bar{R} = (\bar{R}_{1}, ..., \bar{R}_{L}). With the semi-parametric cost-function, a_{h(i),r + 1} - a_{h(i),r}= \bar{\delta}_{h(i)} + \sum_{s = 1}^S \lambda_s.

The model parameters are: \lambda = (\lambda_1, ..., \lambda_S)', \Gamma, and \delta = (\delta_1', ..., \delta_L')', where \delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l}, \bar{\delta}_l)' for l = 1, ..., L. The number of single parameters in \delta_l depends on R_{\text{max}} and \bar{R}_{l}. The components \delta_{l,2}, ..., \delta_{l,\bar{R}_l} or/and \bar{\delta}_l must be removed in certain cases.
If R_{\text{max}} = \bar{R}_{l} \geq 2, then \delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l})'.
If R_{\text{max}} = \bar{R}_{l} = 1 (binary models), then \delta_l must be empty.
If R_{\text{max}} > \bar{R}_{l} = 1, then \delta_l = \bar{\delta}_l.


The model parameters are estimated using the Nested Partial Likelihood (NPL) method. This approach starts with a guess of \theta and E(y) and constructs iteratively a sequence of \theta and E(y). The solution converges when the \ell_1-distance between two consecutive \theta and E(y) is less than a tolerance.
The argument npl.ctr must include


the tolerance of the NPL algorithm (default 1e-4),


the maximal number of iterations allowed (default 500),


a boolean indicating if the estimate should be printed at each step.


the number of simulations performed use to compute integral in the covariance by important sampling.


A list consisting of:


a list of general information about the model.


the NPL estimator.


E(y), the expectation of y.


the average of E(y) friends.


a list including (if cov == TRUE) parms the covariance matrix and another list var.comp, which includes Sigma, as \Sigma, and Omega, as \Omega, matrices used for compute the covariance matrix.


step-by-step output as returned by the optimizer.


Houndetoungan, E. A. (2024). Count Data Models with Social Interactions under Rational Expectations. Available at SSRN 3721250, doi:10.2139/ssrn.3721250.

See Also

sart, sar, simcdnet.


M      <- 5 # Number of sub-groups
nvec   <- round(runif(M, 100, 200))
n      <- sum(nvec)

# Adjacency matrix
A      <- list()
for (m in 1:M) {
  nm           <- nvec[m]
  Am           <- matrix(0, nm, nm)
  max_d        <- 30 #maximum number of friends
  for (i in 1:nm) {
    tmp        <- sample((1:nm)[-i], sample(0:max_d, 1))
    Am[i, tmp] <- 1
  A[[m]]       <- Am
Anorm  <- #Row-normalization

# X
X      <- cbind(rnorm(n, 1, 3), rexp(n, 0.4))

# Two group:
group  <- 1*(X[,1] > 0.95)

# Networks
# length(group) = 2 and unique(sort(group)) = c(0, 1)
# The networks must be defined as to capture:
# peer effects of `0` on `0`, peer effects of `1` on `0`
# peer effects of `0` on `1`, and peer effects of `1` on `1`
G        <- list()
cums     <- c(0, cumsum(nvec))
for (m in 1:M) {
  tp     <- group[(cums[m] + 1):(cums[m + 1])]
  Am     <- A[[m]]
  G[[m]] <- * ((1 - tp) %*% t(1 - tp)),
                              Am * ((1 - tp) %*% t(tp)),
                              Am * (tp %*% t(1 - tp)),
                              Am * (tp %*% t(tp))))

# Parameters
lambda <- c(0.2, 0.3, -0.15, 0.25) 
Gamma  <- c(4.5, 2.2, -0.9, 1.5, -1.2)
delta  <- rep(c(2.6, 1.47, 0.85, 0.7, 0.5), 2) 

# Data
data   <- data.frame(X, peer.avg(Anorm, cbind(x1 = X[,1], x2 =  X[,2])))
colnames(data) = c("x1", "x2", "gx1", "gx2")

ytmp   <- simcdnet(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2),
                   lambda = lambda, Gamma = Gamma, delta = delta, group = group,
                   data = data)
y      <- ytmp$y
hist(y, breaks = max(y) + 1)

# Estimation
est    <- cdnet(formula = y ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2), group = group,
                optimizer = "fastlbfgs", data = data,
                opt.ctr = list(maxit = 5e3, eps_f = 1e-11, eps_g = 1e-11))

[Package CDatanet version 2.2.0 Index]