simcdnet {CDatanet}    R Documentation

Simulating count data models with social interactions under rational expectations

Description

simcdnet simulates the count data model with social interactions under rational expectations developed by Houndetoungan (2024).

Usage

simcdnet(
  formula,
  group,
  Glist,
  parms,
  lambda,
  Gamma,
  delta,
  Rmax,
  Rbar,
  tol = 1e-10,
  maxit = 500,
  data
)

Arguments

formula

an object of class formula: a symbolic description of the model. formula must be of the form, for example, y ~ x1 + x2 + gx1 + gx2, where y is the endogenous vector and x1, x2, gx1 and gx2 are control variables, which can include contextual variables, i.e. averages among the peers. Peer averages can be computed using the function peer.avg.

group

the vector indicating the individual groups. The default assumes a common group. With two groups, that is, length(unique(group)) = 2 (e.g., A and B), four types of peer effects are defined: peer effects of A on A, of A on B, of B on A, and of B on B.

Glist

adjacency matrix. For networks consisting of multiple subnets, Glist can be a list of subnets with the m-th element being an $n_s \times n_s$ adjacency matrix, where $n_s$ is the number of nodes in the m-th subnet. For heterogeneous peer effects (length(unique(group)) = h > 1), the m-th element must be a list of $h^2$ adjacency matrices of dimension $n_s \times n_s$ corresponding to the different network specifications (see Houndetoungan, 2024). For heterogeneous peer effects in the case of a single large network, Glist must be a one-item list. This item must be a list of $h^2$ network specifications. The order in which the networks are specified is important and must match sort(unique(group)) (see examples).

parms

a vector defining the true value of $\theta = (\lambda', \Gamma', \delta')'$ (see the model specification in Details). Each parameter $\lambda$, $\Gamma$, or $\delta$ can also be given separately to the arguments lambda, Gamma, or delta.

lambda

the true value of the vector $\lambda$.

Gamma

the true value of the vector $\Gamma$.

delta

the true value of the vector $\delta$.

Rmax

an integer indicating the theoretical upper bound of y (see the model specification in Details).

Rbar

an $L$-vector, where $L$ is the number of groups. For large Rmax, the cost function is assumed to be semi-parametric (i.e., nonparametric from 0 to $\bar{R}$ and quadratic beyond $\bar{R}$). The l-th element of Rbar indicates $\bar{R}$ for the l-th value of sort(unique(group)) (see the model specification in Details).

tol

the tolerance value used in the Fixed Point Iteration Method to compute the expectation of y. The process stops if the $\ell_1$-distance between two consecutive values of $E(y)$ is less than tol.

maxit

the maximum number of iterations in the Fixed Point Iteration Method.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which simcdnet is called.
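
As a rough illustration of how tol and maxit control the Fixed Point Iteration Method, the sketch below iterates on $E(y)$ in the binary special case (Rmax = Rbar = 1) with a single group and a single network, where $E(y_i) = F(\lambda \bar{y}_i^{e} + \mathbf{z}_i'\Gamma)$. It uses base R only and is not the package's internal code; the network G, the controls Z, and all parameter values are made up for the illustration.

```r
# Sketch only: binary special case, one group, one network (all values hypothetical)
set.seed(1)
n       <- 50
G       <- matrix(rbinom(n * n, 1, 0.1), n, n)
diag(G) <- 0
rs      <- rowSums(G)
G       <- G / ifelse(rs == 0, 1, rs)  # row-normalize; isolated nodes keep zero rows
Z       <- cbind(1, rnorm(n))          # intercept and one control variable
lambda  <- 0.3
Gamma   <- c(-0.5, 0.8)
tol     <- 1e-10
maxit   <- 500

Ey   <- rep(0.5, n)                    # starting value for E(y)
iter <- 0
repeat {
  iter  <- iter + 1
  Eynew <- as.vector(pnorm(lambda * (G %*% Ey) + Z %*% Gamma))
  dist  <- sum(abs(Eynew - Ey))        # l1-distance between consecutive E(y)
  Ey    <- Eynew
  if (dist < tol || iter >= maxit) break
}
iter                                   # number of iterations performed
```

The loop stops as soon as the $\ell_1$-distance falls below tol, or after maxit iterations if it never does.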

Details

The count variable $y_i$ takes the value $r$ with probability

$$P_{ir} = F\left(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma - a_{h(i),r}\right) - F\left(\sum_{s = 1}^S \lambda_s \bar{y}_i^{e,s} + \mathbf{z}_i'\Gamma - a_{h(i),r + 1}\right).$$

In this equation, $\mathbf{z}_i$ is a vector of control variables; $F$ is the distribution function of the standard normal distribution; $\bar{y}_i^{e,s}$ is the average of $E(y)$ among peers using the s-th network definition; $a_{h(i),r}$ is the r-th cut-point in the cost group $h(i)$.
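
To make the probability formula concrete, the following base R sketch evaluates $P_{ir}$ for one individual, taking the cut-points as given. The linear index and the cut-point values are made up for the illustration; only the endpoints $a_0 = -\infty$, $a_1 = 0$ and $a_{R_{\text{max}} + 1} = \infty$ follow the model specification.

```r
# Hypothetical inputs for a single individual
index <- 1.2                            # sum_s lambda_s * ybar_i^{e,s} + z_i' Gamma
Rmax  <- 4
a     <- c(-Inf, 0, 1.5, 3.2, 5.1, Inf) # a_0, a_1, ..., a_{Rmax + 1}

# P_r = F(index - a_r) - F(index - a_{r + 1}) for r = 0, ..., Rmax
Fa       <- pnorm(index - a)            # F evaluated at each cut-point
P        <- Fa[-length(Fa)] - Fa[-1]
names(P) <- 0:Rmax

sum(P)                                  # the probabilities sum to one
Ey <- sum(0:Rmax * P)                   # implied expectation of y
```

Because the first term uses $a_0 = -\infty$ and the last uses $a_{R_{\text{max}} + 1} = \infty$, the differences telescope and the probabilities over $r = 0, ..., R_{\text{max}}$ add up to one.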

The following identification conditions have been introduced: $\sum_{s = 1}^S \lambda_s > 0$, $a_{h(i),0} = -\infty$, $a_{h(i),1} = 0$, and $a_{h(i),r} = \infty$ for any $r \geq R_{\text{max}} + 1$. The last condition implies that $P_{ir} = 0$ for any $r \geq R_{\text{max}} + 1$. For any $r \geq 1$, the distance between two consecutive cut-points is $a_{h(i),r+1} - a_{h(i),r} = \delta_{h(i),r} + \sum_{s = 1}^S \lambda_s$. As the number of cut-points can be large, a quadratic cost function is considered for $r \geq \bar{R}_{h(i)}$, where $\bar{R} = (\bar{R}_{1}, ..., \bar{R}_{L})$. With the semi-parametric cost function, $a_{h(i),r + 1} - a_{h(i),r} = \bar{\delta}_{h(i)} + \sum_{s = 1}^S \lambda_s$.

The model parameters are: $\lambda = (\lambda_1, ..., \lambda_S)'$, $\Gamma$, and $\delta = (\delta_1', ..., \delta_L')'$, where $\delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l}, \bar{\delta}_l)'$ for $l = 1, ..., L$. The number of single parameters in $\delta_l$ depends on $R_{\text{max}}$ and $\bar{R}_{l}$. The components $\delta_{l,2}, ..., \delta_{l,\bar{R}_l}$ and/or $\bar{\delta}_l$ must be removed in certain cases.
If $R_{\text{max}} = \bar{R}_{l} \geq 2$, then $\delta_l = (\delta_{l,2}, ..., \delta_{l,\bar{R}_l})'$.
If $R_{\text{max}} = \bar{R}_{l} = 1$ (binary models), then $\delta_l$ must be empty.
If $R_{\text{max}} > \bar{R}_{l} = 1$, then $\delta_l = \bar{\delta}_l$.
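
The cases above can be summarized by counting the elements of $\delta_l$: there are $\bar{R}_l - 1$ nonparametric components, plus one for $\bar{\delta}_l$ whenever $R_{\text{max}} > \bar{R}_l$. The helper below is hypothetical (it is not part of CDatanet) and assumes $R_{\text{max}} \geq \bar{R}_l \geq 1$; it may help when checking the length of delta before calling simcdnet.

```r
# Hypothetical helper (not part of CDatanet): expected length of delta_l
# implied by the cases described in Details, assuming Rmax >= Rbar >= 1.
delta_length <- function(Rmax, Rbar) {
  stopifnot(Rbar >= 1, Rmax >= Rbar)
  (Rbar - 1) + as.integer(Rmax > Rbar)
}

delta_length(Rmax = 5,  Rbar = 5)  # 4: (delta_2, ..., delta_5)
delta_length(Rmax = 1,  Rbar = 1)  # 0: binary model, delta_l is empty
delta_length(Rmax = 10, Rbar = 1)  # 1: delta_l = bar(delta)_l only
delta_length(Rmax = 30, Rbar = 5)  # 5: (delta_2, ..., delta_5, bar(delta)_l)
```

With $L$ groups, the full vector delta stacks the $\delta_l$, so its length is the sum of these counts over the elements of Rbar.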

Value

A list consisting of:

yst

$y^{\ast}$, the latent variable.

y

the observed count variable.

Ey

$E(y)$, the expectation of y.

GEy

the average of $E(y)$ among friends.

meff

a list including average and individual marginal effects.

Rmax

infinite sums in the marginal effects are approximated by sums up to Rmax.

iteration

number of iterations performed for each sub-network in the Fixed Point Iteration Method.

References

Houndetoungan, E. A. (2024). Count Data Models with Social Interactions under Rational Expectations. Available at SSRN 3721250, doi:10.2139/ssrn.3721250.

See Also

cdnet, simsart, simsar.

Examples


set.seed(123)
M      <- 5 # Number of sub-groups
nvec   <- round(runif(M, 100, 200))
n      <- sum(nvec)

# Adjacency matrix
A      <- list()
for (m in 1:M) {
  nm           <- nvec[m]
  Am           <- matrix(0, nm, nm)
  max_d        <- 30 #maximum number of friends
  for (i in 1:nm) {
    tmp        <- sample((1:nm)[-i], sample(0:max_d, 1))
    Am[i, tmp] <- 1
  }
  A[[m]]       <- Am
}
Anorm  <- norm.network(A) #Row-normalization

# X
X      <- cbind(rnorm(n, 1, 3), rexp(n, 0.4))

# Two groups:
group  <- 1*(X[,1] > 0.95)

# Networks
# length(unique(group)) = 2 and sort(unique(group)) = c(0, 1)
# The networks must be defined so as to capture:
# peer effects of `0` on `0`, peer effects of `1` on `0`,
# peer effects of `0` on `1`, and peer effects of `1` on `1`
G        <- list()
cums     <- c(0, cumsum(nvec))
for (m in 1:M) {
  tp     <- group[(cums[m] + 1):(cums[m + 1])]
  Am     <- A[[m]]
  G[[m]] <- norm.network(list(Am * ((1 - tp) %*% t(1 - tp)),
                              Am * ((1 - tp) %*% t(tp)),
                              Am * (tp %*% t(1 - tp)),
                              Am * (tp %*% t(tp))))
}

# Parameters
lambda <- c(0.2, 0.3, -0.15, 0.25) 
Gamma  <- c(4.5, 2.2, -0.9, 1.5, -1.2)
delta  <- rep(c(2.6, 1.47, 0.85, 0.7, 0.5), 2) 

# Data
data   <- data.frame(X, peer.avg(Anorm, cbind(x1 = X[,1], x2 =  X[,2])))
colnames(data) <- c("x1", "x2", "gx1", "gx2")

ytmp   <- simcdnet(formula = ~ x1 + x2 + gx1 + gx2, Glist = G, Rbar = rep(5, 2),
                   lambda = lambda, Gamma = Gamma, delta = delta, group = group,
                   data = data)
y      <- ytmp$y
hist(y, breaks = max(y) + 1)
table(y)

[Package CDatanet version 2.2.0 Index]