R: MCMC Algorithm for Hierarchical Multinomial Logit with...

rhierMnlDP {bayesm}

R Documentation

MCMC Algorithm for Hierarchical Multinomial Logit with Dirichlet Process Prior Heterogeneity

Description

rhierMnlDP is a MCMC algorithm for a hierarchical multinomial logit with a Dirichlet Process prior for the distribution of heteorogeneity. A base normal model is used so that the DP can be interpreted as allowing for a mixture of normals with as many components as there are panel units. This is a hybrid Gibbs Sampler with a RW Metropolis step for the MNL coefficients for each panel unit. This procedure can be interpreted as a Bayesian semi-parameteric method in the sense that the DP prior can accomodate heterogeniety of an unknown form.

Usage

rhierMnlDP(Data, Prior, Mcmc)

Arguments

`Data`	list(lgtdata, Z, p)
`Prior`	list(deltabar, Ad, Prioralpha, lambda_hyper)
`Mcmc`	list(R, keep, nprint, s, w, maxunique, gridsize)

Details

Model and Priors

y_i \sim MNL(X_i, \beta_i) with i = 1, \ldots, length(lgtdata) and where \theta_i is nvar x 1

\beta_i = Z\Delta[i,] + u_i
Note: Z\Delta is the matrix Z * \Delta; [i,] refers to ith row of this product
Delta is an nz x nvar matrix

\beta_i \sim N(\mu_i, \Sigma_i)

\theta_i = (\mu_i, \Sigma_i) \sim DP(G_0(\lambda), alpha)

G_0(\lambda):
\mu_i | \Sigma_i \sim N(0, \Sigma_i (x) a^{-1})
\Sigma_i \sim IW(nu, nu*v*I)
delta = vec(\Delta) \sim N(deltabar, A_d^{-1})

\lambda(a, nu, v):
a \sim uniform[alim[1], alimb[2]]
nu \sim dim(data)-1 + exp(z)
z \sim uniform[dim(data)-1+nulim[1], nulim[2]]
v \sim uniform[vlim[1], vlim[2]]

alpha \sim (1-(alpha-alphamin) / (alphamax-alphamin))^{power}
alpha = alphamin then expected number of components = Istarmin
alpha = alphamax then expected number of components = Istarmax

Z should NOT include an intercept and is centered for ease of interpretation. The mean of each of the nlgt \betas is the mean of the normal mixture. Use summary() to compute this mean from the compdraw output.

We parameterize the prior on \Sigma_i such that mode(\Sigma) = nu/(nu+2) vI. The support of nu enforces a non-degenerate IW density; nulim[1] > 0.

The default choices of alim, nulim, and vlim determine the location and approximate size of candidate "atoms" or possible normal components. The defaults are sensible given a reasonable scaling of the X variables. You want to insure that alim is set for a wide enough range of values (remember a is a precision parameter) and the v is big enough to propose Sigma matrices wide enough to cover the data range.

A careful analyst should look at the posterior distribution of a, nu, v to make sure that the support is set correctly in alim, nulim, vlim. In other words, if we see the posterior bunched up at one end of these support ranges, we should widen the range and rerun.

If you want to force the procedure to use many small atoms, then set nulim to consider only large values and set vlim to consider only small scaling constants. Set alphamax to a large number. This will create a very "lumpy" density estimate somewhat like the classical Kernel density estimates. Of course, this is not advised if you have a prior belief that densities are relatively smooth.

Argument Details

Data = list(lgtdata, Z, p) [Z optional]

`lgtdata:`	list of lists with each cross-section unit MNL data
`lgtdata[[i]]$y:`	`n_i x 1` vector of multinomial outcomes (1, ..., m)
`lgtdata[[i]]$X:`	`n_i x nvar` design matrix for `i`th unit
`Z:`	`nreg x nz` matrix of unit characteristics (def: vector of ones)
`p:`	number of choice alternatives

Prior = list(deltabar, Ad, Prioralpha, lambda_hyper) [optional]

`deltabar:`	`nz*nvar x 1` vector of prior means (def: 0)
`Ad:`	prior precision matrix for vec(D) (def: 0.01*I)
`Prioralpha:`	`list(Istarmin, Istarmax, power)`
`$Istarmin:`	expected number of components at lower bound of support of alpha def(1)
`$Istarmax:`	expected number of components at upper bound of support of alpha (def: min(50, 0.1*nlgt))
`$power:`	power parameter for alpha prior (def: 0.8)
`lambda_hyper:`	`list(alim, nulim, vlim)`
`$alim:`	defines support of a distribution (def: `c(0.01, 2)`)
`$nulim:`	defines support of nu distribution (def: `c(0.001, 3)`)
`$vlim:`	defines support of v distribution (def: `c(0.1, 4)`)

Mcmc = list(R, keep, nprint, s, w, maxunique, gridsize) [only R required]

`R:`	number of MCMC draws
`keep:`	MCMC thinning parameter -- keep every `keep`th draw (def: 1)
`nprint:`	print the estimated time remaining for every `nprint`'th draw (def: 100, set to 0 for no print)
`s:`	scaling parameter for RW Metropolis (def: 2.93/`sqrt(nvar)`)
`w:`	fractional likelihood weighting parameter (def: 0.1)
`maxuniq:`	storage constraint on the number of unique components (def: 200)
`gridsize:`	number of discrete points for hyperparameter priors, (def: 20)

`nmix` Details

nmix is a list with 3 components. Several functions in the bayesm package that involve a Dirichlet Process or mixture-of-normals return nmix. Across these functions, a common structure is used for nmix in order to utilize generic summary and plotting functions.

`probdraw:`	`ncomp x R/keep` matrix that reports the probability that each draw came from a particular component (here, a one-column matrix of 1s)
`zdraw:`	`R/keep x nobs` matrix that indicates which component each draw is assigned to (here, null)
`compdraw:`	A list of `R/keep` lists of `ncomp` lists. Each of the inner-most lists has 2 elemens: a vector of draws for `mu` and a matrix of draws for the Cholesky root of `Sigma`.

Value

A list containing:

`Deltadraw`	`R/keep x nz*nvar` matrix of draws of Delta, first row is initial value
`betadraw`	`nlgt x nvar x R/keep` array of draws of betas
`nmix`	a list containing: `probdraw`, `zdraw`, `compdraw` (see “`nmix` Details” section)
`adraw`	`R/keep` draws of hyperparm a
`vdraw`	`R/keep` draws of hyperparm v
`nudraw`	`R/keep` draws of hyperparm nu
`Istardraw`	`R/keep` draws of number of unique components
`alphadraw`	`R/keep` draws of number of DP tightness parameter
`loglike`	`R/keep` draws of log-likelihood

Note

As is well known, Bayesian density estimation involves computing the predictive distribution of a "new" unit parameter, \theta_{n+1} (here "n"=nlgt). This is done by averaging the normal base distribution over draws from the distribution of \theta_{n+1} given \theta_1, ..., \theta_n, alpha, lambda, data. To facilitate this, we store those draws from the predictive distribution of \theta_{n+1} in a list structure compatible with other bayesm routines that implement a finite mixture of normals.

Large R values may be required (>20,000).

Author(s)

Peter Rossi, Anderson School, UCLA, perossichi@gmail.com.

References

For further discussion, see Chapter 5, Bayesian Statistics and Marketing by Rossi, Allenby, and McCulloch.

Examples

if(nchar(Sys.getenv("LONG_TEST")) != 0) {R=20000} else {R=10}
set.seed(66)

p = 3                                # num of choice alterns
ncoef = 3  
nlgt = 300                           # num of cross sectional units
nz = 2
Z = matrix(runif(nz*nlgt),ncol=nz)
Z = t(t(Z)-apply(Z,2,mean))          # demean Z
ncomp = 3                            # no of mixture components
Delta=matrix(c(1,0,1,0,1,2),ncol=2)

comps = NULL
comps[[1]] = list(mu=c(0,-1,-2),   rooti=diag(rep(2,3)))
comps[[2]] = list(mu=c(0,-1,-2)*2, rooti=diag(rep(2,3)))
comps[[3]] = list(mu=c(0,-1,-2)*4, rooti=diag(rep(2,3)))
pvec=c(0.4, 0.2, 0.4)

##  simulate from MNL model conditional on X matrix
simmnlwX = function(n,X,beta) {
  k = length(beta)
  Xbeta = X%*%beta
  j = nrow(Xbeta) / n
  Xbeta = matrix(Xbeta, byrow=TRUE, ncol=j)
  Prob = exp(Xbeta)
  iota = c(rep(1,j))
  denom = Prob%*%iota
  Prob = Prob / as.vector(denom)
  y = vector("double", n)
  ind = 1:j
  for (i in 1:n) {
  yvec = rmultinom(1, 1, Prob[i,])
  y[i] = ind%*%yvec}
  return(list(y=y, X=X, beta=beta, prob=Prob))
}

## simulate data with a mixture of 3 normals
simlgtdata = NULL
ni = rep(50,300)
for (i in 1:nlgt) {
  betai = Delta%*%Z[i,] + as.vector(rmixture(1,pvec,comps)$x)
   Xa = matrix(runif(ni[i]*p,min=-1.5,max=0), ncol=p)
   X = createX(p, na=1, nd=NULL, Xa=Xa, Xd=NULL, base=1)
   outa = simmnlwX(ni[i], X, betai)
   simlgtdata[[i]] = list(y=outa$y, X=X, beta=betai)
}

## plot betas
if(0){
  bmat = matrix(0, nlgt, ncoef)
  for(i in 1:nlgt) { bmat[i,] = simlgtdata[[i]]$beta }
  par(mfrow = c(ncoef,1))
  for(i in 1:ncoef) { hist(bmat[,i], breaks=30, col="magenta") }
}

## set Data and Mcmc lists
keep = 5
Mcmc1 = list(R=R, keep=keep)
Data1 = list(p=p, lgtdata=simlgtdata, Z=Z)

out = rhierMnlDP(Data=Data1, Mcmc=Mcmc1)

cat("Summary of Delta draws", fill=TRUE)
summary(out$Deltadraw, tvalues=as.vector(Delta))

## plotting examples
if(0) {
  plot(out$betadraw)
  plot(out$nmix)
}

[Package bayesm version 3.1-6 Index]