overdispersedStan {networkscaleup} R Documentation

Fit ARD using the Overdispersed model in Stan

Description

This function fits the ARD using the Overdispersed model in Stan. The population size estimates and degrees are scaled using a post-hoc procedure. For the Gibbs-Metropolis algorithm implementation, see overdispersed.

Usage

overdispersedStan(
  ard,
  known_sizes = NULL,
  known_ind = NULL,
  G1_ind = NULL,
  G2_ind = NULL,
  B2_ind = NULL,
  N = NULL,
  chains = 3,
  cores = 1,
  warmup = 1000,
  iter = 1500,
  thin = 1,
  return_fit = FALSE,
  ...
)

Arguments

ard

The 'n_i x n_k' matrix of non-negative ARD integer responses, where the '(i,k)th' element corresponds to the number of people that respondent 'i' knows in subpopulation 'k'.

known_sizes

The known subpopulation sizes corresponding to a subset of the columns of ard.

known_ind

The indices that correspond to the columns of ard with known_sizes. By default, the function assumes the first n_known columns, where n_known corresponds to the number of known_sizes.
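As a rough illustration of the input layout described for ard and known_ind, the base-R sketch below builds a small ARD matrix and checks the stated properties. The toy data, the made-up sizes, and the helper name check_ard are hypothetical and not part of networkscaleup.

```r
# Toy ARD matrix: 4 respondents (rows) by 3 subpopulations (columns).
ard <- matrix(c(0, 2, 1,
                1, 0, 3,
                0, 0, 5,
                2, 1, 0),
              nrow = 4, byrow = TRUE)

# Hypothetical helper (not part of networkscaleup): check that the
# responses are non-negative whole numbers stored in a matrix.
check_ard <- function(x) {
  is.matrix(x) && is.numeric(x) && !anyNA(x) &&
    all(x >= 0) && all(x == round(x))
}

known_sizes <- c(1200, 850)   # made-up sizes for two of the columns

# Documented default: the first n_known columns of ard.
known_ind <- seq_along(known_sizes)

check_ard(ard)
dim(ard)
known_ind
```

Running a check of this kind before fitting may catch transposed or non-integer input early, before any sampling time is spent.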

G1_ind

A vector of indices denoting the columns of 'ard' that correspond to the primary scaling groups, e.g. the collection of rare girls' names in Zheng, Salganik, and Gelman (2006). By default, all known_sizes are used. If G2_ind and B2_ind are not provided, 'C = C_1', so only G1_ind are used. If G1_ind is not provided, no scaling is performed.

G2_ind

A vector of indices denoting the columns of 'ard' that correspond to the subpopulations that belong to the first secondary scaling group, e.g. the collection of somewhat popular girls' names.

B2_ind

A vector of indices denoting the columns of 'ard' that correspond to the subpopulations that belong to the second secondary scaling group, e.g. the collection of somewhat popular boys' names.

N

The known total population size.
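One way to picture the 'C = C_1' case is the simplified sketch below. It assumes, following the renormalization idea in Zheng et al. (2006), that the log prevalences are shifted down and the log degrees shifted up by the same constant, chosen so that the fitted prevalence of the primary scaling groups matches their known prevalence. All numbers are made up; this is not the package's internal code, and it ignores G2_ind and B2_ind.

```r
# Made-up unscaled values for three subpopulations and three respondents.
betas  <- c(-6.0, -5.0, -4.5)   # unscaled log prevalences
alphas <- c(5.0, 5.5, 6.2)      # unscaled log degrees

N           <- 1e6                    # total population size
known_sizes <- c(3000, 8000, 12000)   # made-up known sizes
G1_ind      <- c(1, 2)                # primary scaling groups

# Known combined prevalence of the primary scaling groups.
p_G1 <- sum(known_sizes[G1_ind]) / N

# Shift constant: log ratio of fitted to known prevalence (assumed form).
C1 <- log(sum(exp(betas[G1_ind])) / p_G1)

betas_scaled  <- betas - C1
alphas_scaled <- alphas + C1

# After the shift, the primary groups reproduce their known prevalence.
sum(exp(betas_scaled[G1_ind]))
p_G1
```

Because the two shifts cancel, every sum alpha_i + beta_k is unchanged, so a shift of this kind fixes the scale of the parameters without altering the fitted expected counts.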

chains

A positive integer specifying the number of Markov chains.

cores

A positive integer specifying the number of cores to use to run the Markov chains in parallel.

warmup

A positive integer specifying the number of warmup samples for each chain. Matches the usage in stan.

iter

A positive integer specifying the total number of samples for each chain (including warmup). Matches the usage in stan.

thin

A positive integer specifying the interval for saving posterior samples. Default value is 1 (i.e. no thinning).
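The sampler settings interact, so it can help to work out how many posterior draws a given configuration retains. The sketch below assumes the stan convention that iter includes the warmup samples and that thin divides iter - warmup evenly. The helper n_saved_draws is hypothetical and not part of networkscaleup or rstan.

```r
# Hypothetical helper: number of post-warmup draws kept across all chains.
# Assumes iter counts warmup and thin divides (iter - warmup) evenly.
n_saved_draws <- function(chains = 3, warmup = 1000, iter = 1500, thin = 1) {
  stopifnot(iter > warmup, (iter - warmup) %% thin == 0)
  chains * (iter - warmup) / thin
}

n_saved_draws()                                       # documented defaults
n_saved_draws(chains = 1, warmup = 250, iter = 500)   # settings used in Examples
```

Under these assumptions the documented defaults keep 1500 draws in total, while the shortened settings in the Examples section keep 250.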

return_fit

A logical indicating whether the fitted Stan model should be returned instead of the rstan::extracted and scaled parameters. This is FALSE by default.

...

Additional arguments to be passed to stan.

Details

This function fits the overdispersed NSUM model of Zheng et al. (2006) using Stan, rather than the Gibbs-Metropolis algorithm provided in that paper.

Value

If return_fit = TRUE, the full fitted Stan model. Otherwise (the default), a named list with the estimated parameters extracted using extract. The estimated parameters are named as follows, with additional descriptions as needed:

alphas

Log degree if scaling is performed, else raw alpha parameters

betas

Log prevalence if scaling is performed, else raw beta parameters

inv_omegas

Inverse of overdispersion parameters

sigma_alpha

Standard deviation of alphas

mu_beta

Mean of betas

sigma_beta

Standard deviation of betas

omegas

Overdispersion parameters

If scaling is performed, the following additional parameters are included:

mu_alpha

Mean of log degrees

degrees

Degree estimates

sizes

Subpopulation size estimates
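The returned list can be summarized with ordinary matrix tools. The sketch below assumes that each element such as sizes is a matrix with one row per posterior draw and one column per subpopulation, which is consistent with the use of colMeans in the Examples section. The object fake_est holds simulated numbers standing in for real output. It is not produced by overdispersedStan.

```r
set.seed(1)

# Stand-in for the returned list: 200 simulated draws for 3 subpopulations.
# These values are fabricated for illustration only, not model output.
fake_est <- list(
  sizes = matrix(
    rlnorm(200 * 3, meanlog = log(c(1000, 5000, 20000)), sdlog = 0.1),
    ncol = 3, byrow = TRUE
  )
)

# Posterior means and 95% intervals, one column per subpopulation.
post_mean <- colMeans(fake_est$sizes)
post_ci   <- apply(fake_est$sizes, 2, quantile, probs = c(0.025, 0.975))

round(rbind(mean = post_mean, post_ci))
```

Reporting an interval alongside the mean gives a sense of posterior uncertainty that colMeans alone does not convey.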

References

Zheng, T., Salganik, M. J., and Gelman, A. (2006). How many people do you know in prisons? Using overdispersion in count data to estimate social structure in networks. Journal of the American Statistical Association, 101(474), 409–423.

Examples

# Analyze an example ard data set using Zheng et al. (2006) models
# Note that in practice, both warmup and iter should be much higher
## Not run: 
data(example_data)

ard = example_data$ard
subpop_sizes = example_data$subpop_sizes
known_ind = c(1, 2, 4)
N = example_data$N

overdisp.est = overdispersedStan(ard,
                                 known_sizes = subpop_sizes[known_ind],
                                 known_ind = known_ind,
                                 G1_ind = 1,
                                 G2_ind = 2,
                                 B2_ind = 4,
                                 N = N,
                                 chains = 1,
                                 cores = 1,
                                 warmup = 250,
                                 iter = 500)

# Compare size estimates
round(data.frame(true = subpop_sizes,
                 overdisp = colMeans(overdisp.est$sizes)))

# Compare degree estimates
plot(example_data$degrees, colMeans(overdisp.est$degrees))

# Look at overdispersion parameter
colMeans(overdisp.est$omegas)

## End(Not run)

[Package networkscaleup version 0.1-2 Index]