correlatedStan {networkscaleup} | R Documentation |
Fit ARD using the uncorrelated or correlated model in Stan
Description
This function fits the ARD using either the uncorrelated or correlated model of Laga et al. (2021) in Stan. The population size estimates and degrees are scaled using a post-hoc procedure.
Usage
correlatedStan(
  ard,
  known_sizes = NULL,
  known_ind = NULL,
  N = NULL,
  model = c("correlated", "uncorrelated"),
  scaling = c("all", "overdispersed", "weighted", "weighted_sq"),
  x = NULL,
  z_global = NULL,
  z_subpop = NULL,
  G1_ind = NULL,
  G2_ind = NULL,
  B2_ind = NULL,
  chains = 3,
  cores = 1,
  warmup = 1000,
  iter = 1500,
  thin = 1,
  return_fit = FALSE,
  ...
)
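As a rough guide to what the sampler defaults in the signature imply, the arithmetic below assumes Stan's convention that 'iter' counts the warmup iterations and that 'thin = 1'. The variable names are illustrative only and are not part of the package.

```r
# Sampler defaults copied from the Usage signature
chains <- 3
warmup <- 1000
iter   <- 1500
thin   <- 1

# Under Stan's convention, 'iter' includes the warmup iterations,
# so only (iter - warmup) draws per chain are kept when thin = 1.
draws_per_chain <- (iter - warmup) / thin
total_draws     <- chains * draws_per_chain

print(draws_per_chain)
print(total_draws)
```

Raising 'iter' while leaving 'warmup' fixed is therefore the way to obtain more retained draws under this convention.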
Arguments
ard |
The 'n_i x n_k' matrix of non-negative ARD integer responses, where the '(i,k)th' element corresponds to the number of people that respondent 'i' knows in subpopulation 'k'. |
known_sizes |
The known subpopulation sizes corresponding to a subset of the columns of 'ard'. |
known_ind |
The indices of the columns of 'ard' that correspond to 'known_sizes'. |
N |
The known total population size. |
model |
A character vector denoting which of the two models should be fit, either 'uncorrelated' or 'correlated'. More details of these models are provided below. The function decides which covariate model is needed based on the covariates provided below. |
scaling |
An optional character vector naming the scaling procedure used to transform estimates to degrees and subpopulation sizes. If 'NULL', the parameters are returned unscaled. Alternatively, scaling may be performed independently using the scaling function. Scaling options are 'NULL', 'overdispersed', 'all', 'weighted', or 'weighted_sq' ('weighted' and 'weighted_sq' are only available if 'model = "correlated"'). Further details are provided in the Details section. |
x |
A matrix with dimensions 'n_i x n_unknown', where 'n_unknown' refers to the number of unknown subpopulation sizes. In the language of Teo et al. (2019), these represent the individual's perception of each hidden population. |
z_global |
A matrix with dimensions 'n_i x p_global', where 'p_global' is the number of demographic covariates used. This matrix represents the demographic information about the respondents in order to capture the barrier effects. |
z_subpop |
A matrix with dimensions 'n_i x p_subpop', where 'p_subpop' is the number of demographic covariates used. This matrix represents the demographic information about the respondents in order to capture the barrier effects. |
G1_ind |
A vector of indices denoting the columns of 'ard' that correspond to the primary scaling groups, i.e. the collection of rare girls' names in Zheng, Salganik, and Gelman (2006). By default, all known_sizes are used. If G2_ind and B2_ind are not provided, 'C = C_1', so only G1_ind are used. If G1_ind is not provided, no scaling is performed. |
G2_ind |
A vector of indices denoting the columns of 'ard' that correspond to the subpopulations that belong to the first secondary scaling groups, i.e. the collection of somewhat popular girls' names. |
B2_ind |
A vector of indices denoting the columns of 'ard' that correspond to the subpopulations that belong to the second secondary scaling groups, i.e. the collection of somewhat popular boys' names. |
chains |
A positive integer specifying the number of Markov chains. |
cores |
A positive integer specifying the number of cores to use to run the Markov chains in parallel. |
warmup |
A positive integer specifying the number of warmup samples for each chain. Matches the usage in stan. |
iter |
A positive integer specifying the total number of samples for each chain (including warmup). Matches the usage in stan. |
thin |
A positive integer specifying the interval for saving posterior samples. Default value is 1 (i.e. no thinning). |
return_fit |
A logical indicating whether the fitted 'stanfit' object should be returned. Defaults to 'FALSE'. |
... |
Additional arguments to be passed to stan. |
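The sketch below shows one way the inputs described above could be assembled and checked before fitting. All values are simulated and hypothetical (sizes, dimensions, and the 1 to 5 coding of 'x' are assumptions made for illustration); only the argument names and the required dimensions come from the descriptions above.

```r
set.seed(1)

n_i <- 100   # respondents (hypothetical)
n_k <- 5     # subpopulations asked about (hypothetical)

# ARD matrix: non-negative integer counts, one row per respondent
ard <- matrix(rpois(n_i * n_k, lambda = 3), nrow = n_i, ncol = n_k)

# Columns 1, 2 and 4 have known sizes; columns 3 and 5 are unknown
known_ind   <- c(1, 2, 4)
known_sizes <- c(5000, 12000, 800)   # made-up sizes, one per entry of known_ind
N           <- 1e6                   # made-up total population size

n_unknown <- n_k - length(known_ind)

# Optional covariates, sized as the argument descriptions require
x        <- matrix(sample(1:5, n_i * n_unknown, replace = TRUE), nrow = n_i)
z_global <- matrix(rnorm(n_i * 2), nrow = n_i)
z_subpop <- matrix(rnorm(n_i * 2), nrow = n_i)

# Basic consistency checks before fitting
stopifnot(
  all(ard >= 0), all(ard == round(ard)),
  length(known_sizes) == length(known_ind),
  all(known_ind %in% seq_len(ncol(ard))),
  all(known_sizes < N),
  nrow(x) == nrow(ard), ncol(x) == n_unknown,
  nrow(z_global) == nrow(ard), nrow(z_subpop) == nrow(ard)
)
```

Checks of this kind are cheap compared with a Stan run, so catching a dimension mismatch here avoids a failed or misleading fit later.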
Details
This function currently fits a variety of models proposed in Laga et al. (2021). The user may provide any combination of 'x', 'z_global', and 'z_subpop'. Additionally, the user may choose to fit an uncorrelated version of the model, where the correlation matrix is equal to the identity matrix.
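To make the difference between the two versions concrete, the simulation below contrasts an identity correlation matrix with an arbitrary correlated one. It is a base R illustration only: the dimensions, the 0.7 correlation, and the variable names are assumptions, and nothing here reproduces the package's Stan code.

```r
set.seed(42)
n <- 5000   # simulated respondents (hypothetical)
K <- 3      # subpopulations (hypothetical)

# Uncorrelated version: the correlation matrix is the identity
corr_identity <- diag(K)

# Correlated version: an arbitrary example with 0.7 off the diagonal
corr_example <- matrix(0.7, nrow = K, ncol = K)
diag(corr_example) <- 1

# Draw standard-normal terms and impose each correlation structure
# through its Cholesky factor (t(R) %*% R equals the target matrix)
z <- matrix(rnorm(n * K), nrow = n, ncol = K)
b_uncorr <- z %*% chol(corr_identity)
b_corr   <- z %*% chol(corr_example)

round(cor(b_uncorr), 2)  # off-diagonals near 0
round(cor(b_corr), 2)    # off-diagonals near 0.7
```

With the identity matrix, the terms for different subpopulations carry no information about one another, which is the sense in which that version is uncorrelated.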
The 'scaling' options are described below:
- NULL
No scaling is performed
- overdispersed
The scaling procedure outlined in Zheng et al. (2006) is performed. In this case, at least 'G1_ind' must be provided. See overdispersedStan for more details.
- all
All subpopulations with known sizes are used to scale the parameters, using a modified scaling procedure that standardizes the sizes so each population is weighted equally. Additional details are provided in Laga et al. (2021).
- weighted
All subpopulations with known sizes are weighted according to their correlation with the unknown subpopulation size. Additional details are provided in Laga et al. (2021).
- weighted_sq
Same as 'weighted', except the weights are squared, providing more relative weight to subpopulations with higher correlation.
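The effect of squaring the weights can be seen with a small numeric illustration. The correlations below are made up, and normalizing the weights to sum to one is an assumption for the sake of comparison; this is not the package's internal computation.

```r
# Hypothetical correlations between three known subpopulations
# and one unknown subpopulation
corr_with_unknown <- c(a = 0.2, b = 0.5, c = 0.8)

# Normalized weights in the spirit of 'weighted'
w <- corr_with_unknown / sum(corr_with_unknown)

# Normalized squared weights in the spirit of 'weighted_sq'
w_sq <- corr_with_unknown^2 / sum(corr_with_unknown^2)

round(rbind(weighted = w, weighted_sq = w_sq), 3)
```

After squaring, the share of the most correlated subpopulation rises and the share of the least correlated one falls, which is the relative shift described for 'weighted_sq'.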
Value
Either the full fitted Stan model if 'return_fit = TRUE', else a named list with the estimated parameters extracted using extract (the default). The estimated parameters are named as follows (if estimated in the corresponding model), with additional descriptions as needed:
- delta
Raw delta parameters
- sigma_delta
Standard deviation of delta
- rho
Log prevalence, if scaled, else raw rho parameters
- mu_rho
Mean of rho
- sigma_rho
Standard deviation of rho
- alpha
Slope parameters corresponding to x
- beta_global
Slope parameters corresponding to z_global
- beta_subpop
Slope parameters corresponding to z_subpop
- tau_N
Standard deviation of random effects b
- Corr
Correlation matrix, if 'model = "correlated"'
If scaled, the following additional parameters are included:
- log_degrees
Scaled log degrees
- degree
Scaled degrees
- log_prevalences
Scaled log prevalences
- sizes
Subpopulation size estimates
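A returned list of this form can be summarized with base R. The sketch below uses a mock list in place of a real fit, so the numbers are simulated; it assumes 'sizes' is a matrix with one row per posterior draw and one column per subpopulation, which is consistent with the Examples calling colMeans() on it.

```r
set.seed(7)

# Stand-in for the returned list: 'sizes' is assumed to be a
# draws-by-subpopulations matrix
true_sizes <- c(2000, 15000, 600)   # made-up values
n_draws <- 500
mock_est <- list(
  sizes = sapply(true_sizes, function(s) rlnorm(n_draws, log(s), 0.1))
)

# Posterior means and 95% intervals, one row per subpopulation
size_summary <- data.frame(
  mean  = colMeans(mock_est$sizes),
  lower = apply(mock_est$sizes, 2, quantile, probs = 0.025),
  upper = apply(mock_est$sizes, 2, quantile, probs = 0.975)
)
round(size_summary)
```

Reporting an interval alongside the mean gives a sense of how uncertain each size estimate is, which a bare colMeans() call hides.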
References
Laga, I., Bao, L., and Niu, X. (2021). A Correlated Network Scaleup Model: Finding the Connection Between Subpopulations.
Examples
## Not run:
data(example_data)
x = example_data$x
z_global = example_data$z[,1:2]
z_subpop = example_data$z[,3:4]
basic_corr_est = correlatedStan(example_data$ard,
                                known_sizes = example_data$subpop_sizes[c(1, 2, 4)],
                                known_ind = c(1, 2, 4),
                                N = example_data$N,
                                model = "correlated",
                                scaling = "weighted",
                                chains = 1,
                                cores = 1,
                                warmup = 50,
                                iter = 100)
cov_uncorr_est = correlatedStan(example_data$ard,
                                known_sizes = example_data$subpop_sizes[c(1, 2, 4)],
                                known_ind = c(1, 2, 4),
                                N = example_data$N,
                                model = "uncorrelated",
                                scaling = "all",
                                x = x,
                                z_global = z_global,
                                z_subpop = z_subpop,
                                chains = 1,
                                cores = 1,
                                warmup = 50,
                                iter = 100)
cov_corr_est = correlatedStan(example_data$ard,
                              known_sizes = example_data$subpop_sizes[c(1, 2, 4)],
                              known_ind = c(1, 2, 4),
                              N = example_data$N,
                              model = "correlated",
                              scaling = "all",
                              x = x,
                              z_subpop = z_subpop,
                              chains = 1,
                              cores = 1,
                              warmup = 50,
                              iter = 100)
# Compare size estimates
round(data.frame(true = example_data$subpop_sizes,
                 corr_basic = colMeans(basic_corr_est$sizes),
                 uncorr_x_zsubpop_zglobal = colMeans(cov_uncorr_est$sizes),
                 corr_x_zsubpop = colMeans(cov_corr_est$sizes)))
# Look at z slope parameters
colMeans(cov_uncorr_est$beta_global)
colMeans(cov_corr_est$beta_subpop)
colMeans(cov_uncorr_est$beta_subpop)
# Look at x slope parameters
colMeans(cov_uncorr_est$alpha)
colMeans(cov_corr_est$alpha)
## End(Not run)