impute.visibility {sspse}    R Documentation

Estimate personal visibility from self-reported degree and number of direct recruits

Description

Estimates each person's personal visibility based on their self-reported degree and the number of their (direct) recruits. It uses the time the person was recruited as a factor in determining the number of recruits they produce.

Usage

impute.visibility(
  rds.data,
  max.coupons = NULL,
  type.impute = c("median", "distribution", "mode", "mean"),
  recruit.time = NULL,
  include.tree = FALSE,
  reflect.time = FALSE,
  parallel = 1,
  parallel.type = "PSOCK",
  interval = 10,
  burnin = 5000,
  mem.optimism.prior = NULL,
  df.mem.optimism.prior = 5,
  mem.scale.prior = 2,
  df.mem.scale.prior = 10,
  mem.overdispersion = 15,
  return.posterior.sample.visibilities = FALSE,
  verbose = FALSE
)

Arguments

rds.data

An rds.data.frame

max.coupons

The number of recruitment coupons distributed to each enrolled subject (i.e. the maximum number of recruits for any subject). By default it is taken from the attributes of the data; otherwise the maximum recorded number of coupons is used.

type.impute

The type of imputation based on the conditional distribution. It can be one of "median", "distribution", "mode", or "mean". "distribution" is a random draw from the conditional distribution. As listed in Usage, the first value, "median", is the default.

recruit.time

vector; An optional value for the date/time that the person was interviewed. It needs to resolve to a numeric vector with one element for each row of the data that has a non-missing value of the network variable. If it is the character name of a variable in the data, then that variable is used. If it is NULL, then the sequence number of the recruit in the data is used. If it is NA, then the recruitment time is not used in the model. Otherwise, the recruitment time is used in the model to better predict the visibility of the person.
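As a rough sketch of preparing a value for recruit.time, interview dates can be converted to a numeric vector with base R. The data frame interviews and its column interview.date are hypothetical names used only for illustration; they are not part of sspse.

```r
# Hypothetical interview dates, one per respondent (illustrative only).
interviews <- data.frame(
  id = 1:4,
  interview.date = c("2023-03-01", "2023-03-04", "2023-03-04", "2023-03-10"),
  stringsAsFactors = FALSE
)

# Convert the dates to days since the first interview, giving a numeric vector.
dates <- as.Date(interviews$interview.date)
recruit.time <- as.numeric(dates - min(dates))

# One numeric value per row, as the argument description requires.
stopifnot(is.numeric(recruit.time), length(recruit.time) == nrow(interviews))
print(recruit.time)
```

A vector built this way could be passed as recruit.time = recruit.time. Alternatively, pass the name of a column as a character string, or NA to leave recruitment time out of the model.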

include.tree

logical; If TRUE, augment the reported network size by the number of recruits and one for the recruiter (if any). This reflects a more accurate value for the visibility, but is not the self-reported degree. In particular, it typically produces a positive visibility (compared to a possibly zero self-reported degree).

reflect.time

logical; If FALSE then the recruit.time is the time before the end of the study (instead of the time since the survey started or chronological time).

parallel

count; the number of parallel processes to run for the Monte-Carlo sample. This uses MPI or PSOCK. The default is 1, that is, no parallel processing.

parallel.type

The type of parallel processing to use. The options are "PSOCK" or "MPI". This requires the corresponding type to be installed. The default is "PSOCK".
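The following sketch shows one way to choose a value for parallel using the base R parallel package. The policy of leaving one core free is an assumption made for illustration, not a recommendation from sspse, and the final call is left commented out because it needs sspse and an rds.data.frame.

```r
# Base R's 'parallel' package reports the number of available cores.
library(parallel)

n.cores <- detectCores()
if (is.na(n.cores)) n.cores <- 1L   # detectCores() can return NA

# Leave one core free, but always use at least one process.
n.proc <- max(1L, n.cores - 1L)

# Arguments that would be passed on to impute.visibility().
fit.args <- list(parallel = n.proc, parallel.type = "PSOCK")
str(fit.args)

# Hypothetical call, not run here:
# visibility <- do.call(impute.visibility,
#                       c(list(rds.data = fauxmadrona), fit.args))
```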

interval

count; the number of proposals between sampled statistics.

burnin

count; the number of proposals before any MCMC sampling is done. It is typically set to a fairly large number.
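To get a feel for how interval and burnin affect run time, the total number of proposals can be approximated with simple arithmetic. This is a back-of-the-envelope sketch that assumes the usual MCMC bookkeeping (burn-in first, then one retained draw every interval proposals). n.draws is a hypothetical number of retained draws, not an argument of impute.visibility.

```r
burnin   <- 5000   # default from the Usage section
interval <- 10     # default from the Usage section
n.draws  <- 1000   # hypothetical number of retained draws (assumption)

# Burn-in proposals plus 'interval' proposals for each retained draw.
total.proposals <- burnin + interval * n.draws
print(total.proposals)
```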

mem.optimism.prior

scalar; A hyperparameter being the mean of the prior distribution of the optimism parameter.

df.mem.optimism.prior

scalar; A hyperparameter being the degrees-of-freedom of the prior for the optimism parameter. This gives the equivalent sample size that would contain the same amount of information inherent in the prior.

mem.scale.prior

scalar; A hyperparameter being the scale of the concentration of the baseline negative binomial measurement error model.

df.mem.scale.prior

scalar; A hyperparameter being the degrees-of-freedom of the prior for the standard deviation of the dispersion parameter in the visibility model. This gives the equivalent sample size that would contain the same amount of information inherent in the prior for the standard deviation.

mem.overdispersion

scalar; A parameter being the overdispersion of the negative binomial distribution that is the baseline for the measurement error model.

return.posterior.sample.visibilities

logical; If TRUE then return a matrix of dimension samplesize by n of posterior draws from the visibility distribution for those in the survey. The sample for the ith person is the ith column. The default is FALSE so that the vector of imputes defined by type.impute is returned.
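As an illustration of working with a matrix laid out this way (draws in rows, people in columns), the sketch below summarises each column in base R. The matrix post is simulated toy data, not output from impute.visibility, and sample.mode is a hypothetical helper. The four summaries only loosely mirror the type.impute options; the package's own computation may differ.

```r
set.seed(1)

# Toy stand-in for the returned matrix: 200 draws (rows) for 3 people (columns).
# Poisson draws are used purely for illustration.
post <- matrix(rpois(200 * 3, lambda = rep(c(2, 5, 9), each = 200)),
               nrow = 200, ncol = 3)

# Most frequent value in a vector of counts (ties go to the smallest value).
sample.mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Per-person summaries, one row per column of 'post'.
summaries <- data.frame(
  median = apply(post, 2, median),
  mode   = apply(post, 2, sample.mode),
  mean   = colMeans(post),
  draw   = apply(post, 2, function(x) sample(x, 1))  # one random draw per person
)
print(summaries)
```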

verbose

logical; if this is TRUE, the program will print out additional information.

References

McLaughlin, Katherine R.; Johnston, Lisa G.; Jakupi, Xhevat; Gexha-Bunjaku, Dafina; Deva, Edona and Handcock, Mark S. (2023) Modeling the Visibility Distribution for Respondent-Driven Sampling with Application to Population Size Estimation, Annals of Applied Statistics, doi:10.1093/jrsssa/qnad031

Examples

## Not run: 
library(sspse)
data(fauxmadrona)
# The next line fits the model for the self-reported personal
# network sizes and imputes the personal visibilities.
# It may take up to 60 seconds.
visibility <- impute.visibility(fauxmadrona)
# frequency of estimated personal visibility
table(visibility)

## End(Not run)

[Package sspse version 1.1.0-1 Index]