assess_reference_loo {rubias}R Documentation

Simulate mixtures and estimate reporting group and collection proportions.

Description

From a reference dataset, this creates a genotype-logL matrix based on simulation-by-individual with randomly drawn population proportions, then uses this in two different estimates of population mixture proportions: maximum likelihood via EM-algorithm and posterior mean from MCMC.

Usage

assess_reference_loo(
  reference,
  gen_start_col,
  reps = 50,
  mixsize = 100,
  seed = 5,
  alpha_repunit = 1.5,
  alpha_collection = 1.5,
  resampling_unit = "individual",
  alle_freq_prior = list(const_scaled = 1),
  printSummary = FALSE,
  return_indiv_posteriors = FALSE
)

Arguments

reference

a two-column format genetic dataset, with "repunit", "collection", and "indiv" columns, as well as a "sample_type" column that has some "reference" entries

gen_start_col

the first column of genetic data in reference

reps

number of reps of mixture simulation and MCMC to do

mixsize

the number of individuals in each simulated mixture

seed

a random seed for simulations

alpha_repunit

If a vector, this is the dirichlet parameter for simulating the proportions of reporting units. Gets recycled to the number of reporting units. Default is 1.5. Otherwise, this could be a two-column data frame. The first column must be named "repunit" and the second one must be one of "dirichlet", "ppn", or "cnt", according to whether you wish to specify dirichlet parameters, or proportions, or exact counts, respectively, for each population. If you want to make multiple simulations, pass in a list of data frames or of individual dirichlet parameters. For examples, see sim_spec_examples.

alpha_collection

The dirichlet parameter for simulating proportions of collections within reporting units. Default = 1.5. If this is a data frame then the first column must be "collection" and the second must be one of "dirichlet", "ppn", "cnt", "sub_dirichlet", "sub_ppn". If you want to provide multiple different scenarios. You can pass them in as a list. If alpha_repunit or alpha_collection is a list with length greater than 1, the shorter will be recycled. For examples, see sim_spec_examples.

resampling_unit

what unit should be resampled. Currently the choices are "individuals" (the default) and "gene_copies". Using "individuals" preserves missing data patterns available in the reference data set. We also have "gene_copies_with_missing" capability, but it is not yet linked into this function.

alle_freq_prior

a one-element named list specifying the prior to be used when generating Dirichlet parameters for genotype likelihood calculations. Valid methods include "const", "scaled_const", and "empirical". See ?list_diploid_params for method details.

printSummary

if TRUE a summary of the reference samples will be printed to stdout.

return_indiv_posteriors

if TRUE, output is a list of 2. The first entry, mixing_proportions, contains the true (simulated) and estimated mixture proportions for each scenario, iteration, and collection. The second, indiv_posteriors, contains the posterior probability of assignment to each collection for each scenario, iteration, and individual. If FALSE, output is a single data frame, mixing_proportions

Examples

# very small number of reps so it is quick enough for example
ale_dev <- assess_reference_loo(alewife, 17, reps = 5)


[Package rubias version 0.3.4 Index]