assess_reference_loo {rubias} | R Documentation |
Simulate mixtures and estimate reporting group and collection proportions.
Description
From a reference dataset, this creates a genotype-logL matrix based on simulation-by-individual with randomly drawn population proportions, then uses this in two different estimates of population mixture proportions: maximum likelihood via EM-algorithm and posterior mean from MCMC.
Usage
assess_reference_loo(
reference,
gen_start_col,
reps = 50,
mixsize = 100,
seed = 5,
alpha_repunit = 1.5,
alpha_collection = 1.5,
resampling_unit = "individual",
alle_freq_prior = list(const_scaled = 1),
printSummary = FALSE,
return_indiv_posteriors = FALSE
)
Arguments
reference |
a two-column format genetic dataset, with "repunit", "collection", and "indiv" columns, as well as a "sample_type" column that has some "reference" entries |
gen_start_col |
the first column of genetic data in |
reps |
number of reps of mixture simulation and MCMC to do |
mixsize |
the number of individuals in each simulated mixture |
seed |
a random seed for simulations |
alpha_repunit |
If a vector, this is the dirichlet parameter for simulating
the proportions of reporting units. Gets recycled to the number of reporting units. Default is 1.5.
Otherwise, this could be a two-column data frame. The first column must be named "repunit" and the
second one must be one of "dirichlet", "ppn", or "cnt", according to whether you wish to
specify dirichlet parameters, or proportions, or exact counts, respectively, for each population.
If you want to make multiple simulations, pass in a list of data frames or of individual dirichlet parameters.
For examples, see |
alpha_collection |
The dirichlet parameter for simulating proportions of collections within reporting units. Default = 1.5.
If this is a data frame then the first column must be "collection" and the second must be one of
"dirichlet", "ppn", "cnt", "sub_dirichlet", "sub_ppn". If you want to provide multiple different
scenarios. You can pass them in as a list. If alpha_repunit or alpha_collection is a list with length
greater than 1, the shorter will be recycled.
For examples, see |
resampling_unit |
what unit should be resampled. Currently the choices are "individuals" (the default) and "gene_copies". Using "individuals" preserves missing data patterns available in the reference data set. We also have "gene_copies_with_missing" capability, but it is not yet linked into this function. |
alle_freq_prior |
a one-element named list specifying the prior to be used when
generating Dirichlet parameters for genotype likelihood calculations. Valid methods include
|
printSummary |
if TRUE a summary of the reference samples will be printed to stdout. |
return_indiv_posteriors |
if TRUE, output is a list of 2. The first entry, |
Examples
# very small number of reps so it is quick enough for example
ale_dev <- assess_reference_loo(alewife, 17, reps = 5)