abcrf {abcrf}    R Documentation

Create an ABC-RF object: a classification random forest from a reference table towards performing an ABC model choice

Description

abcrf constructs a classification random forest from a reference table, for use in ABC model choice. The reference table (i.e. the dataset that will be treated with the present package) includes a column with the index of the models to be compared and additional columns holding the values of the simulated summary statistics.

Usage

## S3 method for class 'formula'
abcrf(formula, data, group=list(), lda=TRUE, ntree=500, sampsize=min(1e5, nrow(data)),
paral=FALSE, ncores= if(paral) max(detectCores()-1,1) else 1, ...)

Arguments

formula

a formula: left of ~, variable representing the model index; right of ~, summary statistics of the reference table.

data

a data frame containing the reference table.
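
As a minimal sketch of the expected layout, assume a toy reference table in which the column names modindex, s1 and s2 and all values are made up for illustration. The model index sits in one factor column and each summary statistic in its own column; base R's model.frame then shows which columns the formula modindex ~ . selects.

```r
# Toy reference table: 6 simulations, 2 models, 2 summary statistics.
# All names and values here are illustrative assumptions.
ref_table <- data.frame(
  modindex = factor(c("1", "1", "1", "2", "2", "2")),
  s1 = c(0.10, 0.20, 0.15, 0.80, 0.90, 0.85),
  s2 = c(1.0, 1.2, 0.9, 2.1, 2.3, 1.9)
)

# Left of ~ : the model index; right of ~ : "." means every other column.
f <- modindex ~ .
mf <- model.frame(f, data = ref_table)

response <- model.response(mf)                 # the model index
predictors <- setdiff(names(mf), "modindex")   # the summary statistics
```

A data frame of this shape is what the data argument is meant to receive, with the formula naming the model index column on the left.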

group

a list containing groups (at least 2) of model(s) on which the model choice will be performed. The groups need not form a partition: one or more models can be left out of every element of the list. By default, no grouping is done.
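
To illustrate what a group list expresses, the following base R sketch maps model labels to group indices. The helper assign_groups is hypothetical and is not how abcrf is implemented; it only shows that a model listed in no group ends up without a group.

```r
# Hypothetical helper (not part of abcrf): map each model label to the
# index of the group that contains it, or NA if no group lists it.
assign_groups <- function(modindex, group) {
  labels <- as.character(modindex)
  out <- rep(NA_integer_, length(labels))
  for (g in seq_along(group)) {
    out[labels %in% group[[g]]] <- g
  }
  out
}

modindex <- factor(c("1", "2", "3", "4", "2", "3"))
group <- list(c("1", "2"), "3")   # model "4" belongs to no group

grp <- assign_groups(modindex, group)
kept <- !is.na(grp)               # simulations whose model is in some group
```

Here models "1" and "2" are compared jointly against model "3", and the simulation from model "4" is not assigned to either group.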

lda

should LDA scores be added to the list of summary statistics?

ntree

number of trees to grow in the forest, by default 500 trees.

sampsize

size of the sample from the reference table to grow a tree of the classification forest, by default the minimum between the number of elements of the reference table and 100,000.
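
The default can be checked with a short base R sketch. The helper name default_sampsize is hypothetical; the expression inside it is the one given in the Usage section.

```r
# Default from the Usage section: min(1e5, nrow(data)).
# default_sampsize is an illustrative name, not a package function.
default_sampsize <- function(data) min(1e5, nrow(data))

small_table <- data.frame(x = seq_len(500))
large_table <- data.frame(x = seq_len(250000))

n_small <- default_sampsize(small_table)  # whole table: 500 rows
n_large <- default_sampsize(large_table)  # capped at 100,000 rows
```

For reference tables with fewer than 100,000 rows, every tree is therefore grown on a sample as large as the table itself.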

paral

a boolean that indicates if the calculations of the classification random forest (forest used to assign a model to the observed dataset) should be parallelized.

ncores

the number of CPU cores to use. If paral=TRUE, the default is the number of detected CPU cores minus 1. If ncores is not specified and detectCores fails to detect the number of CPU cores, 1 core is used.
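
The rule above can be sketched in base R as follows. The helper default_ncores is hypothetical, and the explicit NA check is an assumption about how the fallback to 1 core could be written, not the package's own code; detectCores comes from the parallel package shipped with R.

```r
library(parallel)  # ships with R; provides detectCores()

# Hypothetical helper mirroring the default in the Usage section, with
# an explicit fallback to 1 core when detection fails (detectCores()
# can return NA).
default_ncores <- function(paral, detected = detectCores()) {
  if (!paral) return(1)
  if (is.na(detected)) return(1)
  max(detected - 1, 1)
}

n_serial  <- default_ncores(paral = FALSE)
n_eight   <- default_ncores(paral = TRUE, detected = 8)   # 8 - 1 = 7
n_single  <- default_ncores(paral = TRUE, detected = 1)   # never below 1
n_unknown <- default_ncores(paral = TRUE, detected = NA)  # fallback to 1
```

Passing ncores explicitly in the call to abcrf overrides this default.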

...

additional arguments to be passed on to ranger, used to construct the classification random forest that predicts the selected model.

Value

An object of class abcrf, which is a list with the following components:

call

the original call to abcrf,

lda

a boolean indicating if LDA scores have been added to the list of summary statistics,

formula

the formula used to construct the classification random forest,

group

a list containing the groups of model(s) used. This list is empty if no grouping has been performed,

model.rf

an object of class ranger containing the forest trained on the reference table,

model.lda

an object of class lda containing the Linear Discriminant Analysis based on the reference table,

prior.err

prior error rates of model selection on the reference table, estimated with the "out-of-bag" error of the forest.

References

Pudlo P., Marin J.-M., Estoup A., Cornuet J.-M., Gautier M. and Robert C. P. (2016) Reliable ABC model choice via random forests. Bioinformatics. doi:10.1093/bioinformatics/btv684

Estoup A., Raynal L., Verdu P. and Marin J.-M. (2018) Model choice using Approximate Bayesian Computation and Random Forests: analyses based on model grouping to make inferences about the genetic history of Pygmy human populations. Journal de la Société Française de Statistique. http://journal-sfds.fr/article/view/709

See Also

plot.abcrf, predict.abcrf, err.abcrf, ranger

Examples

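library(abcrf)
data(snp)
# Use only the first 500 simulations so the example runs quickly
modindex <- snp$modindex[1:500]
sumsta <- snp$sumsta[1:500,]
data1 <- data.frame(modindex, sumsta)
# Forest comparing all models individually
model.rf1 <- abcrf(modindex~., data = data1, ntree=100)
model.rf1
# Forest comparing two groups: models 1 and 2 together versus model 3
model.rf2 <- abcrf(modindex~., data = data1, group = list(c("1","2"),"3"), ntree=100)
model.rf2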

[Package abcrf version 1.9 Index]