R: creating sdmSetting object

sdmSetting {sdm}

R Documentation

creating sdmSetting object

Description

Creates an sdmSetting object that holds the settings used to fit and evaluate the models. The object can be used to reproduce a study.

Usage

sdmSetting(formula,data,methods,interaction.depth=1,n=1,replication=NULL,cv.folds=NULL,
     test.percent=NULL,bg=NULL,bg.n=NULL,var.importance=NULL,response.curve=TRUE,
     var.selection=FALSE,modelSettings=NULL,seed=NULL,parallelSetting=NULL,...)

Arguments

`formula`	specify the structure of the model
`data`	sdm data object or data.frame including species and feature data
`methods`	character, name of the algorithms
`interaction.depth`	level of interactions between predictors
`n`	number of replicates (runs)
`replication`	replication method (e.g., 'subsampling', 'bootstrapping', 'cv')
`cv.folds`	number of folds if cv (cross-validation) is in the selected replication methods
`test.percent`	test percentage if subsampling is in the selected replication methods
`bg`	method to generate background
`bg.n`	number of background records
`var.importance`	logical, whether variable importance should be calculated
`response.curve`	logical, whether response curves should be calculated (default is TRUE)
`var.selection`	logical, whether variable selection should be considered
`modelSettings`	optional list; settings for modelling methods can be specified by users
`seed`	default is NULL; either a logical specifying whether a seed for the random number generator should be set, or a number specifying the exact seed
`parallelSetting`	default is NULL; a list of settings for parallel processing. The items include: ncore, method, type, hosts, doParallel, cluster, and fork; see Details for more information.
`...`	additional arguments

Details

Using sdmSetting, the feature types, interaction.depth, and all other settings of the model can be defined. The function generates an sdmSetting object, which is particularly helpful for reproducibility: the object can be shared with other users and reused in other studies.

To reproduce the same results every time the code is run with the same data and settings, a seed should be specified through the seed argument, which accepts three kinds of values: NULL means that no seed is set (if random sampling is part of the modelling procedure, the results will differ between runs); TRUE means that a seed is set (the seed number is selected randomly and then reused every time the same setting is used); a number means that the seed is set to the number specified by the user.
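
The effect of a fixed seed can be illustrated with base R alone. The sketch below is not the internal code of sdm; the helper draw_test_rows and its arguments are hypothetical and only mimic drawing a 30 percent test partition, as subsampling would.

```r
# Base-R illustration only; this is NOT how sdm is implemented internally.
# Hypothetical helper: draw the row indices of a test partition.
draw_test_rows <- function(n_records, test_percent = 30, seed = NULL) {
  # seed = NULL: no seed is set, so each call may give a different partition
  if (!is.null(seed)) set.seed(seed)
  n_test <- round(n_records * test_percent / 100)
  sort(sample(n_records, n_test))
}

# Same seed, same partition
fixed_a <- draw_test_rows(100, seed = 1)
fixed_b <- draw_test_rows(100, seed = 1)
identical(fixed_a, fixed_b)   # TRUE

# No seed: the two partitions will usually differ
free_a <- draw_test_rows(100)
free_b <- draw_test_rows(100)
```

In sdmSetting, the corresponding choice is made through the seed argument (NULL, TRUE, or a number), as described above.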

For parallel processing, a list of items can be passed to parallelSetting, including:

ncore: defines the number of cores (it can also be specified outside of this list)

method: defines the parallelising engine. Currently, three options are available: 'parallel', 'foreach', and 'future'. The default is 'parallel'.

doParallel: Optional; an expression that registers a backend for parallel processing (needed when method='foreach'). It should be provided as an R expression like the following example:

expression(registerDoParallel(parallelSetting@cl))

The above example is based on the function available in the doParallel package. Other packages can also be used to provide and register backend technologies (e.g., doMC).

cluster: Optional; if a cluster has already been created and is available (e.g., using cl <- parallel::makeCluster(2)), the cluster object can be supplied here to be used as the parallel processing engine; otherwise, the cluster is handled by the sdm package.

hosts: Optional; to use remote machines/clusters in parallel processing, a character vector with the addresses (names or IPs) of remote clusters that are accessible on the network can be provided here to be registered and used (this is still under development, so it may not work properly!)

fork: Logical; available on non-Windows operating systems, and specifies whether forking should be used for the parallelisation. Default is TRUE on non-Windows OS and FALSE on Windows.
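
As a minimal sketch, the items described above can be collected in an ordinary named list. The values chosen here (two cores, the 'parallel' engine, a single 'glm' method) are illustrative assumptions, and the call to sdmSetting is only attempted when the sdm package is installed.

```r
# Item names follow the list above; the values are illustrative assumptions.
ps <- list(
  ncore  = 2,                              # number of cores
  method = "parallel",                     # 'parallel', 'foreach', or 'future'
  fork   = .Platform$OS.type != "windows"  # forking is not available on Windows
)
str(ps)

# Only attempted when the sdm package is installed.
if (requireNamespace("sdm", quietly = TRUE)) {
  library(sdm)
  df <- read.csv(system.file("external/pa_df.csv", package = "sdm"))
  d  <- sdmData(sp ~ b15 + NDVI, train = df)
  s  <- sdmSetting(sp ~ ., data = d, methods = "glm",
                   replication = "sub", test.percent = 30, n = 10,
                   parallelSetting = ps)
  s
}
```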

NOTE: Only use parallelSetting when you deal with a big dataset or a large number of models; otherwise, if the procedure is already quick without parallelising, it makes the procedure slower rather than faster!
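
The overhead behind this note can be seen with the base parallel package on a deliberately trivial task. This is only a rough illustration with a hypothetical toy function (square), not a benchmark of sdm; the timings depend on the machine.

```r
library(parallel)  # ships with the standard R distribution

square <- function(i) i^2   # hypothetical, deliberately cheap task
x <- 1:1000

# Sequential run
t_seq <- system.time(res_seq <- sapply(x, square))

# Parallel run, including the cost of starting and stopping two workers
t_par <- system.time({
  cl <- makeCluster(2)
  res_par <- parSapply(cl, x, square)
  stopCluster(cl)
})

all.equal(res_seq, res_par)  # both approaches compute the same values
t_seq["elapsed"]
t_par["elapsed"]             # usually larger here because of worker startup
```

For a task this small, starting the workers typically costs more time than the computation itself, which is the situation the note warns about.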

Value

an object of class sdmSettings

Author(s)

Babak Naimi naimi.b@gmail.com

https://www.r-gis.net/

https://www.biogeoinformatics.org/

References

Naimi, B., Araujo, M.B. (2016) sdm: a reproducible and extensible R platform for species distribution modelling, Ecography, DOI: 10.1111/ecog.01881

Examples

## Not run: 
file <- system.file("external/pa_df.csv", package="sdm")

df <- read.csv(file)

head(df) 

d <- sdmData(sp~b15+NDVI,train=df)

# generate sdmSettings object (the sdmData object d created above is passed as data):
s <- sdmSetting(sp~., data=d, methods=c('glm','gam','brt','svm','rf'),
          replication='sub', test.percent=30, n=10,
          modelSettings=list(brt=list(n.trees=500)))

s

## End(Not run)

[Package sdm version 1.2-46 Index]