ensemble.dummy.variables {BiodiversityR}R Documentation

Suitability mapping based on ensembles of modelling algorithms: handling of categorical data

Description

The basic function ensemble.dummy.variables creates new raster layers representing dummy variables (coded 0 or 1) for all or the most frequent levels of a caterogical variable. Sometimes the creation of dummy variables is needed for proper handling of categorical data for some of the suitability modelling algorithms.

Usage

ensemble.dummy.variables(xcat = NULL, 
    freq.min = 50, most.frequent = 5,
    new.levels = NULL, overwrite = TRUE, ...)

ensemble.accepted.categories(xcat = NULL, categories = NULL, 
    filename = NULL, overwrite = TRUE, ...)

ensemble.simplified.categories(xcat = NULL, p = NULL, 
    filename = NULL, overwrite = TRUE, ...)

Arguments

xcat

RasterLayer object (raster) containing values for a categorical explanatory variable.

freq.min

Minimum frequency for a dummy raster layer to be created for the corresponding factor level. See also freq.

most.frequent

Number of dummy raster layers to be created (if larger than 0), corresponding to the same number of most frequent factor levels See also freq.

new.levels

character vector giving factor levels that are not encountered in xcat and for which dummy layers should be created (this could be useful in dealing with novel conditions)

overwrite

overwrite an existing file name with the same name (if TRUE). See also writeRaster.

...

additional arguments for writeRaster or (for ensemble.dummy.variables, writeRaster).

categories

numeric vector providing the accepted levels of a categorical raster layer; expected to correspond to the levels encountered during calibration

filename

name for the output file. See also writeRaster.

p

presence points that will be used for calibrating the suitability models, typically available in 2-column (x, y) or (lon, lat) dataframe; see also prepareData and extract

Details

The basic function ensemble.dummy.variables creates dummy variables from a RasterLayer object (see raster) that represents a categorical variable. With freq.min and most.frequent it is possible to limit the number of dummy variables that will be created. For example, most.frequent = 5 results in five dummy variables to be created.

Function ensemble.accepted.categories modifies the RasterLayer object (see raster) by replacing cell values for categories (levels) that are not accepted with missing values.

Function ensemble.simplified.categories modifies the RasterLayer object (see raster) by replacing cell values for categories (levels) where none of the presence points occur with the same level. This new level is coded by the maximum coding level for these 'outside categories'.

Value

The basic function ensemble.dummy.variables mainly results in the creation of raster layers that correspond to dummy variables.

Author(s)

Roeland Kindt (World Agroforestry Centre) and Evert Thomas (Bioversity International)

See Also

ensemble.calibrate.models, ensemble.raster

Examples

## Not run: 

# get predictor variables
library(dismo)
predictor.files <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''),
    pattern='grd', full.names=TRUE)
predictors <- stack(predictor.files)
biome.layer <- predictors[["biome"]]
biome.layer

# create dummy layers for the 5 most frequent factor levels

ensemble.dummy.variables(xcat=biome.layer, most.frequent=5,
    overwrite=TRUE)

# check whether dummy variables were created
predictor.files <- list.files(path=paste(system.file(package="dismo"), '/ex', sep=''),
    pattern='grd', full.names=TRUE)
predictors <- stack(predictor.files)
predictors
names(predictors)

# once dummy variables were created, avoid using the original categorical data layer
predictors <- subset(predictors, subset=c("bio5", "bio6", "bio16", "bio17", 
    "biome_1", "biome_2", "biome_7", "biome_8", "biome_13"))
predictors
predictors@title <- "base"

# presence points
presence_file <- paste(system.file(package="dismo"), '/ex/bradypus.csv', sep='')
pres <- read.table(presence_file, header=TRUE, sep=',')[,-1]

# the kfold function randomly assigns data to groups; 
# groups are used as calibration (1/5) and training (4/5) data
groupp <- kfold(pres, 5)
pres_train <- pres[groupp !=  1, ]
pres_test <- pres[groupp ==  1, ]

# choose background points
background <- randomPoints(predictors, n=1000, extf=1.00)
colnames(background)=c('lon', 'lat')
groupa <- kfold(background, 5)
backg_train <- background[groupa != 1, ]
backg_test <- background[groupa == 1, ]

# note that dummy variables with no variation are not used by DOMAIN
# note that dummy variables are not used by MAHAL and MAHAL01
# (neither are categorical variables)
ensemble.nofactors <- ensemble.calibrate.models(x=predictors, p=pres_train, a=backg_train, 
    pt=pres_test, at=backg_test,
    species.name="Bradypus",
    VIF=T,
    MAXENT=1, MAXLIKE=1, GBM=1, GBMSTEP=0, RF=1, GLM=1, GLMSTEP=0, GAM=1, 
    GAMSTEP=0, MGCV=1, MGCVFIX=0, EARTH=1, RPART=1, NNET=1, FDA=1, 
    SVM=1, SVME=1, BIOCLIM.O=1, BIOCLIM=1, DOMAIN=1, MAHAL=0, MAHAL01=1,
    Yweights="BIOMOD", 
    dummy.vars=c("biome_1", "biome_2", "biome_7", "biome_8", "biome_13"),
    PLOTS=FALSE, evaluations.keep=TRUE)

## End(Not run)


[Package BiodiversityR version 2.16-1 Index]