evolve_model_ntimes {datafsm}    R Documentation

Use a Genetic Algorithm to Estimate a Finite-state Machine Model n-times

Description

evolve_model_ntimes runs evolve_model multiple times. evolve_model uses a genetic algorithm to estimate a finite-state machine model, primarily for understanding and predicting decision-making.

Usage

evolve_model_ntimes(data, test_data = NULL, drop_nzv = FALSE,
        measure = c("accuracy", "sens", "spec", "ppv"),
        states = NULL, cv = FALSE, max_states = NULL, k = 2,
        actions = NULL, seed = NULL, popSize = 75,
        pcrossover = 0.8, pmutation = 0.1, maxiter = 50,
        run = 25, parallel = FALSE, priors = NULL,
        verbose = TRUE, return_best = TRUE, ntimes = 10,
        cores = NULL)

Arguments

data

A data.frame that has columns named "period" and "outcome" (period is the time period in which the outcome action was taken), and one to three additional columns containing predictors. All of the 3-5 columns should be named. The period and outcome columns should be integer vectors and the predictor columns should be logical vectors (TRUE, FALSE). If the predictor variable data is not logical, it will be coerced to logical with base::as.logical().
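As a minimal sketch of the expected layout, the following builds a small data.frame with made-up values. The predictor column names are illustrative only, and the coercion step simply mirrors what base::as.logical() does to a 0/1 integer column.

```r
# Hypothetical toy data: three periods for one decision-maker
toy <- data.frame(
  period          = 1:3,                   # integer time period
  outcome         = c(1L, 2L, 1L),         # integer action taken in that period
  my.decision1    = c(TRUE, FALSE, TRUE),  # predictor that is already logical
  other.decision1 = c(0L, 1L, 1L)          # predictor stored as integer
)

# as.logical() maps 0 to FALSE and any nonzero number to TRUE
toy$other.decision1 <- as.logical(toy$other.decision1)
str(toy)
```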

test_data

Optional data.frame that has "period" and "outcome" columns, with one to three additional columns containing predictors. All of the 3-5 columns should be named. The outcome variable is the decision the decision-maker took for that period. This data.frame should be in the same format and have the same order of columns as the data.frame passed to the required data argument.

drop_nzv

Optional logical vector length one specifying whether predictor variables whose variance in the provided data is near zero should be dropped before model building. Default is FALSE. See caret::nearZeroVar(), which calls caret::nzv().

measure

Optional length one character vector that is either: "accuracy", "sens", "spec", or "ppv". This specifies what measure of predictive performance to use for training and evaluating the model. The default measure is "accuracy". However, accuracy can be a problematic measure when the classes are imbalanced in the samples, i.e., if a class the model is trying to predict is very rare. Alternatives to accuracy are available that illuminate different aspects of predictive power. Sensitivity answers the question, “given that a result is truly an event, what is the probability that the model will predict an event?” Specificity answers the question, “given that a result is truly not an event, what is the probability that the model will predict a negative?” Positive predictive value answers, “what is the percent of predicted positives that are actually positive?”
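The four measures can be illustrated with a small base R sketch on made-up vectors. Treating outcome 1 as the "event" is an assumption made for illustration; this is not the package's internal code.

```r
# Made-up true and predicted outcomes; outcome 1 is treated as the event
truth <- c(1, 1, 1, 2, 2, 2, 2, 2)
pred  <- c(1, 1, 2, 1, 2, 2, 2, 2)

tp <- sum(pred == 1 & truth == 1)  # events predicted as events
fn <- sum(pred == 2 & truth == 1)  # events predicted as non-events
fp <- sum(pred == 1 & truth == 2)  # non-events predicted as events
tn <- sum(pred == 2 & truth == 2)  # non-events predicted as non-events

accuracy <- (tp + tn) / (tp + tn + fp + fn)  # 6/8 = 0.75
sens     <- tp / (tp + fn)                   # 2/3
spec     <- tn / (tn + fp)                   # 4/5 = 0.8
ppv      <- tp / (tp + fp)                   # 2/3
```

With only three events among eight observations, accuracy (0.75) looks better than sensitivity (about 0.67), which is the kind of gap the imbalance warning above refers to.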

states

Optional numeric vector with the number of states. If not provided, will be set to max(data$outcome).

cv

Optional logical vector length one for whether cross-validation should be conducted on training data to select optimal number of states. This can drastically increase computation time because if TRUE, it will run evolve_model k*max_states times to estimate optimal value for states. Ties are broken by choosing the smaller number of states. Default is FALSE.

max_states

Optional numeric vector length one only relevant if cv==TRUE. It specifies the maximum number of states that cross-validation should search through. If not provided, will be set to states + 1.

k

Optional numeric vector length one only relevant if cv==TRUE, specifying number of folds for cross-validation.
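To get a rough sense of the cost of cv = TRUE, the k*max_states run count and the tie-breaking rule can be sketched as follows. The value of max_states and the performance numbers are made up, and indexing the candidates as 1 through max_states is an assumption for illustration, not a statement about the package internals.

```r
k          <- 2   # default number of folds from Usage
max_states <- 4   # hypothetical upper bound on the number of states

# Number of evolve_model runs needed for cross-validation
n_runs <- k * max_states   # 8

# Made-up mean cross-validated performance for 1, 2, 3, 4 states
cv_perf <- c(0.70, 0.82, 0.82, 0.79)

# which.max() returns the first maximum, so a tie between 2 and 3 states
# resolves to the smaller number of states
best_states <- which.max(cv_perf)   # 2
```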

actions

Optional numeric vector with the number of actions. If not provided, then actions will be set as the number of unique values in the outcome vector.

seed

Optional numeric vector length one used as the random seed.

popSize

Optional numeric vector length one specifying the size of the GA population. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package.

pcrossover

Optional numeric vector length one specifying probability of crossover for GA. This is passed to the GA::ga() function of the GA package.

pmutation

Optional numeric vector length one specifying probability of mutation for GA. This is passed to the GA::ga() function of the GA package.

maxiter

Optional numeric vector length one specifying max number of iterations for stopping the GA evolution. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package. maxiter is scaled by how many parameters are in the model:
maxiter <- maxiter + ((maxiter*(nBits^2)) / maxiter).
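As a worked example of this scaling, the sketch below evaluates the expression for the default maxiter and a hypothetical nBits of 10 (the real value depends on the model being estimated). Algebraically, the expression reduces to maxiter + nBits^2.

```r
maxiter <- 50   # default from Usage
nBits   <- 10   # hypothetical number of bits encoding the model

scaled <- maxiter + ((maxiter * (nBits^2)) / maxiter)
scaled   # 150

# The maxiter terms cancel, so this equals maxiter + nBits^2
stopifnot(scaled == maxiter + nBits^2)
```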

run

Optional numeric vector length one specifying max number of consecutive iterations without improvement in best fitness score for stopping the GA evolution. A larger number will increase the probability of finding a very good solution but will also increase the computation time. This is passed to the GA::ga() function of the GA package.

parallel

Optional logical vector length one specifying whether to run the GA evolution in parallel. Depending on the number of cores registered and the memory on your machine, this can make the process much faster, but it only works on Unix-based machines that can fork processes.

priors

Optional numeric matrix of solution strings to be included in the initialization. The user needs to use a decoder function to translate prior decision models into bits and then provide them. If this is not specified, then random priors are automatically created.

verbose

Optional logical vector length one specifying whether helpful messages should be displayed on the user's console or not.

return_best

Optional logical vector length one specifying whether to return just the best model or all models. Only relevant if ntimes > 1. Default is TRUE.

ntimes

Optional integer vector length one specifying the number of times to estimate the model. Default is 10.

cores

Optional integer vector length one specifying the number of cores to use if parallel is TRUE.

Details

This function of the datafsm package applies the evolve_model function multiple times and then returns a list with either all the models or the best one.

evolve_model uses a stochastic meta-heuristic optimization routine to estimate the parameters that define an FSM model. Because a single run is not guaranteed to return the best result, we run it many times.
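The idea of repeating a stochastic search and keeping the best result can be sketched in base R. The objective function and the random-search routine below are hypothetical stand-ins chosen for illustration; they are not the genetic algorithm that datafsm uses.

```r
# Toy objective with several local optima; higher is better
fitness <- function(x) -(x - 2)^2 + sin(5 * x)

# One run of a simple random search (a stand-in for one stochastic fit)
one_run <- function(seed, n_draws = 200) {
  set.seed(seed)
  candidates <- runif(n_draws, min = -5, max = 5)
  scores <- fitness(candidates)
  list(best_x = candidates[which.max(scores)], best_score = max(scores))
}

# Repeat the search 10 times with different seeds
runs <- lapply(1:10, one_run)
scores <- vapply(runs, function(r) r$best_score, numeric(1))

# Keep only the best run, analogous to return_best = TRUE
best <- runs[[which.max(scores)]]
```

Keeping the full runs list instead of best corresponds to the return_best = FALSE behavior described above.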

Value

Returns a list where each element is an S4 object of class ga_fsm. See ga_fsm for the details of the slots (objects) that this type of object will have and for information on the methods that can be used to summarize the calling and execution of evolve_model(), including summary, print, and plot.

Examples

## Not run: 
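library(datafsm)

# Create data: 10 periods repeated 1000 times, with two random binary predictors
cdata <- data.frame(period = rep(1:10, 1000),
                   outcome = rep(1:2, 5000),
                   my.decision1 = sample(1:0, 10000, TRUE),
                   other.decision1 = sample(1:0, 10000, TRUE))
# Estimate the model twice and return only the best one
(res <- evolve_model_ntimes(cdata, ntimes = 2))
# Estimate the model twice and return both
(res <- evolve_model_ntimes(cdata, return_best = FALSE, ntimes = 2))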

## End(Not run)


[Package datafsm version 0.2.4 Index]