calculate_model_drift {drifter}R Documentation

Calculate Model Drift for comparison of models trained on new/old data

Description

This function calculates differences between PDP curves calculated for new/old models

Usage

calculate_model_drift(model_old, model_new, data_new, y_new,
  predict_function = predict, max_obs = 100, scale = sd(y_new, na.rm
  = TRUE))

Arguments

model_old

model created on historical / 'old'data

model_new

model created on current / 'new'data

data_new

data frame with current / 'new' data

y_new

true values of target variable for current / 'new' data

predict_function

function that takes two arguments: model and new data and returns numeric vector with predictions, by default it's 'predict'

max_obs

if negative, them all observations are used for calculation of PDP, is positive, then only 'max_obs' are used for calculation of PDP

scale

scale parameter for calculation of scaled drift

Value

an object of a class 'model_drift' (data.frame) with distances calculated based on Partial Dependency Plots

Examples

 library("DALEX")
 model_old <- lm(m2.price ~ ., data = apartments)
 model_new <- lm(m2.price ~ ., data = apartments_test[1:1000,])
 calculate_model_drift(model_old, model_new,
                  apartments_test[1:1000,],
                  apartments_test[1:1000,]$m2.price)

 
 library("ranger")
 predict_function <- function(m,x,...) predict(m, x, ...)$predictions
 model_old <- ranger(m2.price ~ ., data = apartments)
 model_new <- ranger(m2.price ~ ., data = apartments_test)
 calculate_model_drift(model_old, model_new,
                  apartments_test,
                  apartments_test$m2.price,
                  predict_function = predict_function)

 # here we compare model created on male data
 # with model applied to female data
 # there is interaction with age, and it is detected here
 predict_function <- function(m,x,...) predict(m, x, ..., probability=TRUE)$predictions[,1]
 data_old = HR[HR$gender == "male", -1]
 data_new = HR[HR$gender == "female", -1]
 model_old <- ranger(status ~ ., data = data_old, probability=TRUE)
 model_new <- ranger(status ~ ., data = data_new, probability=TRUE)
 calculate_model_drift(model_old, model_new,
                  HR_test,
                  HR_test$status == "fired",
                  predict_function = predict_function)

 # plot it
 library("ingredients")
 prof_old <- partial_dependency(model_old,
                                     data = data_new[1:500,],
                                     label = "model_old",
                                     predict_function = predict_function,
                                     grid_points = 101,
                                     variable_splits = NULL)
 prof_new <- partial_dependency(model_new,
                                     data = data_new[1:500,],
                                     label = "model_new",
                                     predict_function = predict_function,
                                     grid_points = 101,
                                     variable_splits = NULL)
 plot(prof_old, prof_new, color = "_label_")



[Package drifter version 0.2.1 Index]