order_levels {iml}    R Documentation

Order levels of a categorical feature

Description

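Orders the levels of a categorical feature by their similarity in the other features. For each of the other features, the distance between levels is computed; the distances are summed over all features, and multi-dimensional scaling reduces the result to a one-dimensional order.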

Usage

order_levels(dat, feature.name)

Arguments

dat

data.frame with the training data

feature.name

the name of the categorical feature

Details

Goal: Compute the distance between two categories. Input: the instances from category 1 and the instances from category 2.

  1. For each feature other than the categorical feature being ordered, compute the distance between the two categories.

  2. Sum up the distances over all features.
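The two steps above can be sketched in R. This page does not state which per-feature distance is used, so the choices below are assumptions made for illustration only: the maximum absolute difference between the two empirical distribution functions for numeric features, and the sum of absolute differences in relative frequencies for categorical ones. The helper names `feature_distance` and `category_distance` and the toy data are hypothetical and not part of iml.

```r
# Hypothetical helper: distance between two categories for ONE other feature.
# x1, x2: values of that feature for the instances in category 1 and 2.
feature_distance <- function(x1, x2) {
  if (is.numeric(x1)) {
    # Assumed choice: maximum absolute difference between the two ECDFs
    grid <- sort(unique(c(x1, x2)))
    max(abs(ecdf(x1)(grid) - ecdf(x2)(grid)))
  } else {
    # Assumed choice: sum of absolute differences in relative frequencies
    lv <- union(unique(as.character(x1)), unique(as.character(x2)))
    p1 <- table(factor(x1, levels = lv)) / length(x1)
    p2 <- table(factor(x2, levels = lv)) / length(x2)
    sum(abs(p1 - p2))
  }
}

# Hypothetical helper: total distance between two categories of feature.name.
category_distance <- function(dat, feature.name, cat1, cat2) {
  others <- setdiff(colnames(dat), feature.name)
  in1 <- dat[[feature.name]] == cat1
  in2 <- dat[[feature.name]] == cat2
  # Step 1: per-feature distance; step 2: sum over all features
  sum(vapply(
    others,
    function(f) feature_distance(dat[[f]][in1], dat[[f]][in2]),
    numeric(1)
  ))
}

# Toy data, made up for this sketch
dat <- data.frame(
  group = factor(c("a", "a", "b", "b", "c", "c")),
  x = c(1, 2, 1.5, 2.5, 10, 11),
  y = factor(c("u", "u", "u", "v", "v", "v"))
)
d_ab <- category_distance(dat, "group", "a", "b")
d_ac <- category_distance(dat, "group", "a", "c")
```

In this toy data, category "a" ends up closer to "b" than to "c", because both the numeric and the categorical feature differ more between "a" and "c".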

This algorithm is run for all pairs of categories. The result is a k by k matrix, where k is the number of categories and each entry is the distance between two categories. This is still not enough to obtain a single order, because a (dis)similarity matrix gives the pairwise distances but not a one-dimensional ordering of the categories.

To force the categories into a single dimension, we use a dimension reduction technique called multi-dimensional scaling, which takes a distance matrix and returns coordinates in a lower-dimensional space that approximately preserve those distances. In our case, we keep only 1 dimension, so that we have a single ordering of the categories and can use it to compute the accumulated local effects (ALE). This is not the Holy Grail of how to order the factor levels, but one possibility.
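As a minimal sketch of this last step, the following R code reduces a small distance matrix to one dimension with `cmdscale` from the stats package (classical multi-dimensional scaling) and reads off an ordering. The distance values are made up for illustration, and whether `order_levels` uses exactly this call is an assumption.

```r
# Illustrative pairwise distances between k = 3 categories (made-up values)
d <- matrix(
  c(0.0, 1.5, 3.0,
    1.5, 0.0, 2.5,
    3.0, 2.5, 0.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "c"), c("a", "b", "c"))
)

# Classical multi-dimensional scaling down to a single dimension;
# the result is a k x 1 matrix of coordinates
coords <- cmdscale(as.dist(d), k = 1)

# Positions of the categories when sorted along that dimension
ord <- order(coords[, 1])
```

The sign of the scaling axis is arbitrary, so the ordering may come out reversed; only the relative arrangement of the categories is meaningful. Here "b" lies between "a" and "c" in either direction.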

Value

The order of the levels (not the levels themselves).
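To illustrate the difference between an order and the levels, the sketch below applies an order vector to a factor. The vector `ord` is a made-up stand-in for a returned value, not actual output of `order_levels`.

```r
# A factor whose levels are sorted alphabetically by default: high, low, mid
f <- factor(c("low", "high", "mid", "low"))

# Hypothetical order vector: positions into levels(f), not level names
ord <- c(2L, 3L, 1L)

# Index the levels by the order to get the level names in the new sequence
new_levels <- levels(f)[ord]

# Rebuild the factor so that its levels follow the new sequence
f_reordered <- factor(f, levels = new_levels)
```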


[Package iml version 0.11.3 Index]