Orders the levels by their similarity in other features. Computes per feature
the distance, sums up all distances and does multi-dimensional scaling

`order_levels(dat, feature.name)`

## Arguments

- dat
data.frame with the training data

- feature.name
the name of the categorical feature

## Value

the order of the levels (not levels itself)

## Details

Goal: Compute the distances between two categories.
Input: Instances from category 1 and 2

For all features, do (excluding the categorical feature for which we are computing the order):

If the feature is numerical: Take instances from category 1, calculate the
empirical cumulative probability distribution function (ecdf) of the
feature. The ecdf is a function that tells us for a given feature value, how
many values are smaller. Do the same for category 2. The distance is the
absolute maximum point-wise distance of the two ecdf. Practically, this
value is high when the distribution from one category is strongly shifted
far away from the other. This measure is also known as the
Kolmogorov-Smirnov distance
(https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test).

If the feature is categorical: Take instances from category 1 and
calculate a table with the relative frequency of each category of the other
feature. Do the same for instances from category 2. The distance is the sum
of the absolute difference of both relative frequency tables.

Sum up the distances over all features

This algorithm we run for all pairs of categories.
Then we have a k times k matrix, when k is the number of categories, where
each entry is the distance between two categories. Still not enough to have a
single order, because, a (dis)similarity tells you the pair-wise distances,
but does not give you a one-dimensional ordering of the classes. To kind of
force this thing into a single dimension, we have to use a dimension
reduction trick called multi-dimensional scaling. This can be solved using
multi-dimensional scaling, which takes in a distance matrix and returns a
distance matrix with reduced dimension. In our case, we only want 1 dimension
left, so that we have a single ordering of the categories and can compute the
accumulated local effects. After reducing it to a single ordering, we are
done and can use this ordering to compute ALE. This is not the Holy Grail how
to order the factors, but one possibility.