# mlr3tuning Tutorial - German Credit

In this use case, we continue working with the German credit dataset. We work on hyperparameter tuning and apply nested resampling.

Martin Binder, Florian Pfisterer
03-11-2020

## Intro

This is the second part of a series of tutorials.

We will continue working with the German credit dataset. In Part I, we peeked into the dataset and compared several learners with their default hyperparameters. We will now see how to:

• Tune hyperparameters for a given problem
• Perform nested resampling

## Prerequisites

First, load the packages we are going to use:


library("data.table")
library("ggplot2")
library("mlr3")
library("mlr3learners")
library("mlr3tuning")
library("paradox")

We use the same Task as in Part I:


task = tsk("german_credit")
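
Since the task is already familiar from Part I, we only recall its key properties here; these accessors are part of the mlr3 Task API:


task$nrow          # number of credit applicants (1000)
task$ncol          # number of columns: 20 features plus the target
task$target_names  # the name of the target column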

We might also want to use multiple cores to reduce the long run times of tuning.


# future::plan("multiprocess") # uncomment for parallelization
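
Note that newer versions of the future package have deprecated the "multiprocess" plan; a sketch of the currently recommended equivalent, kept commented out like the line above:


# future::plan("multisession", workers = 4) # portable successor of "multiprocess"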

### Evaluation

We will evaluate all hyperparameter configurations using 10-fold cross-validation. We use a fixed train-test split, i.e. the same splits for each evaluation. Otherwise, some evaluations could get unusually “hard” splits, which would make comparisons unfair.


set.seed(8008135)
cv10_instance = rsmp("cv", folds = 10)

# fix the train-test splits using the $instantiate() method
cv10_instance$instantiate(task)
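
Because the resampling is now instantiated, repeated accesses return identical row indices, which is exactly what makes the comparison fair; a quick check:


head(cv10_instance$train_set(1)) # row ids of the training set of fold 1, stable across calls
head(cv10_instance$test_set(1))  # row ids of the corresponding test set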

# have a look at the test set instances per fold
cv10_instance$instance

      row_id fold
   1:      5    1
   2:     20    1
   3:     28    1
   4:     35    1
   5:     37    1
  ---
 996:    936   10
 997:    950   10
 998:    963   10
 999:    985   10
1000:    994   10

## Simple Parameter Tuning

Parameter tuning in mlr3 needs two packages:

1. The paradox package is used for the search space definition of the hyperparameters.
2. The mlr3tuning package is used for tuning the hyperparameters.

### Search Space and Problem Definition

First, we need to decide what Learner we want to optimize. We will use LearnerClassifKKNN, the “kernelized” k-nearest neighbor classifier. We will use kknn as a normal kNN without weighting first (i.e., using the rectangular kernel):


knn = lrn("classif.kknn", predict_type = "prob")
knn$param_set$values$kernel = "rectangular"

Next, we decide which parameters to optimize over. Before that, though, let's have a look at the full set of parameters we could tune:


knn$param_set

<ParamSet>
         id    class lower upper                                                          levels default       value
1:        k ParamInt     1   Inf                                                                       7
2: distance ParamDbl     0   Inf                                                                       2
3:   kernel ParamFct    NA    NA rectangular,triangular,epanechnikov,biweight,triweight,cos,... optimal rectangular
4:    scale ParamLgl    NA    NA                                                      TRUE,FALSE    TRUE
5:  ykernel ParamUty    NA    NA

We first tune the k parameter (i.e. the number of nearest neighbors) between 3 and 20. Second, we tune the distance function, allowing both L1 and L2 distances. To do so, we use the paradox package to define a search space (see the online vignette for a more complete introduction).


search_space = ParamSet$new(list(
  ParamInt$new("k", lower = 3, upper = 20),
  ParamInt$new("distance", lower = 1, upper = 2)
))
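
As an aside, paradox can enumerate the exact design that a grid search over this space will evaluate; a small sketch using generate_design_grid():


# 18 equidistant values of k times 2 distances = 36 candidate configurations
design = generate_design_grid(search_space, resolution = 18)
head(design$data)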

As a next step, we define a TuningInstanceSingleCrit that represents the problem we are trying to optimize.


instance_grid = TuningInstanceSingleCrit$new(
  task = task,
  learner = knn,
  resampling = cv10_instance,
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = trm("none")
)

After having set up a tuning instance, we can start tuning. Before that, we need a tuning strategy, though. A simple tuning method is to try all possible combinations of parameters: Grid Search. While it is very intuitive and simple, it is inefficient if the search space is large. It suffices for this simple use case, though. We get the grid_search tuner via:


set.seed(1)
tuner_grid = tnr("grid_search", resolution = 18, batch_size = 36)

Tuning works by calling $optimize(). Note that the tuning procedure modifies our tuning instance (as usual for R6 class objects). The result can be found in the instance object. Before tuning, it is empty:


instance_grid$result

NULL

Now, we tune:


tuner_grid$optimize(instance_grid)

k distance learner_param_vals  x_domain classif.ce
1: 9        2          <list[3]> <list[2]>       0.25

The result is returned by $optimize() together with its performance. It can also be accessed via the $result slot:


instance_grid$result

   k distance learner_param_vals  x_domain classif.ce
1: 9        2          <list[3]> <list[2]>       0.25

We can also look at the Archive of evaluated configurations:


instance_grid$archive$data()

     k distance classif.ce      resample_result  x_domain           timestamp batch_nr
 1:  3        1      0.271 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
 2:  3        2      0.273 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
 3:  4        1      0.292 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
 4:  4        2      0.279 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
 5:  5        1      0.271 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
 6:  5        2      0.274 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
 7:  6        1      0.278 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
 8:  6        2      0.273 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
 9:  7        1      0.257 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
10:  7        2      0.258 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
11:  8        1      0.264 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
12:  8        2      0.256 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
13:  9        1      0.251 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
14:  9        2      0.250 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
15: 10        1      0.261 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
16: 10        2      0.250 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
17: 11        1      0.256 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
18: 11        2      0.254 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
19: 12        1      0.260 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
20: 12        2      0.259 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
21: 13        1      0.268 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
22: 13        2      0.258 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
23: 14        1      0.265 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
24: 14        2      0.263 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
25: 15        1      0.268 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
26: 15        2      0.264 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
27: 16        1      0.267 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
28: 16        2      0.262 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
29: 17        1      0.264 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
30: 17        2      0.267 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
31: 18        1      0.273 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
32: 18        2      0.271 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
33: 19        1      0.269 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
34: 19        2      0.269 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
35: 20        1      0.268 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1
36: 20        2      0.269 <ResampleResult[18]> <list[2]> 2020-08-08 04:48:45        1

We plot the performances depending on the sampled k and distance:


ggplot(instance_grid$archive$data(),
  aes(x = k, y = classif.ce, color = as.factor(distance))) +
  geom_line() + geom_point(size = 3)

On average, the Euclidean distance (distance = 2) seems to work better. However, there is much randomness introduced by the resampling instance, so you may see a different result when you run the experiment yourself with a different random seed. For k, we find that values between 7 and 13 perform well.
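
Instead of eyeballing the plot, we can also summarize the archive numerically with data.table; a small sketch using the columns shown above:


archive = instance_grid$archive$data()
# mean error per distance, and the single best configuration
archive[, .(mean_ce = mean(classif.ce)), by = distance]
archive[which.min(classif.ce), .(k, distance, classif.ce)]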
### Random Search and Transformation

Let's have a look at a larger search space. For example, we could tune all available parameters and allow k to take larger values (up to 50). We also now tune the distance parameter continuously from 1 to 3 as a double, and we tune the kernel and whether we scale the features.

We may find two problems when doing so: First, the resulting difference in performance between k = 3 and k = 4 is probably larger than the difference between k = 49 and k = 50. While 4 is 33% larger than 3, 50 is only 2% larger than 49. To account for this, we will use a transformation function for k and optimize in log-space. We define the range for k from log(3) to log(50) and exponentiate in the transformation. Since k has thereby become a double instead of an integer (in the search space, before transformation), we also round it in the trafo.


large_searchspace = ParamSet$new(list(
  ParamDbl$new("k", lower = log(3), upper = log(50)),
  ParamDbl$new("distance", lower = 1, upper = 3),
  ParamFct$new("kernel", c("rectangular", "gaussian", "rank", "optimal")),
  ParamLgl$new("scale")
))

large_searchspace$trafo = function(x, param_set) {
  x$k = round(exp(x$k))
  x
}

The second problem is that grid search may (and often will) take a long time. For instance, trying out three different values each for k, distance, and kernel, combined with the two values of scale, already requires 54 evaluations. Because of this, we use a different search algorithm, namely Random Search. For it, we need to specify a termination criterion in the tuning instance. The criterion tells the search algorithm when to stop. Here, we will terminate after 36 evaluations:


tuner_random = tnr("random_search", batch_size = 36)

instance_random = TuningInstanceSingleCrit$new(
  task = task,
  learner = knn,
  resampling = cv10_instance,
  measure = msr("classif.ce"),
  search_space = large_searchspace,
  terminator = trm("evals", n_evals = 36)
)
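
Before starting the tuner, we can sanity-check the transformation by applying the trafo to a point by hand (trafos are plain R functions, so this is safe to try):


x = list(k = log(50), distance = 2, kernel = "rank", scale = TRUE)
large_searchspace$trafo(x, large_searchspace)$k # round(exp(log(50))) == 50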

tuner_random$optimize(instance_random)

          k distance kernel scale learner_param_vals  x_domain classif.ce
1: 2.441256 1.650704   rank  TRUE          <list[4]> <list[4]>      0.246

Like before, we can review the Archive. It includes the points before and after the transformation. The archive includes a column for each parameter the Tuner sampled on the search space (points before the transformation):


instance_random$archive$data()

           k distance      kernel scale classif.ce      resample_result  x_domain           timestamp batch_nr
 1: 2.919058 1.779979    gaussian FALSE      0.299 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
 2: 3.301324 2.554641 rectangular FALSE      0.294 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
 3: 2.654531 2.921236 rectangular FALSE      0.315 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
 4: 2.588931 1.869319        rank  TRUE      0.254 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
 5: 3.319396 2.425029 rectangular  TRUE      0.268 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
 6: 1.164253 1.799989    gaussian FALSE      0.364 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
 7: 2.441256 1.650704        rank  TRUE      0.246 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
 8: 3.158912 2.514174     optimal FALSE      0.305 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
 9: 3.047551 1.405385    gaussian  TRUE      0.257 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
10: 2.442352 2.422242    gaussian  TRUE      0.270 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
11: 3.521548 1.243384 rectangular  TRUE      0.266 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
12: 2.331159 1.490977     optimal  TRUE      0.252 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
13: 1.787328 1.286609    gaussian FALSE      0.345 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
14: 1.297461 1.479259        rank  TRUE      0.272 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
15: 1.378451 1.117869 rectangular FALSE      0.355 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
16: 1.988414 2.284577 rectangular FALSE      0.313 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
17: 2.557743 2.752538        rank  TRUE      0.267 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
18: 2.961104 2.557829        rank  TRUE      0.265 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
19: 2.243193 2.594618 rectangular  TRUE      0.256 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
20: 3.666907 1.910549 rectangular FALSE      0.292 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
21: 1.924639 1.820168        rank  TRUE      0.255 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
22: 2.390153 2.621740     optimal FALSE      0.339 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
23: 2.033775 2.209867        rank FALSE      0.342 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
24: 2.929778 2.309448        rank FALSE      0.303 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
25: 1.824519 1.706395        rank  TRUE      0.261 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
26: 2.444957 1.540520     optimal  TRUE      0.249 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
27: 3.254559 2.985368        rank FALSE      0.302 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
28: 1.335633 2.266987        rank FALSE      0.362 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
29: 3.561251 1.426416        rank FALSE      0.297 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
30: 2.052564 1.258745 rectangular FALSE      0.320 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
31: 3.460303 1.956236    gaussian FALSE      0.296 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
32: 2.073975 2.848149        rank  TRUE      0.269 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
33: 2.037658 2.197522    gaussian  TRUE      0.267 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
34: 2.438784 2.952341 rectangular FALSE      0.308 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
35: 3.608733 2.463585        rank FALSE      0.294 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1
36: 3.530354 1.713454 rectangular FALSE      0.294 <ResampleResult[18]> <list[4]> 2020-08-08 04:49:20        1

The parameters used by the learner (points after the transformation) are stored in the x_domain column as lists. By using unnest = "x_domain", the list elements are expanded to separate columns:


instance_random$archive$data(unnest = "x_domain")

           k distance      kernel scale classif.ce      resample_result           timestamp batch_nr x_domain_k x_domain_distance x_domain_kernel x_domain_scale
 1: 2.919058 1.779979    gaussian FALSE      0.299 <ResampleResult[18]> 2020-08-08 04:49:20        1         19          1.779979        gaussian          FALSE
 2: 3.301324 2.554641 rectangular FALSE      0.294 <ResampleResult[18]> 2020-08-08 04:49:20        1         27          2.554641     rectangular          FALSE
 3: 2.654531 2.921236 rectangular FALSE      0.315 <ResampleResult[18]> 2020-08-08 04:49:20        1         14          2.921236     rectangular          FALSE
 4: 2.588931 1.869319        rank  TRUE      0.254 <ResampleResult[18]> 2020-08-08 04:49:20        1         13          1.869319            rank           TRUE
 5: 3.319396 2.425029 rectangular  TRUE      0.268 <ResampleResult[18]> 2020-08-08 04:49:20        1         28          2.425029     rectangular           TRUE
 6: 1.164253 1.799989    gaussian FALSE      0.364 <ResampleResult[18]> 2020-08-08 04:49:20        1          3          1.799989        gaussian          FALSE
 7: 2.441256 1.650704        rank  TRUE      0.246 <ResampleResult[18]> 2020-08-08 04:49:20        1         11          1.650704            rank           TRUE
 8: 3.158912 2.514174     optimal FALSE      0.305 <ResampleResult[18]> 2020-08-08 04:49:20        1         24          2.514174         optimal          FALSE
 9: 3.047551 1.405385    gaussian  TRUE      0.257 <ResampleResult[18]> 2020-08-08 04:49:20        1         21          1.405385        gaussian           TRUE
10: 2.442352 2.422242    gaussian  TRUE      0.270 <ResampleResult[18]> 2020-08-08 04:49:20        1         12          2.422242        gaussian           TRUE
11: 3.521548 1.243384 rectangular  TRUE      0.266 <ResampleResult[18]> 2020-08-08 04:49:20        1         34          1.243384     rectangular           TRUE
12: 2.331159 1.490977     optimal  TRUE      0.252 <ResampleResult[18]> 2020-08-08 04:49:20        1         10          1.490977         optimal           TRUE
13: 1.787328 1.286609    gaussian FALSE      0.345 <ResampleResult[18]> 2020-08-08 04:49:20        1          6          1.286609        gaussian          FALSE
14: 1.297461 1.479259        rank  TRUE      0.272 <ResampleResult[18]> 2020-08-08 04:49:20        1          4          1.479259            rank           TRUE
15: 1.378451 1.117869 rectangular FALSE      0.355 <ResampleResult[18]> 2020-08-08 04:49:20        1          4          1.117869     rectangular          FALSE
16: 1.988414 2.284577 rectangular FALSE      0.313 <ResampleResult[18]> 2020-08-08 04:49:20        1          7          2.284577     rectangular          FALSE
17: 2.557743 2.752538        rank  TRUE      0.267 <ResampleResult[18]> 2020-08-08 04:49:20        1         13          2.752538            rank           TRUE
18: 2.961104 2.557829        rank  TRUE      0.265 <ResampleResult[18]> 2020-08-08 04:49:20        1         19          2.557829            rank           TRUE
19: 2.243193 2.594618 rectangular  TRUE      0.256 <ResampleResult[18]> 2020-08-08 04:49:20        1          9          2.594618     rectangular           TRUE
20: 3.666907 1.910549 rectangular FALSE      0.292 <ResampleResult[18]> 2020-08-08 04:49:20        1         39          1.910549     rectangular          FALSE
21: 1.924639 1.820168        rank  TRUE      0.255 <ResampleResult[18]> 2020-08-08 04:49:20        1          7          1.820168            rank           TRUE
22: 2.390153 2.621740     optimal FALSE      0.339 <ResampleResult[18]> 2020-08-08 04:49:20        1         11          2.621740         optimal          FALSE
23: 2.033775 2.209867        rank FALSE      0.342 <ResampleResult[18]> 2020-08-08 04:49:20        1          8          2.209867            rank          FALSE
24: 2.929778 2.309448        rank FALSE      0.303 <ResampleResult[18]> 2020-08-08 04:49:20        1         19          2.309448            rank          FALSE
25: 1.824519 1.706395        rank  TRUE      0.261 <ResampleResult[18]> 2020-08-08 04:49:20        1          6          1.706395            rank           TRUE
26: 2.444957 1.540520     optimal  TRUE      0.249 <ResampleResult[18]> 2020-08-08 04:49:20        1         12          1.540520         optimal           TRUE
27: 3.254559 2.985368        rank FALSE      0.302 <ResampleResult[18]> 2020-08-08 04:49:20        1         26          2.985368            rank          FALSE
28: 1.335633 2.266987        rank FALSE      0.362 <ResampleResult[18]> 2020-08-08 04:49:20        1          4          2.266987            rank          FALSE
29: 3.561251 1.426416        rank FALSE      0.297 <ResampleResult[18]> 2020-08-08 04:49:20        1         35          1.426416            rank          FALSE
30: 2.052564 1.258745 rectangular FALSE      0.320 <ResampleResult[18]> 2020-08-08 04:49:20        1          8          1.258745     rectangular          FALSE
31: 3.460303 1.956236    gaussian FALSE      0.296 <ResampleResult[18]> 2020-08-08 04:49:20        1         32          1.956236        gaussian          FALSE
32: 2.073975 2.848149        rank  TRUE      0.269 <ResampleResult[18]> 2020-08-08 04:49:20        1          8          2.848149            rank           TRUE
33: 2.037658 2.197522    gaussian  TRUE      0.267 <ResampleResult[18]> 2020-08-08 04:49:20        1          8          2.197522        gaussian           TRUE
34: 2.438784 2.952341 rectangular FALSE      0.308 <ResampleResult[18]> 2020-08-08 04:49:20        1         11          2.952341     rectangular          FALSE
35: 3.608733 2.463585        rank FALSE      0.294 <ResampleResult[18]> 2020-08-08 04:49:20        1         37          2.463585            rank          FALSE
36: 3.530354 1.713454 rectangular FALSE      0.294 <ResampleResult[18]> 2020-08-08 04:49:20        1         34          1.713454     rectangular          FALSE

Let's now investigate the performance by parameters. This is especially easy using visualization:


ggplot(instance_random$archive$data(unnest = "x_domain"),
  aes(x = x_domain_k, y = classif.ce, color = x_domain_scale)) +
  geom_point(size = 3) + geom_line()

The previous plot suggests that scale has a strong influence on performance. For the kernel, there does not seem to be a strong influence:


ggplot(instance_random$archive$data(unnest = "x_domain"),
  aes(x = x_domain_k, y = classif.ce, color = x_domain_kernel)) +
  geom_point(size = 3) + geom_line()

## Nested Resampling

Having determined tuned configurations that seem to work well, we want to find out which performance we can actually expect from them. However, this may require more than our previous naive approach:


instance_random$result_y

classif.ce
0.246 

instance_grid$result_y

classif.ce
      0.25

The problem associated with evaluating tuned models is overtuning: the more we search, the more optimistically biased the performance estimates obtained during tuning become. There is a solution to this problem, namely nested resampling.

The mlr3tuning package provides an AutoTuner that acts like our tuning method but is actually a Learner. The $train() method facilitates tuning of hyperparameters on the training data, using a resampling strategy (below we use 5-fold cross-validation). Then, a model with the optimal hyperparameters is trained on the whole training data.

The AutoTuner finds the best parameters and uses them for training:


grid_auto = AutoTuner$new(
  learner = knn,
  resampling = rsmp("cv", folds = 5), # we can NOT use fixed resampling here
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = trm("none"),
  tuner = tnr("grid_search", resolution = 18)
)

The AutoTuner behaves just like a regular Learner. It can be used to combine the steps of hyperparameter tuning and model fitting, but it is especially useful for resampling and fair comparison of performance through benchmarking:


rr = resample(task, grid_auto, cv10_instance, store_models = TRUE)

We aggregate the performances of all resampling iterations:


rr$aggregate()

classif.ce
0.265 
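
Besides the aggregate, the outer score of each fold is available via $score(); a sketch:


# classification error on each of the 10 outer test sets
rr$score(msr("classif.ce"))[, .(iteration, classif.ce)]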

Essentially, this is the performance of a “knn with optimal hyperparameters found by grid search”. Note that grid_auto is not changed since resample() creates a clone for each resampling iteration. The trained AutoTuner objects can be accessed via:


rr$data$learner[[1]]

<AutoTuner:classif.kknn.tuned>
* Model: list
* Parameters: kernel=rectangular, k=13, distance=2
* Packages: kknn
* Predict Type: prob
* Feature types: logical, integer, numeric, factor, ordered
* Properties: multiclass, twoclass

rr$data$learner[[1]]$tuning_result

k distance learner_param_vals  x_domain classif.ce
1: 13        2          <list[3]> <list[2]>  0.2522222
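
To compare the hyperparameters selected across all outer folds, we can collect every fold's tuning_result into one table (a sketch following the accessor used above; note that the layout of rr$data may differ in newer mlr3 versions):


# one row per outer fold: the configuration chosen by the inner grid search
rbindlist(lapply(rr$data$learner, function(l) l$tuning_result), fill = TRUE)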

## Appendix

### Example: Tuning With A Larger Budget

It is always interesting to look at what could have been. The following dataset contains the result of an optimization run with 3600 evaluations – 100 times more than above:


perfdata

k distance      kernel scale classif.ce
1:  9 2.232217    gaussian FALSE      0.320
2: 35 1.058476        rank FALSE      0.292
3: 17 2.121690     optimal  TRUE      0.257
4:  3 1.275450        rank FALSE      0.383
5: 16 2.126899     optimal FALSE      0.318
---
3596:  8 1.939409     optimal FALSE      0.350
3597: 14 1.604389        rank FALSE      0.307
3598:  5 2.054143        rank  TRUE      0.268
3599: 37 2.879286 rectangular  TRUE      0.275
3600: 37 2.807501     optimal  TRUE      0.253
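
With 3600 evaluations, aggregating is more informative than scanning the raw table; a data.table sketch of the mean error by kernel and scale:


perfdata[, .(mean_ce = mean(classif.ce), n = .N), by = .(kernel, scale)][order(mean_ce)]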

The effect of scale is just as visible as it was before with less data:


ggplot(perfdata, aes(x = k, y = classif.ce, color = scale)) +
geom_point(size = 2, alpha = 0.3)

Now, there seems to be a visible pattern by kernel as well:


ggplot(perfdata, aes(x = k, y = classif.ce, color = kernel)) +
geom_point(size = 2, alpha = 0.3)

In fact, if we zoom in to (5, 35) × (0.23, 0.28) and add smoothing, we see that different kernels have their optimum at different values of k:


ggplot(perfdata, aes(x = k, y = classif.ce, color = kernel,
group = interaction(kernel, scale))) +
geom_point(size = 2, alpha = 0.3) + geom_smooth() +
xlim(5, 35) + ylim(0.23, 0.28)

What about the distance parameter? If we select all results with k between 10 and 20 and plot distance and kernel we see an approximate relationship:


ggplot(perfdata[k > 10 & k < 20 & scale == TRUE],
aes(x = distance, y = classif.ce, color = kernel)) +
geom_point(size = 2) + geom_smooth()

In sum, our observations are:

• The scale parameter is very influential, and scaling is beneficial.
• The distance type seems to be the least influential.
• There seems to be an interaction between k and kernel.

### Citation

Binder & Pfisterer (2020, March 11). mlr3gallery: mlr3tuning Tutorial - German Credit. Retrieved from https://mlr3gallery.mlr-org.com/posts/2020-03-11-mlr3tuning-tutorial-german-credit/
@misc{binder2020mlr3tuning,
  author = {Binder, Martin and Pfisterer, Florian},
  title = {mlr3gallery: mlr3tuning Tutorial - German Credit},
  url = {https://mlr3gallery.mlr-org.com/posts/2020-03-11-mlr3tuning-tutorial-german-credit/},
  year = {2020}
}