mlr3tuning Tutorial - German Credit


In this use case, we continue working with the German credit dataset. We work on hyperparameter tuning and apply nested resampling.

Martin Binder, Florian Pfisterer
03-11-2020

Intro

This is the second part of a series of tutorials. The other parts of this series can be found here:

We will continue working with the German credit dataset. In Part I, we peeked into the dataset by using and comparing some learners with their default parameters. We will now see how to:

  1. tune hyperparameters of a learner, and
  2. obtain unbiased performance estimates using nested resampling.

Prerequisites

First, load the packages we are going to use:
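Judging from the calls made throughout this post, at least the following are needed:

library("mlr3")         # tasks, resamplings, measures
library("mlr3learners") # provides the kknn learner
library("mlr3tuning")   # tuners, tuning instances, AutoTuner
library("paradox")      # search space definition
library("ggplot2")      # plotting the tuning archives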

We use the same Task as in Part I:

task = tsk("german_credit")

We might also want to use multiple cores to reduce the long runtimes of tuning.

# future::plan("multiprocess") # uncomment for parallelization
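Note that newer versions of the future package have deprecated the "multiprocess" strategy; a portable alternative is:

# future::plan("multisession") # uncomment for parallelization via background R sessions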

Evaluation

We will evaluate all hyperparameter configurations using 10-fold CV. We use fixed train-test splits, i.e. the same folds for each evaluation. Otherwise, some evaluations could get unusually “hard” splits, which would make comparisons unfair.

set.seed(8008135)
cv10_instance = rsmp("cv", folds = 10)

# fix the train-test splits using the $instantiate() method
cv10_instance$instantiate(task)

# have a look at the test set instances per fold
cv10_instance$instance
      row_id fold
   1:      5    1
   2:     20    1
   3:     28    1
   4:     35    1
   5:     37    1
  ---            
 996:    936   10
 997:    950   10
 998:    963   10
 999:    985   10
1000:    994   10
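Because the resampling is instantiated, the splits stay fixed; for example, we can query the test rows of fold 1 directly (a quick sanity check):

head(cv10_instance$test_set(1))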

Simple Parameter Tuning

Parameter tuning in mlr3 needs two packages:

  1. The paradox package is used for the search space definition of the hyperparameters
  2. The mlr3tuning package is used for tuning the hyperparameters

Search Space and Problem Definition

First, we need to decide what Learner we want to optimize. We will use LearnerClassifKKNN, the “kernelized” k-nearest neighbor classifier. We start by using kknn as a plain kNN without weighting (i.e., with the rectangular kernel):

knn = lrn("classif.kknn", predict_type = "prob")
knn$param_set$values$kernel = "rectangular"

As a next step, we decide which parameters to optimize over. Before that, let us look at the set of parameters that could be tuned:

knn$param_set
<ParamSet>
         id    class lower upper
1:        k ParamInt     1   Inf
2: distance ParamDbl     0   Inf
3:   kernel ParamFct    NA    NA
4:    scale ParamLgl    NA    NA
5:  ykernel ParamUty    NA    NA
                                                           levels default
1:                                                                      7
2:                                                                      2
3: rectangular,triangular,epanechnikov,biweight,triweight,cos,... optimal
4:                                                     TRUE,FALSE    TRUE
5:                                                                       
         value
1:            
2:            
3: rectangular
4:            
5:            
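For programmatic access, the parameter set can also be converted to a data.table (one convenient way to inspect it):

as.data.table(knn$param_set)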

We first tune the k parameter (i.e. the number of nearest neighbors) from 3 to 20. Second, we tune the distance function, allowing L1 and L2 distances. To do so, we use the paradox package to define a search space (see the online vignette for a more complete introduction).

search_space = ParamSet$new(list(
  ParamInt$new("k", lower = 3, upper = 20),
  ParamInt$new("distance", lower = 1, upper = 2)
))
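As an aside, newer versions of paradox provide a shorter sugar syntax for the same search space (a sketch, assuming a paradox version with ps() and p_int()):

search_space = ps(
  k = p_int(lower = 3, upper = 20),
  distance = p_int(lower = 1, upper = 2)
)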

As a next step, we define a TuningInstanceSingleCrit that represents the problem we are trying to optimize.

instance_grid = TuningInstanceSingleCrit$new(
  task = task,
  learner = knn,
  resampling = cv10_instance,
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = trm("none")
)

After having set up a tuning instance, we can start tuning. Before that, we need a tuning strategy. A simple tuning method is to try all possible combinations of parameters: Grid Search. While it is very intuitive and simple, it is inefficient if the search space is large. For this small use case, it suffices. We get the grid_search tuner via:

set.seed(1)
tuner_grid = tnr("grid_search", resolution = 18, batch_size = 36)

Tuning works by calling $optimize(). With resolution = 18, all 18 integer values of k between 3 and 20 are tried; combined with the two distance values, this gives 18 × 2 = 36 configurations, which the batch_size of 36 evaluates in a single batch. Note that the tuning procedure modifies our tuning instance (as usual for R6 class objects). The result can be found in the instance object. Before tuning it is empty:

instance_grid$result
NULL

Now, we tune:

tuner_grid$optimize(instance_grid)
   k distance learner_param_vals  x_domain classif.ce
1: 9        2          <list[3]> <list[2]>       0.25

The result is returned by $optimize() together with its performance. It can also be accessed with the $result slot:

instance_grid$result
   k distance learner_param_vals  x_domain classif.ce
1: 9        2          <list[3]> <list[2]>       0.25
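The complete parameter list for the learner (including the fixed kernel) is also available, so the optimized configuration could be assigned back to a learner directly (we do not run this here, since we reuse knn below):

# knn$param_set$values = instance_grid$result_learner_param_vals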

We can also look at the Archive of evaluated configurations:

instance_grid$archive$data()
     k distance classif.ce                                uhash  x_domain
 1:  3        1      0.271 f74f48dd-7b61-49a7-a052-28139a96179e <list[2]>
 2:  3        2      0.273 9c98d512-d94d-4f90-813a-70be747840c2 <list[2]>
 3:  4        1      0.292 0f518b79-3095-40c5-9860-e9e401ebd709 <list[2]>
 4:  4        2      0.279 da5a02c4-1d19-4f0f-a0d9-220f3b2b0b2e <list[2]>
 5:  5        1      0.271 6a20d3a7-26d0-47cf-98e7-9a37b45e4776 <list[2]>
 6:  5        2      0.274 5e54ae73-d9d9-43c5-b44e-b41099f70e91 <list[2]>
 7:  6        1      0.278 aaf7cd1b-8ece-4eff-8611-8ce9c5fc4e9a <list[2]>
 8:  6        2      0.273 977df4ea-9cde-42f6-9741-6c5da98d793c <list[2]>
 9:  7        1      0.257 c3c90cb6-fbc6-457c-afec-625b9e7ca986 <list[2]>
10:  7        2      0.258 456f2040-0377-4f34-ac3a-08e9e3efddac <list[2]>
11:  8        1      0.264 638ef8ea-e3de-438d-869d-d379491a24ef <list[2]>
12:  8        2      0.256 ae4bf5fa-2767-40da-a563-6bf465a4c930 <list[2]>
13:  9        1      0.251 2129934a-d475-4187-93ff-4bf3b684d75e <list[2]>
14:  9        2      0.250 4d9f73ef-294f-429e-ad12-f62d18f332a4 <list[2]>
15: 10        1      0.261 a8ade178-c7c6-4b3e-ab50-2aea8cbe94f2 <list[2]>
16: 10        2      0.250 54b465f1-b558-4337-ab73-9c896692c8ff <list[2]>
17: 11        1      0.256 7402dfde-780b-4ec6-ae82-ad6d59c9b2d5 <list[2]>
18: 11        2      0.254 f96394a8-19b1-495f-99ef-4e94b0ece8b7 <list[2]>
19: 12        1      0.260 70de0df5-1465-4d8f-8c2f-507d47cfa08e <list[2]>
20: 12        2      0.259 97f9c537-cdfd-4748-80f0-1372c1150d26 <list[2]>
21: 13        1      0.268 fa64f09e-aec1-4927-8687-34288f061707 <list[2]>
22: 13        2      0.258 06cf5877-fcf1-4400-a11d-29cefb311524 <list[2]>
23: 14        1      0.265 4a794375-7d36-43de-8b6e-b088c656bab3 <list[2]>
24: 14        2      0.263 2ada62fd-2d22-4ca3-89a0-5b9c00c9020c <list[2]>
25: 15        1      0.268 2d708863-1098-4c9d-96de-a66b3be9753b <list[2]>
26: 15        2      0.264 ffc5b37b-43f8-42aa-8667-76aec1aab4cd <list[2]>
27: 16        1      0.267 3be97078-0161-460f-8212-5ac48955c779 <list[2]>
28: 16        2      0.262 1905dbdc-4963-4c18-8441-50b12957fdb2 <list[2]>
29: 17        1      0.264 0c5523a2-e6c4-49a3-b57e-480287ef2621 <list[2]>
30: 17        2      0.267 928810bf-001b-4d42-8c8b-a5a99e682d52 <list[2]>
31: 18        1      0.273 63591fce-afee-48ea-80da-939a379c5f09 <list[2]>
32: 18        2      0.271 72c0ef45-88f9-48b4-bfa1-42448d72d4c3 <list[2]>
33: 19        1      0.269 2718d573-04a6-4c5a-8cad-ecfab89e48ee <list[2]>
34: 19        2      0.269 5977f58a-e947-4a57-9a60-fb01feeda7c9 <list[2]>
35: 20        1      0.268 02bb5bad-dfc2-409c-b3bf-93e825234c71 <list[2]>
36: 20        2      0.269 394ecce7-d376-47ec-a87a-f72e3ff31519 <list[2]>
     k distance classif.ce                                uhash  x_domain
              timestamp batch_nr
 1: 2020-10-28 04:52:34        1
 2: 2020-10-28 04:52:34        1
 3: 2020-10-28 04:52:34        1
 4: 2020-10-28 04:52:34        1
 5: 2020-10-28 04:52:34        1
 6: 2020-10-28 04:52:34        1
 7: 2020-10-28 04:52:34        1
 8: 2020-10-28 04:52:34        1
 9: 2020-10-28 04:52:34        1
10: 2020-10-28 04:52:34        1
11: 2020-10-28 04:52:34        1
12: 2020-10-28 04:52:34        1
13: 2020-10-28 04:52:34        1
14: 2020-10-28 04:52:34        1
15: 2020-10-28 04:52:34        1
16: 2020-10-28 04:52:34        1
17: 2020-10-28 04:52:34        1
18: 2020-10-28 04:52:34        1
19: 2020-10-28 04:52:34        1
20: 2020-10-28 04:52:34        1
21: 2020-10-28 04:52:34        1
22: 2020-10-28 04:52:34        1
23: 2020-10-28 04:52:34        1
24: 2020-10-28 04:52:34        1
25: 2020-10-28 04:52:34        1
26: 2020-10-28 04:52:34        1
27: 2020-10-28 04:52:34        1
28: 2020-10-28 04:52:34        1
29: 2020-10-28 04:52:34        1
30: 2020-10-28 04:52:34        1
31: 2020-10-28 04:52:34        1
32: 2020-10-28 04:52:34        1
33: 2020-10-28 04:52:34        1
34: 2020-10-28 04:52:34        1
35: 2020-10-28 04:52:34        1
36: 2020-10-28 04:52:34        1
              timestamp batch_nr

We plot the performances depending on the sampled k and distance:

ggplot(instance_grid$archive$data(), aes(x = k, y = classif.ce, color = as.factor(distance))) +
  geom_line() + geom_point(size = 3)

On average, the Euclidean distance (distance = 2) seems to work better. However, the resampling instance introduces a lot of randomness, so you may see a different result when you run the experiment yourself with a different random seed. For k, we find that values between 7 and 13 perform well.

Random Search and Transformation

Let’s have a look at a larger search space. For example, we could tune all available parameters and allow larger values of k (up to 50). We now also tune the distance parameter continuously from 1 to 3 as a double, and we tune the kernel and whether the features are scaled.

We may find two problems when doing so:

First, the resulting difference in performance between k = 3 and k = 4 is probably larger than the difference between k = 49 and k = 50: while 4 is 33% larger than 3, 50 is only 2% larger than 49. To account for this, we use a transformation function for k and optimize in log-space: we define the range for k from log(3) to log(50) and exponentiate in the transformation. Since k thereby becomes a double instead of an int (in the search space, before transformation), we also round it in the trafo.

large_searchspace = ParamSet$new(list(
  ParamDbl$new("k", lower = log(3), upper = log(50)),
  ParamDbl$new("distance", lower = 1, upper = 3),
  ParamFct$new("kernel", c("rectangular", "gaussian", "rank", "optimal")),
  ParamLgl$new("scale")
))

large_searchspace$trafo = function(x, param_set) {
  x$k = round(exp(x$k))
  x
}
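We can check the transformation by applying it manually to a point on the search scale; for example, a sampled value of log(10) for k should map to k = 10 (a quick check, not part of the tuning itself):

# returns the list with k rounded to 10; the other entries pass through unchanged
large_searchspace$trafo(list(k = log(10), distance = 2, kernel = "rank", scale = TRUE), large_searchspace)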

The second problem is that grid search may (and often will) take a long time. For instance, trying out three different values each for k, distance, and kernel, together with the two values for scale, already takes 3 × 3 × 3 × 2 = 54 evaluations. Because of this, we use a different search algorithm, namely Random Search. For it, we need to specify a termination criterion in the tuning instance, which tells the search algorithm when to stop. Here, we will terminate after 36 evaluations:

tuner_random = tnr("random_search", batch_size = 36)

instance_random = TuningInstanceSingleCrit$new(
  task = task,
  learner = knn,
  resampling = cv10_instance,
  measure = msr("classif.ce"),
  search_space = large_searchspace,
  terminator = trm("evals", n_evals = 36)
)
tuner_random$optimize(instance_random)
          k distance  kernel scale learner_param_vals  x_domain classif.ce
1: 2.444957  1.54052 optimal  TRUE          <list[4]> <list[4]>      0.249
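Other termination criteria can be plugged in the same way; two examples (a sketch, assuming the standard terminators shipped with bbotk):

trm("run_time", secs = 60)        # stop after 60 seconds of tuning
trm("perf_reached", level = 0.25) # stop once a classif.ce of 0.25 is reached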

As before, we can review the Archive, which contains the points both before and after the transformation. It includes one column for each parameter the Tuner sampled on the search space (the points before the transformation):

instance_random$archive$data()
           k distance      kernel scale classif.ce
 1: 2.654531 2.921236 rectangular FALSE      0.314
 2: 2.588931 1.869319        rank  TRUE      0.254
 3: 3.319396 2.425029 rectangular  TRUE      0.272
 4: 1.164253 1.799989    gaussian FALSE      0.364
 5: 2.441256 1.650704        rank  TRUE      0.253
 6: 3.158912 2.514174     optimal FALSE      0.305
 7: 3.047551 1.405385    gaussian  TRUE      0.257
 8: 2.442352 2.422242    gaussian  TRUE      0.270
 9: 3.521548 1.243384 rectangular  TRUE      0.271
10: 2.331159 1.490977     optimal  TRUE      0.252
11: 1.787328 1.286609    gaussian FALSE      0.345
12: 1.297461 1.479259        rank  TRUE      0.274
13: 1.378451 1.117869 rectangular FALSE      0.357
14: 1.988414 2.284577 rectangular FALSE      0.313
15: 2.557743 2.752538        rank  TRUE      0.267
16: 2.961104 2.557829        rank  TRUE      0.264
17: 2.243193 2.594618 rectangular  TRUE      0.256
18: 3.666907 1.910549 rectangular FALSE      0.292
19: 1.924639 1.820168        rank  TRUE      0.256
20: 2.390153 2.621740     optimal FALSE      0.339
21: 2.033775 2.209867        rank FALSE      0.332
22: 2.929778 2.309448        rank FALSE      0.302
23: 1.824519 1.706395        rank  TRUE      0.261
24: 2.444957 1.540520     optimal  TRUE      0.249
25: 3.254559 2.985368        rank FALSE      0.302
26: 1.335633 2.266987        rank FALSE      0.361
27: 3.561251 1.426416        rank FALSE      0.296
28: 2.052564 1.258745 rectangular FALSE      0.324
29: 3.460303 1.956236    gaussian FALSE      0.296
30: 2.073975 2.848149        rank  TRUE      0.269
31: 2.037658 2.197522    gaussian  TRUE      0.267
32: 2.438784 2.952341 rectangular FALSE      0.308
33: 3.608733 2.463585        rank FALSE      0.294
34: 3.530354 1.713454 rectangular FALSE      0.297
35: 2.195813 1.862947     optimal  TRUE      0.257
36: 3.285535 1.296423        rank FALSE      0.300
           k distance      kernel scale classif.ce
                                   uhash  x_domain           timestamp batch_nr
 1: 5efda24e-5e7f-495d-95b9-6ec043158e98 <list[4]> 2020-10-28 04:53:18        1
 2: e95dda9a-0b0b-4430-9574-30dc82df4344 <list[4]> 2020-10-28 04:53:18        1
 3: dfda357a-1761-4458-9eff-bfd18e232d09 <list[4]> 2020-10-28 04:53:18        1
 4: 456d1e3e-b999-4b92-b442-62118818c0ce <list[4]> 2020-10-28 04:53:18        1
 5: ff970ca1-9b45-4704-bb67-681bb04f7701 <list[4]> 2020-10-28 04:53:18        1
 6: eb3b3393-4eea-4109-8203-8af3e33d07b9 <list[4]> 2020-10-28 04:53:18        1
 7: 40c4d349-5949-4a1b-8127-29a4af42cb2f <list[4]> 2020-10-28 04:53:18        1
 8: 4b6a25e6-f8d7-4bc5-b661-843c460277b0 <list[4]> 2020-10-28 04:53:18        1
 9: 0c24a1cb-6c90-4eef-aa68-ac27b0397854 <list[4]> 2020-10-28 04:53:18        1
10: 0dfecf49-796f-4a71-8061-cf58bcca487b <list[4]> 2020-10-28 04:53:18        1
11: a5e5e643-0be7-45e8-8b73-34fbc3293e58 <list[4]> 2020-10-28 04:53:18        1
12: e9d6eeb9-740e-49ca-85f8-4526332a7e2e <list[4]> 2020-10-28 04:53:18        1
13: 50505f1c-81f2-4fce-9775-6c054adfb348 <list[4]> 2020-10-28 04:53:18        1
14: 2c064421-f6ae-4b24-a4aa-dc9535443e11 <list[4]> 2020-10-28 04:53:18        1
15: a58b4722-7a21-458a-8a5b-c6b073fd26bd <list[4]> 2020-10-28 04:53:18        1
16: 1a9fcf7a-98fa-4006-8293-f0bf9bb84241 <list[4]> 2020-10-28 04:53:18        1
17: df678700-b51b-461f-b7a4-b02c6ba43aa9 <list[4]> 2020-10-28 04:53:18        1
18: 46d86fba-a26c-4aa2-9ca7-29bbd67cceb0 <list[4]> 2020-10-28 04:53:18        1
19: d576f9d2-f360-4bbd-af43-4b2e88bbc2d4 <list[4]> 2020-10-28 04:53:18        1
20: c9117d2b-6f02-4f8c-b333-82863a6ec8cb <list[4]> 2020-10-28 04:53:18        1
21: cf03e0c1-019a-4cda-a2fd-c9ad4f9d9f4d <list[4]> 2020-10-28 04:53:18        1
22: 375d3229-63dc-4404-82c9-7947e9d2fdb0 <list[4]> 2020-10-28 04:53:18        1
23: aa101c08-8d5a-40d2-989a-d864868d1b1a <list[4]> 2020-10-28 04:53:18        1
24: 61b2eac7-672b-4121-94d0-763325e769ae <list[4]> 2020-10-28 04:53:18        1
25: 524850a0-49b7-4f50-9ac8-90e2d24d2e6e <list[4]> 2020-10-28 04:53:18        1
26: e6a707be-46ee-4f54-838c-29f25a90320c <list[4]> 2020-10-28 04:53:18        1
27: f6df5ab6-e357-45d5-b3f7-4b0e778f7283 <list[4]> 2020-10-28 04:53:18        1
28: c0539182-39cb-429c-9531-edcd15b6f445 <list[4]> 2020-10-28 04:53:18        1
29: e0692f88-02e2-43dc-9e74-65f764810cf8 <list[4]> 2020-10-28 04:53:18        1
30: edaac38a-050d-4a8d-a26d-4ed4d0d0e5fc <list[4]> 2020-10-28 04:53:18        1
31: e1fa1ec0-42d5-4d38-a809-1fe6f6e7d315 <list[4]> 2020-10-28 04:53:18        1
32: 660c5558-d5d6-4317-a93b-e3fb8655077c <list[4]> 2020-10-28 04:53:18        1
33: 540db5b6-4017-4ac0-90a0-3641fb6da57a <list[4]> 2020-10-28 04:53:18        1
34: 951cb938-8421-497e-87f9-a2fd9bb5d794 <list[4]> 2020-10-28 04:53:18        1
35: b0220de2-4614-4690-9fb6-37ea89bc03d6 <list[4]> 2020-10-28 04:53:18        1
36: 7ae6dd5f-f74f-406f-b207-848acf42c68e <list[4]> 2020-10-28 04:53:18        1
                                   uhash  x_domain           timestamp batch_nr

The parameters actually used by the learner (the points after the transformation) are stored as lists in the x_domain column. With unnest = "x_domain", the list elements are expanded to separate columns:

instance_random$archive$data(unnest = "x_domain")
           k distance      kernel scale classif.ce
 1: 2.654531 2.921236 rectangular FALSE      0.314
 2: 2.588931 1.869319        rank  TRUE      0.254
 3: 3.319396 2.425029 rectangular  TRUE      0.272
 4: 1.164253 1.799989    gaussian FALSE      0.364
 5: 2.441256 1.650704        rank  TRUE      0.253
 6: 3.158912 2.514174     optimal FALSE      0.305
 7: 3.047551 1.405385    gaussian  TRUE      0.257
 8: 2.442352 2.422242    gaussian  TRUE      0.270
 9: 3.521548 1.243384 rectangular  TRUE      0.271
10: 2.331159 1.490977     optimal  TRUE      0.252
11: 1.787328 1.286609    gaussian FALSE      0.345
12: 1.297461 1.479259        rank  TRUE      0.274
13: 1.378451 1.117869 rectangular FALSE      0.357
14: 1.988414 2.284577 rectangular FALSE      0.313
15: 2.557743 2.752538        rank  TRUE      0.267
16: 2.961104 2.557829        rank  TRUE      0.264
17: 2.243193 2.594618 rectangular  TRUE      0.256
18: 3.666907 1.910549 rectangular FALSE      0.292
19: 1.924639 1.820168        rank  TRUE      0.256
20: 2.390153 2.621740     optimal FALSE      0.339
21: 2.033775 2.209867        rank FALSE      0.332
22: 2.929778 2.309448        rank FALSE      0.302
23: 1.824519 1.706395        rank  TRUE      0.261
24: 2.444957 1.540520     optimal  TRUE      0.249
25: 3.254559 2.985368        rank FALSE      0.302
26: 1.335633 2.266987        rank FALSE      0.361
27: 3.561251 1.426416        rank FALSE      0.296
28: 2.052564 1.258745 rectangular FALSE      0.324
29: 3.460303 1.956236    gaussian FALSE      0.296
30: 2.073975 2.848149        rank  TRUE      0.269
31: 2.037658 2.197522    gaussian  TRUE      0.267
32: 2.438784 2.952341 rectangular FALSE      0.308
33: 3.608733 2.463585        rank FALSE      0.294
34: 3.530354 1.713454 rectangular FALSE      0.297
35: 2.195813 1.862947     optimal  TRUE      0.257
36: 3.285535 1.296423        rank FALSE      0.300
           k distance      kernel scale classif.ce
                                   uhash           timestamp batch_nr
 1: 5efda24e-5e7f-495d-95b9-6ec043158e98 2020-10-28 04:53:18        1
 2: e95dda9a-0b0b-4430-9574-30dc82df4344 2020-10-28 04:53:18        1
 3: dfda357a-1761-4458-9eff-bfd18e232d09 2020-10-28 04:53:18        1
 4: 456d1e3e-b999-4b92-b442-62118818c0ce 2020-10-28 04:53:18        1
 5: ff970ca1-9b45-4704-bb67-681bb04f7701 2020-10-28 04:53:18        1
 6: eb3b3393-4eea-4109-8203-8af3e33d07b9 2020-10-28 04:53:18        1
 7: 40c4d349-5949-4a1b-8127-29a4af42cb2f 2020-10-28 04:53:18        1
 8: 4b6a25e6-f8d7-4bc5-b661-843c460277b0 2020-10-28 04:53:18        1
 9: 0c24a1cb-6c90-4eef-aa68-ac27b0397854 2020-10-28 04:53:18        1
10: 0dfecf49-796f-4a71-8061-cf58bcca487b 2020-10-28 04:53:18        1
11: a5e5e643-0be7-45e8-8b73-34fbc3293e58 2020-10-28 04:53:18        1
12: e9d6eeb9-740e-49ca-85f8-4526332a7e2e 2020-10-28 04:53:18        1
13: 50505f1c-81f2-4fce-9775-6c054adfb348 2020-10-28 04:53:18        1
14: 2c064421-f6ae-4b24-a4aa-dc9535443e11 2020-10-28 04:53:18        1
15: a58b4722-7a21-458a-8a5b-c6b073fd26bd 2020-10-28 04:53:18        1
16: 1a9fcf7a-98fa-4006-8293-f0bf9bb84241 2020-10-28 04:53:18        1
17: df678700-b51b-461f-b7a4-b02c6ba43aa9 2020-10-28 04:53:18        1
18: 46d86fba-a26c-4aa2-9ca7-29bbd67cceb0 2020-10-28 04:53:18        1
19: d576f9d2-f360-4bbd-af43-4b2e88bbc2d4 2020-10-28 04:53:18        1
20: c9117d2b-6f02-4f8c-b333-82863a6ec8cb 2020-10-28 04:53:18        1
21: cf03e0c1-019a-4cda-a2fd-c9ad4f9d9f4d 2020-10-28 04:53:18        1
22: 375d3229-63dc-4404-82c9-7947e9d2fdb0 2020-10-28 04:53:18        1
23: aa101c08-8d5a-40d2-989a-d864868d1b1a 2020-10-28 04:53:18        1
24: 61b2eac7-672b-4121-94d0-763325e769ae 2020-10-28 04:53:18        1
25: 524850a0-49b7-4f50-9ac8-90e2d24d2e6e 2020-10-28 04:53:18        1
26: e6a707be-46ee-4f54-838c-29f25a90320c 2020-10-28 04:53:18        1
27: f6df5ab6-e357-45d5-b3f7-4b0e778f7283 2020-10-28 04:53:18        1
28: c0539182-39cb-429c-9531-edcd15b6f445 2020-10-28 04:53:18        1
29: e0692f88-02e2-43dc-9e74-65f764810cf8 2020-10-28 04:53:18        1
30: edaac38a-050d-4a8d-a26d-4ed4d0d0e5fc 2020-10-28 04:53:18        1
31: e1fa1ec0-42d5-4d38-a809-1fe6f6e7d315 2020-10-28 04:53:18        1
32: 660c5558-d5d6-4317-a93b-e3fb8655077c 2020-10-28 04:53:18        1
33: 540db5b6-4017-4ac0-90a0-3641fb6da57a 2020-10-28 04:53:18        1
34: 951cb938-8421-497e-87f9-a2fd9bb5d794 2020-10-28 04:53:18        1
35: b0220de2-4614-4690-9fb6-37ea89bc03d6 2020-10-28 04:53:18        1
36: 7ae6dd5f-f74f-406f-b207-848acf42c68e 2020-10-28 04:53:18        1
                                   uhash           timestamp batch_nr
    x_domain_k x_domain_distance x_domain_kernel x_domain_scale
 1:         14          2.921236     rectangular          FALSE
 2:         13          1.869319            rank           TRUE
 3:         28          2.425029     rectangular           TRUE
 4:          3          1.799989        gaussian          FALSE
 5:         11          1.650704            rank           TRUE
 6:         24          2.514174         optimal          FALSE
 7:         21          1.405385        gaussian           TRUE
 8:         12          2.422242        gaussian           TRUE
 9:         34          1.243384     rectangular           TRUE
10:         10          1.490977         optimal           TRUE
11:          6          1.286609        gaussian          FALSE
12:          4          1.479259            rank           TRUE
13:          4          1.117869     rectangular          FALSE
14:          7          2.284577     rectangular          FALSE
15:         13          2.752538            rank           TRUE
16:         19          2.557829            rank           TRUE
17:          9          2.594618     rectangular           TRUE
18:         39          1.910549     rectangular          FALSE
19:          7          1.820168            rank           TRUE
20:         11          2.621740         optimal          FALSE
21:          8          2.209867            rank          FALSE
22:         19          2.309448            rank          FALSE
23:          6          1.706395            rank           TRUE
24:         12          1.540520         optimal           TRUE
25:         26          2.985368            rank          FALSE
26:          4          2.266987            rank          FALSE
27:         35          1.426416            rank          FALSE
28:          8          1.258745     rectangular          FALSE
29:         32          1.956236        gaussian          FALSE
30:          8          2.848149            rank           TRUE
31:          8          2.197522        gaussian           TRUE
32:         11          2.952341     rectangular          FALSE
33:         37          2.463585            rank          FALSE
34:         34          1.713454     rectangular          FALSE
35:          9          1.862947         optimal           TRUE
36:         27          1.296423            rank          FALSE
    x_domain_k x_domain_distance x_domain_kernel x_domain_scale

Let’s now investigate the performance by parameters. This is especially easy using visualization:

ggplot(instance_random$archive$data(unnest = "x_domain"),
  aes(x = x_domain_k, y = classif.ce, color = x_domain_scale)) +
  geom_point(size = 3) + geom_line()

The previous plot suggests that scale has a strong influence on performance. For the kernel, there does not seem to be a strong influence:

ggplot(instance_random$archive$data(unnest = "x_domain"),
  aes(x = x_domain_k, y = classif.ce, color = x_domain_kernel)) +
  geom_point(size = 3) + geom_line()

Nested Resampling

Having found tuned configurations that seem to work well, we want to know what performance we can expect from them. Answering this requires more than the naive approach:

instance_random$result_y
classif.ce 
     0.249 
instance_grid$result_y
classif.ce 
      0.25 

The problem associated with evaluating tuned models is overtuning. The more we search, the more optimistically biased the associated performance metrics from tuning become.

There is a solution to this problem, namely Nested Resampling.

The mlr3tuning package provides an AutoTuner that acts like our tuning method but is actually a Learner. Its $train() method tunes the hyperparameters on the training data, using a resampling strategy (below we use 5-fold cross-validation), and then fits a model with the optimal hyperparameters on the whole training data.

The AutoTuner finds the best parameters and uses them for training:

grid_auto = AutoTuner$new(
  learner = knn,
  resampling = rsmp("cv", folds = 5), # we can NOT use fixed resampling here
  measure = msr("classif.ce"),
  search_space = search_space,
  terminator = trm("none"),
  tuner = tnr("grid_search", resolution = 18)
)
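For illustration, training it directly would look like this (a sketch; we do not run it here, since below we use the AutoTuner inside resample()):

# grid_auto$train(task)   # inner 5-fold CV tuning, then a final fit on all of task
# grid_auto$predict(task) # predictions from the final model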

The AutoTuner behaves just like a regular Learner. It can be used to combine the steps of hyperparameter tuning and model fitting but is especially useful for resampling and fair comparison of performance through benchmarking:

rr = resample(task, grid_auto, cv10_instance, store_models = TRUE)

We aggregate the performances of all resampling iterations:

rr$aggregate()
classif.ce 
     0.256 
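The individual scores per outer fold can be retrieved via $score():

rr$score(msr("classif.ce"))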

Essentially, this is the performance of a “knn with optimal hyperparameters found by grid search”. Note that grid_auto is not changed since resample() creates a clone for each resampling iteration. The trained AutoTuner objects can be accessed by using

rr$learners[[1]]
<AutoTuner:classif.kknn.tuned>
* Model: list
* Parameters: kernel=rectangular, k=9, distance=2
* Packages: kknn
* Predict Type: prob
* Feature types: logical, integer, numeric, factor, ordered
* Properties: multiclass, twoclass
rr$learners[[1]]$tuning_result
   k distance learner_param_vals  x_domain classif.ce
1: 9        2          <list[3]> <list[2]>       0.26
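Since we stored the models, we can also check how stable the selected hyperparameters are across the outer folds (a small sketch):

# the selected k in each of the 10 outer folds
sapply(rr$learners, function(l) l$tuning_result$k)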

Appendix

Example: Tuning With A Larger Budget

It is always interesting to look at what could have been. The following dataset contains an optimization run result with 3600 evaluations – more than above by a factor of 100:

perfdata
             k distance   kernel scale classif.ce
   1: 2.191216 2.232217 gaussian FALSE      0.312
   2: 3.549142 1.058476     rank FALSE      0.296
   3: 2.835727 2.121690  optimal  TRUE      0.251
   4: 1.118085 1.275450     rank FALSE      0.368
   5: 2.790168 2.126899  optimal FALSE      0.320
  ---                                            
3596: 3.023075 1.413180  optimal FALSE      0.306
3597: 3.243131 1.827885 gaussian  TRUE      0.255
3598: 1.628957 2.254808     rank  TRUE      0.271
3599: 3.298112 2.984946  optimal FALSE      0.301
3600: 3.855455 2.613641 gaussian FALSE      0.294
                                     uhash           timestamp batch_nr
   1: ca82829b-8915-4b0c-831b-15208edad9ae 2020-10-08 10:52:41        1
   2: e075729f-bef1-4f06-88c7-8d4e379866be 2020-10-08 10:52:41        1
   3: 215b080b-9e0a-40f5-8305-e4c4c85fb99c 2020-10-08 10:52:41        1
   4: 1e640077-c625-49fd-b0dc-58ebcb49d1d9 2020-10-08 10:52:41        1
   5: 495af8d5-474c-4451-9f02-06d1d26f4ebd 2020-10-08 10:52:41        1
  ---                                                                  
3596: e9d1d423-80b2-4955-96fd-5b27dae39617 2020-10-08 11:43:55      100
3597: 66f68400-4622-4ba0-a2f3-facaf141d97c 2020-10-08 11:43:55      100
3598: 61d53a77-507d-4449-8e99-ce1273ab2c73 2020-10-08 11:43:55      100
3599: ba0392f8-719b-438d-bbd3-a982e736c2c4 2020-10-08 11:43:55      100
3600: b14811dc-5519-45ca-ac75-3ebe9e81f9a0 2020-10-08 11:43:55      100
      x_domain_k x_domain_distance x_domain_kernel x_domain_scale
   1:          9          2.232217        gaussian          FALSE
   2:         35          1.058476            rank          FALSE
   3:         17          2.121690         optimal           TRUE
   4:          3          1.275450            rank          FALSE
   5:         16          2.126899         optimal          FALSE
  ---                                                            
3596:         21          1.413180         optimal          FALSE
3597:         26          1.827885        gaussian           TRUE
3598:          5          2.254808            rank           TRUE
3599:         27          2.984946         optimal          FALSE
3600:         47          2.613641        gaussian          FALSE

The effect of scale is just as visible as it was before, when we had fewer data points:

ggplot(perfdata, aes(x = x_domain_k, y = classif.ce, color = scale)) +
  geom_point(size = 2, alpha = 0.3)

Now, there seems to be a visible pattern by kernel as well:

ggplot(perfdata, aes(x = x_domain_k, y = classif.ce, color = kernel)) +
  geom_point(size = 2, alpha = 0.3)

In fact, if we zoom in to (5, 35) × (0.23, 0.28) and add some smoothing, we see that different kernels have their optimum at different values of k:

ggplot(perfdata, aes(x = x_domain_k, y = classif.ce, color = kernel,
  group = interaction(kernel, scale))) +
  geom_point(size = 2, alpha = 0.3) + geom_smooth() +
  xlim(5, 35) + ylim(0.23, 0.28)

What about the distance parameter? If we select all results with k between 10 and 20 and scale == TRUE, and plot performance against distance by kernel, we see an approximate relationship:

ggplot(perfdata[x_domain_k > 10 & x_domain_k < 20 & scale == TRUE],
  aes(x = distance, y = classif.ce, color = kernel)) +
  geom_point(size = 2) + geom_smooth()

In sum, our observations are: the scale parameter is very influential, and scaling is beneficial; the distance type seems to be the least influential; and there seems to be an interaction between k and kernel.

Citation

For attribution, please cite this work as

Binder & Pfisterer (2020, March 11). mlr3gallery: mlr3tuning Tutorial - German Credit. Retrieved from https://mlr3gallery.mlr-org.com/posts/2020-03-11-mlr3tuning-tutorial-german-credit/

BibTeX citation

@misc{binder2020mlr3tuning,
  author = {Binder, Martin and Pfisterer, Florian},
  title = {mlr3gallery: mlr3tuning Tutorial - German Credit},
  url = {https://mlr3gallery.mlr-org.com/posts/2020-03-11-mlr3tuning-tutorial-german-credit/},
  year = {2020}
}