Introduction to mlr3tuningspaces

Tags: mlr3tuning, tuning, optimization, pima data set, classification

This post shows how to use the package mlr3tuningspaces.

Marc Becker
07-06-2021

Scope

The package mlr3tuningspaces offers a selection of published search spaces for many popular machine learning algorithms. In this post, we show how to tune mlr3 learners with these search spaces.

Prerequisites

The packages mlr3verse and mlr3tuningspaces are required for this demonstration:

library(mlr3verse)
library(mlr3tuningspaces)

We initialize the random number generator with a fixed seed for reproducibility, and decrease the verbosity of the logger to keep the output concise.

set.seed(7832)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")

In this example, we use the Pima Indian Diabetes data set, where the task is to predict whether or not a patient has diabetes. The patients are described by 8 numeric features, some of which have missing values.

# retrieve the task from mlr3
task = tsk("pima")

# generate a quick textual overview using the skimr package
skimr::skim(task$data())
Table 1: Data summary
Name task$data()
Number of rows 768
Number of columns 9
Key NULL
_______________________
Column type frequency:
factor 1
numeric 8
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
diabetes 0 1 FALSE 2 neg: 500, pos: 268

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
age 0 1.00 33.24 11.76 21.00 24.00 29.00 41.00 81.00 ▇▃▁▁▁
glucose 5 0.99 121.69 30.54 44.00 99.00 117.00 141.00 199.00 ▁▇▇▃▂
insulin 374 0.51 155.55 118.78 14.00 76.25 125.00 190.00 846.00 ▇▂▁▁▁
mass 11 0.99 32.46 6.92 18.20 27.50 32.30 36.60 67.10 ▅▇▃▁▁
pedigree 0 1.00 0.47 0.33 0.08 0.24 0.37 0.63 2.42 ▇▃▁▁▁
pregnant 0 1.00 3.85 3.37 0.00 1.00 3.00 6.00 17.00 ▇▃▂▁▁
pressure 35 0.95 72.41 12.38 24.00 64.00 72.00 80.00 122.00 ▁▃▇▂▁
triceps 227 0.70 29.15 10.48 7.00 22.00 29.00 36.00 99.00 ▆▇▁▁▁

Tuning Search Space

For tuning, it is important to create a search space that defines the type and range of the hyperparameters. A learner stores all information about its hyperparameters in the slot $param_set. Usually, we have to choose a subset of the hyperparameters we want to tune.

lrn("classif.rpart")$param_set
<ParamSet>
                id    class lower upper nlevels        default value
 1:             cp ParamDbl     0     1     Inf           0.01      
 2:     keep_model ParamLgl    NA    NA       2          FALSE      
 3:     maxcompete ParamInt     0   Inf     Inf              4      
 4:       maxdepth ParamInt     1    30      30             30      
 5:   maxsurrogate ParamInt     0   Inf     Inf              5      
 6:      minbucket ParamInt     1   Inf     Inf <NoDefault[3]>      
 7:       minsplit ParamInt     1   Inf     Inf             20      
 8: surrogatestyle ParamInt     0     1       2              0      
 9:   usesurrogate ParamInt     0     2       3              2      
10:           xval ParamInt     0   Inf     Inf             10     0
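Before turning to predefined search spaces, note that such a subset can also be defined manually by attaching to_tune() tokens to the learner's hyperparameters. A minimal sketch (the ranges below are illustrative, not the published defaults):

```r
library(mlr3verse)

# manually define a search space by attaching to_tune() tokens;
# the ranges here are chosen for illustration only
learner = lrn("classif.rpart",
  cp       = to_tune(1e-04, 0.1, logscale = TRUE),
  minsplit = to_tune(2, 128, logscale = TRUE)
)
```

Writing such search spaces by hand for every learner is tedious, which is exactly what mlr3tuningspaces avoids.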

Package

At the heart of mlr3tuningspaces is the R6 class TuningSpace. It stores a list of TuneToken objects, helper functions, and additional meta information. The list of TuneToken can be directly applied to the $values slot of a learner’s ParamSet. The search spaces are stored in the mlr_tuning_spaces dictionary.

as.data.table(mlr_tuning_spaces)
                        key         learner n_values
 1:     classif.glmnet.rbv2  classif.glmnet        2
 2:       classif.kknn.rbv2    classif.kknn        1
 3:  classif.ranger.default  classif.ranger        3
 4:     classif.ranger.rbv2  classif.ranger        7
 5:   classif.rpart.default   classif.rpart        3
 6:      classif.rpart.rbv2   classif.rpart        4
 7:     classif.svm.default     classif.svm        4
 8:        classif.svm.rbv2     classif.svm        5
 9: classif.xgboost.default classif.xgboost        9
10:    classif.xgboost.rbv2 classif.xgboost       13
11:        regr.glmnet.rbv2     regr.glmnet        2
12:          regr.kknn.rbv2       regr.kknn        1
13:     regr.ranger.default     regr.ranger        3
14:        regr.ranger.rbv2     regr.ranger        6
15:      regr.rpart.default      regr.rpart        3
16:         regr.rpart.rbv2      regr.rpart        4
17:        regr.svm.default        regr.svm        4
18:           regr.svm.rbv2        regr.svm        5
19:    regr.xgboost.default    regr.xgboost        9
20:       regr.xgboost.rbv2    regr.xgboost       13

We can use the sugar function lts() to retrieve a TuningSpace.

tuning_space_rpart = lts("classif.rpart.default")

The $values slot contains the list of TuneToken.

tuning_space_rpart$values
$minsplit
Tuning over:
range [2, 128] (log scale)


$minbucket
Tuning over:
range [1, 64] (log scale)


$cp
Tuning over:
range [1e-04, 0.1] (log scale)

We apply the search space and tune the learner.

learner = lrn("classif.rpart")

learner$param_set$values = tuning_space_rpart$values

instance = tune(
  method = "random_search",
  task = tsk("pima"),
  learner = learner,
  resampling = rsmp("holdout"),
  measure = msr("classif.ce"),
  term_evals = 10)

instance$result
   minsplit minbucket        cp learner_param_vals  x_domain classif.ce
1: 1.377705  2.369973 -5.610915          <list[3]> <list[3]>  0.2265625
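Note that the minsplit, minbucket, and cp columns show the values on the log scale on which the tuner operated; instance$result_learner_param_vals contains the back-transformed values. These can be applied to a new learner to fit the final model, for example:

```r
# apply the best configuration (already back-transformed from the
# log scale) and train the final model on the full task
learner = lrn("classif.rpart")
learner$param_set$values = instance$result_learner_param_vals
learner$train(tsk("pima"))
```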

We can also retrieve a learner with the search space already applied from the TuningSpace.

learner = tuning_space_rpart$get_learner()
print(learner$param_set)
<ParamSet>
                id    class lower upper nlevels        default               value
 1:             cp ParamDbl     0     1     Inf           0.01 <RangeTuneToken[2]>
 2:     keep_model ParamLgl    NA    NA       2          FALSE                    
 3:     maxcompete ParamInt     0   Inf     Inf              4                    
 4:       maxdepth ParamInt     1    30      30             30                    
 5:   maxsurrogate ParamInt     0   Inf     Inf              5                    
 6:      minbucket ParamInt     1   Inf     Inf <NoDefault[3]> <RangeTuneToken[2]>
 7:       minsplit ParamInt     1   Inf     Inf             20 <RangeTuneToken[2]>
 8: surrogatestyle ParamInt     0     1       2              0                    
 9:   usesurrogate ParamInt     0     2       3              2                    
10:           xval ParamInt     0   Inf     Inf             10                   0

This method also allows us to set constant parameters.

learner = tuning_space_rpart$get_learner(maxdepth = 15)
print(learner$param_set)
<ParamSet>
                id    class lower upper nlevels        default               value
 1:             cp ParamDbl     0     1     Inf           0.01 <RangeTuneToken[2]>
 2:     keep_model ParamLgl    NA    NA       2          FALSE                    
 3:     maxcompete ParamInt     0   Inf     Inf              4                    
 4:       maxdepth ParamInt     1    30      30             30                  15
 5:   maxsurrogate ParamInt     0   Inf     Inf              5                    
 6:      minbucket ParamInt     1   Inf     Inf <NoDefault[3]> <RangeTuneToken[2]>
 7:       minsplit ParamInt     1   Inf     Inf             20 <RangeTuneToken[2]>
 8: surrogatestyle ParamInt     0     1       2              0                    
 9:   usesurrogate ParamInt     0     2       3              2                    
10:           xval ParamInt     0   Inf     Inf             10                   0

The lts() function can also apply the default search space directly to a learner.

learner = lts(lrn("classif.rpart", maxdepth = 15))
print(learner$param_set)
<ParamSet>
                id    class lower upper nlevels        default               value
 1:             cp ParamDbl     0     1     Inf           0.01 <RangeTuneToken[2]>
 2:     keep_model ParamLgl    NA    NA       2          FALSE                    
 3:     maxcompete ParamInt     0   Inf     Inf              4                    
 4:       maxdepth ParamInt     1    30      30             30                  15
 5:   maxsurrogate ParamInt     0   Inf     Inf              5                    
 6:      minbucket ParamInt     1   Inf     Inf <NoDefault[3]> <RangeTuneToken[2]>
 7:       minsplit ParamInt     1   Inf     Inf             20 <RangeTuneToken[2]>
 8: surrogatestyle ParamInt     0     1       2              0                    
 9:   usesurrogate ParamInt     0     2       3              2                    
10:           xval ParamInt     0   Inf     Inf             10                   0

Citation

For attribution, please cite this work as

Becker (2021, July 6). mlr3gallery: Introduction to mlr3tuningspaces. Retrieved from https://mlr3gallery.mlr-org.com/posts/2021-07-06-tuningspaces/

BibTeX citation

@misc{becker2021introduction,
  author = {Becker, Marc},
  title = {mlr3gallery: Introduction to mlr3tuningspaces},
  url = {https://mlr3gallery.mlr-org.com/posts/2021-07-06-tuningspaces/},
  year = {2021}
}