Tuning a Complex Graph

mlr3tuning tuning optimization mlr3pipelines classification bst data set

We show how to tune a complex graph for a single task.

Lennart Schneider
02-03-2021

In this use case, we show how to tune a rather complex graph consisting of different preprocessing steps and different learners, where each preprocessing step and learner itself has parameters that can be tuned.

Ideally, you have already had a look at how to tune over multiple learners.

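First, we load the packages we will need (in addition, the bst, ranger, MASS and fastICA packages must be installed, as they provide the data, the random forest, the LDA and the ICA, respectively):

library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(mlr3tuning)
library(paradox)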

Data and Task

We are going to work with gene expression data included as a supplement in the bst package. The data consist of 2308 gene profiles in 63 training and 20 test samples. The following data preprocessing steps are carried out analogously to vignette("khan", package = "bst"):

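# read the supplemental data shipped with bst and drop the first two columns,
# which do not contain expression values
datafile = system.file("extdata", "supplemental_data", package = "bst")
dat0 = read.delim(datafile, header = TRUE, skip = 1)[, -(1:2)]
# transpose so that rows are samples and columns are genes
dat0 = t(dat0)
# drop five test samples, as done in the bst vignette
dat = data.frame(dat0[!(rownames(dat0) %in%
  c("TEST.9", "TEST.13", "TEST.5", "TEST.3", "TEST.11")), ])
# training labels are encoded in the first two characters of the row names,
# the labels of the 20 remaining test samples are given explicitly
dat$class = as.factor(
  c(substr(rownames(dat)[1:63], start = 1, stop = 2),
    c("NB", "RM", "NB", "EW", "RM", "BL", "EW", "RM", "EW", "EW", "EW", "RM",
      "BL", "RM", "NB", "NB", "NB", "NB", "BL", "EW")
  )
)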

We then construct our training and test Task:

task = TaskClassif$new("SRBCT", backend = dat, target = "class")
# rows 1 to 63 hold the training samples, rows 64 to 83 the test samples
task_train = task$clone(deep = TRUE)
task_train$filter(1:63)
task_test = task$clone(deep = TRUE)
task_test$filter(64:83)

Workflow

Our graph will start by log transforming the features, followed by scaling them. Then, either a PCA or an ICA is applied to extract principal / independent components, followed by fitting an LDA, or a ranger random forest is fitted without any further preprocessing (the log transformation and scaling most likely affect the LDA more than the ranger random forest). For both the PCA and the ICA, the number of principal / independent components is a tuning parameter. For the LDA, we can further choose between different methods for estimating the mean and variance, and for the ranger, we want to tune the mtry and num.trees parameters. Note that the PCA-LDA combination has already been applied successfully in different cancer diagnostic contexts where the feature space is high-dimensional (Morais and Lima 2018).
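The preprocessing chain described above (log10 transform, standardization, then a PCA that keeps only a few components) can be sketched in base R on a toy matrix. This is an illustration under assumptions: the data are random positive values rather than the SRBCT data, and stats::prcomp() with its rank. argument stands in for what the PCA step does with the pca.rank. parameter.

```r
set.seed(42)
# toy stand-in for expression data: 30 samples, 100 strictly positive features
x = matrix(rexp(30 * 100, rate = 0.1), nrow = 30)

# step 1: log transform with base 10 (what the colapply step does)
x_log = log10(x)
# step 2: center and scale each feature to mean 0 and sd 1 (what the scale step does)
x_scaled = scale(x_log)
# step 3: PCA keeping only the first 5 components (cf. the pca.rank. parameter)
pca = prcomp(x_scaled, rank. = 5)
scores = pca$x

dim(scores)  # 30 samples x 5 components
```

In the real graph, the centering and scaling constants as well as the rotation are learned on the training data only and then reused when predicting.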

To allow for switching between the PCA / ICA-LDA and ranger, we can use either branching (PipeOpBranch and PipeOpUnbranch) or proxy pipelines (PipeOpProxy). We will first cover branching in detail and later show how the same can be done using PipeOpProxy.

Baseline

First, we have a look at the baseline classification accuracy of the LDA and ranger on the training task:

set.seed(1290)
base = benchmark(benchmark_grid(task_train,
  learners = list(lrn("classif.lda"), lrn("classif.ranger")),
  resamplings = rsmp("cv", folds = 3)))
base$aggregate(measures = msr("classif.acc"))
   nr      resample_result task_id     learner_id resampling_id iters
1:  1 <ResampleResult[21]>   SRBCT    classif.lda            cv     3
2:  2 <ResampleResult[21]>   SRBCT classif.ranger            cv     3
   classif.acc
1:   0.6190476
2:   0.9682540

The out-of-the-box ranger already appears to perform well on the training task. For the LDA, we get a warning message that some features are collinear. This is not surprising: with 2308 features but only 42 observations in each training fold of the 3-fold cross-validation, the feature space is far larger than the number of samples, which strongly suggests reducing its dimensionality. Let's see whether we can get better performance, at least for the LDA.
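Why the LDA struggles can be seen with a quick check: a covariance matrix estimated from n observations has rank at most n - 1, so with far more features than observations it is singular and cannot be inverted. A base-R sketch with random data; n = 42 matches one training fold, while p = 200 is an assumption chosen to keep the example fast (the real task has 2308 features).

```r
set.seed(1)
n = 42   # observations in one training fold (2/3 of 63)
p = 200  # number of features in this toy example
X = matrix(rnorm(n * p), nrow = n)

S = cov(X)  # p x p sample covariance matrix
ev = eigen(S, symmetric = TRUE, only.values = TRUE)$values
k = sum(ev > 1e-8 * max(ev))  # numerical rank: count non-negligible eigenvalues

k  # 41, i.e. n - 1, far below p = 200
```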

Branching

Our graph starts by log transforming the features using PipeOpColApply (we explicitly use base 10 only for better interpretability when inspecting the model later), followed by scaling the features using PipeOpScale. Then, the first branch allows for switching between the PCA / ICA-LDA and ranger, and within the PCA / ICA-LDA path, the second branch allows for switching between PCA and ICA:

graph1 =
  # log10 transform every feature, then center and scale
  po("colapply", applicator = function(x) log(x, base = 10)) %>>%
  po("scale") %>>%
  # first branch: pca / ica followed by lda vs. ranger
  po("branch", id = "branch_learner", options = c("pca_ica_lda", "ranger")) %>>%
  gunion(list(
    # second branch: pca vs. ica as preprocessing for the lda
    po("branch", id = "branch_preproc_lda", options = c("pca", "ica")) %>>%
      gunion(list(
        po("pca"), po("ica")
      )) %>>%
      po("unbranch", id = "unbranch_preproc_lda") %>>%
      lrn("classif.lda"),
    lrn("classif.ranger")
  )) %>>%
  # merge the two learner paths again
  po("unbranch", id = "unbranch_learner")

Note that the names of the options within each branch are arbitrary, but ideally they describe what is happening. Therefore, we go with "pca_ica_lda" / "ranger" and "pca" / "ica". Finally, we could also have used the branch ppl to make branching easier (we will come back to this in the Proxy section). The graph looks like the following:
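Conceptually, a branch / unbranch pair acts like a switch: all paths exist in the graph, but only the selected one is executed and its output is passed on. A minimal base-R sketch under assumptions: paths and branch_unbranch() are hypothetical names (not part of mlr3pipelines), and a PCA via prcomp() or passing the features through unchanged stand in for the real paths.

```r
set.seed(1)
x = matrix(rexp(20 * 5), nrow = 20)  # strictly positive toy features
x_pre = scale(log10(x))              # shared preprocessing before the branch

# hypothetical named paths, mirroring the options of a branch
paths = list(
  pca = function(z) prcomp(z, rank. = 2)$x,  # reduce to 2 principal components
  none = function(z) z                        # pass the features through unchanged
)

# run only the selected path and return its output
branch_unbranch = function(z, selection) {
  selection = match.arg(selection, names(paths))
  paths[[selection]](z)
}

ncol(branch_unbranch(x_pre, "pca"))   # 2
ncol(branch_unbranch(x_pre, "none"))  # 5
```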

graph1$plot()

We can inspect the parameters of the ParamSet of the graph to see which parameters can be set:

graph1$param_set$ids()
 [1] "colapply.applicator"                        
 [2] "colapply.affect_columns"                    
 [3] "scale.center"                               
 [4] "scale.scale"                                
 [5] "scale.robust"                               
 [6] "scale.affect_columns"                       
 [7] "branch_learner.selection"                   
 [8] "branch_preproc_lda.selection"               
 [9] "pca.center"                                 
[10] "pca.scale."                                 
[11] "pca.rank."                                  
[12] "pca.affect_columns"                         
[13] "ica.n.comp"                                 
[14] "ica.alg.typ"                                
[15] "ica.fun"                                    
[16] "ica.alpha"                                  
[17] "ica.method"                                 
[18] "ica.row.norm"                               
[19] "ica.maxit"                                  
[20] "ica.tol"                                    
[21] "ica.verbose"                                
[22] "ica.w.init"                                 
[23] "ica.affect_columns"                         
[24] "classif.lda.prior"                          
[25] "classif.lda.tol"                            
[26] "classif.lda.method"                         
[27] "classif.lda.nu"                             
[28] "classif.lda.predict.method"                 
[29] "classif.lda.predict.prior"                  
[30] "classif.lda.dimen"                          
[31] "classif.ranger.num.trees"                   
[32] "classif.ranger.mtry"                        
[33] "classif.ranger.importance"                  
[34] "classif.ranger.write.forest"                
[35] "classif.ranger.min.node.size"               
[36] "classif.ranger.replace"                     
[37] "classif.ranger.sample.fraction"             
[38] "classif.ranger.class.weights"               
[39] "classif.ranger.splitrule"                   
[40] "classif.ranger.num.random.splits"           
[41] "classif.ranger.split.select.weights"        
[42] "classif.ranger.always.split.variables"      
[43] "classif.ranger.respect.unordered.factors"   
[44] "classif.ranger.scale.permutation.importance"
[45] "classif.ranger.keep.inbag"                  
[46] "classif.ranger.holdout"                     
[47] "classif.ranger.num.threads"                 
[48] "classif.ranger.save.memory"                 
[49] "classif.ranger.verbose"                     
[50] "classif.ranger.oob.error"                   
[51] "classif.ranger.max.depth"                   
[52] "classif.ranger.alpha"                       
[53] "classif.ranger.min.prop"                    
[54] "classif.ranger.regularization.factor"       
[55] "classif.ranger.regularization.usedepth"     
[56] "classif.ranger.seed"                        
[57] "classif.ranger.minprop"                     
[58] "classif.ranger.se.method"                   

The ids are prefixed by the id of the PipeOp they belong to, e.g., pca.rank. refers to the rank. parameter of PipeOpPCA.
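Because PipeOp ids such as classif.lda can themselves contain dots, a parameter id cannot simply be split at its first dot. The hypothetical helper below (not part of mlr3pipelines) recovers the PipeOp id and the parameter name by picking the longest PipeOp id that prefixes the parameter id; the list of PipeOp ids is taken from graph1 above.

```r
pipeop_ids = c("colapply", "scale", "branch_learner", "branch_preproc_lda",
  "pca", "ica", "classif.lda", "classif.ranger")

# hypothetical helper: split "<pipeop id>.<parameter>" into its two parts
split_param_id = function(id, ops = pipeop_ids) {
  hits = ops[startsWith(id, paste0(ops, "."))]
  stopifnot(length(hits) > 0)
  op = hits[which.max(nchar(hits))]  # longest matching PipeOp id
  c(pipeop = op, param = substring(id, nchar(op) + 2L))
}

split_param_id("pca.rank.")           # pipeop "pca", param "rank."
split_param_id("classif.lda.method")  # pipeop "classif.lda", param "method"
```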

Search Space

Our graph either fits an LDA after applying PCA or ICA, or alternatively a ranger without any dimensionality reduction. The two branches each define a selection parameter that we can tune. Moreover, within the respective PipeOps, we want to tune the following parameters: pca.rank., ica.n.comp, classif.lda.method, classif.ranger.mtry, and classif.ranger.num.trees. The first two parameters are integers that in principle could range from 1 to the number of features. However, for ICA, the upper bound must not exceed the number of observations, and as we will later use 3-fold cross-validation as the resampling method for the tuning (so that each training fold contains only 42 of the 63 training observations), we simply set the upper bound to 30 (and do the same for PCA). For classif.lda.method, we are only interested in "moment" estimation vs. minimum volume ellipsoid covariance estimation ("mve"). Moreover, we set the lower bound of classif.ranger.mtry to 200 (roughly the number of features divided by 10) and the upper bound to 1000. Finally, classif.ranger.num.trees may range from 500 to 2000.
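The numbers behind these bounds can be checked with a few lines of arithmetic. The comparison with ranger's default mtry (the square root of the number of features, rounded down) is our addition and only meant to show that the tuned range lies well above that default.

```r
n_train = 63
folds = 3
# each training fold of 3-fold CV holds 2/3 of the training data
n_fold_train = n_train * (folds - 1) / folds
n_fold_train     # 42, so an upper bound of 30 components stays below it

p = 2308
p / 10           # 230.8, the rough basis for the mtry lower bound of 200
floor(sqrt(p))   # 48, ranger's default mtry for classification
```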

tune_ps1 = ParamSet$new(list(
  ParamFct$new("branch_learner.selection", levels = c("pca_ica_lda", "ranger")),
  ParamFct$new("branch_preproc_lda.selection", levels = c("pca", "ica")),
  ParamInt$new("pca.rank.", lower = 1, upper = 30),
  ParamInt$new("ica.n.comp", lower = 1, upper = 30),
  ParamFct$new("classif.lda.method", levels = c("moment", "mve")),
  ParamInt$new("classif.ranger.mtry", lower = 200, upper = 1000),
  ParamInt$new("classif.ranger.num.trees", lower = 500, upper = 2000))
)

The parameter branch_learner.selection defines whether we go down the left (PCA / ICA followed by LDA) or the right branch (ranger). The parameter branch_preproc_lda.selection defines whether a PCA or ICA will be applied prior to the LDA. The other parameters directly belong to the ParamSet of the PCA / ICA / LDA / ranger. Note that it only makes sense to switch between PCA / ICA if the "pca_ica_lda" branch was selected beforehand. We have to specify this via:

tune_ps1$add_dep("branch_preproc_lda.selection",
  on = "branch_learner.selection",
  cond = CondEqual$new("pca_ica_lda"))

Similarly, pca.rank. should only be active if "pca" has been selected in branch_preproc_lda.selection, which we also have to specify explicitly:

tune_ps1$add_dep("pca.rank.",
  on = "branch_preproc_lda.selection",
  cond = CondEqual$new("pca"))

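The same holds analogously for the following parameters. Note that, as specified here, classif.lda.method only becomes active if "ica" has been selected; when "pca" is selected, the LDA uses its default method ("moment"):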

tune_ps1$add_dep("ica.n.comp",
  on = "branch_preproc_lda.selection",
  cond = CondEqual$new("ica"))
tune_ps1$add_dep("classif.lda.method",
  on = "branch_preproc_lda.selection",
  cond = CondEqual$new("ica"))
tune_ps1$add_dep("classif.ranger.mtry",
  on = "branch_learner.selection",
  cond = CondEqual$new("ranger"))
tune_ps1$add_dep("classif.ranger.num.trees",
  on = "branch_learner.selection",
  cond = CondEqual$new("ranger"))
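The effect of these dependencies can be spelled out as a small function that lists which parameters are active for a given configuration; the inactive ones are those that show up as NA in the tuning grid. This is a hypothetical base-R re-implementation for illustration only (active_params() is not part of paradox), and it follows the dependencies exactly as declared above, including classif.lda.method depending on "ica".

```r
# hypothetical: which parameters are active for a given configuration?
active_params = function(cfg) {
  active = "branch_learner.selection"
  if (cfg$branch_learner.selection == "pca_ica_lda") {
    active = c(active, "branch_preproc_lda.selection")
    if (cfg$branch_preproc_lda.selection == "pca") {
      active = c(active, "pca.rank.")
    } else {
      # as declared above, classif.lda.method only depends on "ica"
      active = c(active, "ica.n.comp", "classif.lda.method")
    }
  } else {
    active = c(active, "classif.ranger.mtry", "classif.ranger.num.trees")
  }
  active
}

active_params(list(branch_learner.selection = "ranger"))
active_params(list(branch_learner.selection = "pca_ica_lda",
  branch_preproc_lda.selection = "ica"))
```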

Finally, we could also have tuned the integer parameters on a log scale. For example, for pca.rank., the performance difference between rank 1 and rank 2 is probably much larger than between rank 29 and rank 30. The mlr3tuning tutorial covers such transformations.
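To see what a log scale changes, compare four evenly spaced points on the original scale with four points that are evenly spaced on log(rank) and then back-transformed and rounded. This is one possible way to do it, sketched in base R; how such a transformation is set up in mlr3tuning is covered in the tutorial mentioned above.

```r
lower = 1; upper = 30; resolution = 4

# evenly spaced on the original scale, truncated to integers
linear_grid = as.integer(seq(lower, upper, length.out = resolution))
# evenly spaced on the log scale, back-transformed and rounded
log_grid = as.integer(round(exp(seq(log(lower), log(upper), length.out = resolution))))

linear_grid  # 1 10 20 30
log_grid     # 1 3 10 30: more points where small ranks differ a lot
```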

Tuning

We can now tune the parameters of our graph, as defined in the search space, with respect to a measure. We will use the classification accuracy. As the resampling method, we use 3-fold cross-validation. We use TerminatorNone (i.e., no early termination) because we will apply a grid search, which stops on its own once all grid points have been evaluated. We use a grid search because it gives nicely plottable and understandable results; with many more parameters, random search or more sophisticated optimization methods would be preferable:

set.seed(2409)
tune1 = TuningInstanceSingleCrit$new(
  task_train,
  learner = graph1,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc"),
  search_space = tune_ps1,
  terminator = trm("none")
)
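To make explicit what a tuning instance evaluated by grid search does, here is a toy base-R version of the same loop: every grid point is evaluated with 3-fold cross-validation, and nothing stops early. Assumptions: the built-in iris data and a hand-written nearest-centroid classifier stand in for the SRBCT task and the LDA, and only the number of principal components is tuned. This is not how mlr3tuning is implemented internally.

```r
set.seed(2409)
X = as.matrix(iris[, 1:4])
y = iris$Species
folds = sample(rep(1:3, length.out = nrow(X)))  # random 3-fold assignment

# nearest-centroid classifier: predict the class whose mean is closest
nearest_centroid = function(X_train, y_train, X_test) {
  classes = levels(y_train)
  centroids = do.call(rbind, lapply(classes, function(cl)
    colMeans(X_train[y_train == cl, , drop = FALSE])))
  d2 = sapply(seq_along(classes), function(k)
    rowSums(sweep(X_test, 2, centroids[k, ])^2))  # squared distances
  factor(classes[max.col(-d2, ties.method = "first")], levels = classes)
}

# grid search over the number of principal components, no early termination
grid = data.frame(rank = 1:4, acc = NA_real_)
for (i in seq_len(nrow(grid))) {
  acc = numeric(3)
  for (f in 1:3) {
    train = folds != f
    # PCA is fitted on the training fold only and applied to the held-out fold
    pca = prcomp(X[train, ], center = TRUE, scale. = TRUE, rank. = grid$rank[i])
    pred = nearest_centroid(pca$x, y[train], predict(pca, X[!train, ]))
    acc[f] = mean(pred == y[!train])
  }
  grid$acc[i] = mean(acc)  # cross-validated accuracy of this grid point
}
grid[order(grid$acc), ]
```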

We then perform a grid search using a resolution of 4 for the numeric parameters. The grid being used will look like the following (note that the dependencies we specified above are handled automatically):

generate_design_grid(tune_ps1, resolution = 4)
<Design> with 28 rows:
    branch_learner.selection branch_preproc_lda.selection pca.rank. ica.n.comp
 1:              pca_ica_lda                          pca         1         NA
 2:              pca_ica_lda                          pca        10         NA
 3:              pca_ica_lda                          pca        20         NA
 4:              pca_ica_lda                          pca        30         NA
 5:              pca_ica_lda                          ica        NA          1
 6:              pca_ica_lda                          ica        NA          1
 7:              pca_ica_lda                          ica        NA         10
 8:              pca_ica_lda                          ica        NA         10
 9:              pca_ica_lda                          ica        NA         20
10:              pca_ica_lda                          ica        NA         20
11:              pca_ica_lda                          ica        NA         30
12:              pca_ica_lda                          ica        NA         30
13:                   ranger                         <NA>        NA         NA
14:                   ranger                         <NA>        NA         NA
15:                   ranger                         <NA>        NA         NA
16:                   ranger                         <NA>        NA         NA
17:                   ranger                         <NA>        NA         NA
18:                   ranger                         <NA>        NA         NA
19:                   ranger                         <NA>        NA         NA
20:                   ranger                         <NA>        NA         NA
21:                   ranger                         <NA>        NA         NA
22:                   ranger                         <NA>        NA         NA
23:                   ranger                         <NA>        NA         NA
24:                   ranger                         <NA>        NA         NA
25:                   ranger                         <NA>        NA         NA
26:                   ranger                         <NA>        NA         NA
27:                   ranger                         <NA>        NA         NA
28:                   ranger                         <NA>        NA         NA
    branch_learner.selection branch_preproc_lda.selection pca.rank. ica.n.comp
    classif.lda.method classif.ranger.mtry classif.ranger.num.trees
 1:               <NA>                  NA                       NA
 2:               <NA>                  NA                       NA
 3:               <NA>                  NA                       NA
 4:               <NA>                  NA                       NA
 5:             moment                  NA                       NA
 6:                mve                  NA                       NA
 7:             moment                  NA                       NA
 8:                mve                  NA                       NA
 9:             moment                  NA                       NA
10:                mve                  NA                       NA
11:             moment                  NA                       NA
12:                mve                  NA                       NA
13:               <NA>                 200                      500
14:               <NA>                 200                     1000
15:               <NA>                 200                     1500
16:               <NA>                 200                     2000
17:               <NA>                 466                      500
18:               <NA>                 466                     1000
19:               <NA>                 466                     1500
20:               <NA>                 466                     2000
21:               <NA>                 733                      500
22:               <NA>                 733                     1000
23:               <NA>                 733                     1500
24:               <NA>                 733                     2000
25:               <NA>                1000                      500
26:               <NA>                1000                     1000
27:               <NA>                1000                     1500
28:               <NA>                1000                     2000
    classif.lda.method classif.ranger.mtry classif.ranger.num.trees
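The 28 rows follow from the dependencies: the PCA path varies only pca.rank., the ICA path varies ica.n.comp and the two LDA methods, and the ranger path varies mtry and num.trees jointly. The integer values in the grid are reproduced below by truncating an evenly spaced sequence; that this matches how the grid is generated internally is our assumption.

```r
res = 4
pca_rows = res           # pca.rank.
ica_rows = res * 2       # ica.n.comp x classif.lda.method ("moment", "mve")
ranger_rows = res * res  # classif.ranger.mtry x classif.ranger.num.trees
n_rows = pca_rows + ica_rows + ranger_rows
n_rows  # 28

# integer grid points from truncating an evenly spaced sequence
mtry_grid = as.integer(seq(200, 1000, length.out = res))   # 200 466 733 1000
trees_grid = as.integer(seq(500, 2000, length.out = res))  # 500 1000 1500 2000
```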

Before starting the tuning we set some logging thresholds (i.e., only print warnings on the console):

lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
tuner_gs = tnr("grid_search", resolution = 4)
tuner_gs$optimize(tune1)
   branch_learner.selection branch_preproc_lda.selection pca.rank. ica.n.comp
1:                   ranger                         <NA>        NA         NA
   classif.lda.method classif.ranger.mtry classif.ranger.num.trees
1:               <NA>                 466                     1500
   learner_param_vals  x_domain classif.acc
1:          <list[8]> <list[3]>           1

Now, we can inspect the results ordered by the classification accuracy:

tune1_results = tune1$archive$data
tune1_results[order(classif.acc), ]
    branch_learner.selection branch_preproc_lda.selection pca.rank. ica.n.comp
 1:              pca_ica_lda                          ica        NA          1
 2:              pca_ica_lda                          pca         1         NA
 3:              pca_ica_lda                          ica        NA          1
 4:              pca_ica_lda                          ica        NA         10
 5:              pca_ica_lda                          ica        NA         10
 6:              pca_ica_lda                          pca        10         NA
 7:              pca_ica_lda                          ica        NA         20
 8:              pca_ica_lda                          ica        NA         20
 9:              pca_ica_lda                          pca        20         NA
10:              pca_ica_lda                          pca        30         NA
11:              pca_ica_lda                          ica        NA         30
12:              pca_ica_lda                          ica        NA         30
13:                   ranger                         <NA>        NA         NA
14:                   ranger                         <NA>        NA         NA
15:                   ranger                         <NA>        NA         NA
16:                   ranger                         <NA>        NA         NA
17:                   ranger                         <NA>        NA         NA
18:                   ranger                         <NA>        NA         NA
19:                   ranger                         <NA>        NA         NA
20:                   ranger                         <NA>        NA         NA
21:                   ranger                         <NA>        NA         NA
22:                   ranger                         <NA>        NA         NA
23:                   ranger                         <NA>        NA         NA
24:                   ranger                         <NA>        NA         NA
25:                   ranger                         <NA>        NA         NA
26:                   ranger                         <NA>        NA         NA
27:                   ranger                         <NA>        NA         NA
28:                   ranger                         <NA>        NA         NA
    branch_learner.selection branch_preproc_lda.selection pca.rank. ica.n.comp
    classif.lda.method classif.ranger.mtry classif.ranger.num.trees classif.acc
 1:             moment                  NA                       NA   0.2380952
 2:               <NA>                  NA                       NA   0.2380952
 3:                mve                  NA                       NA   0.2698413
 4:                mve                  NA                       NA   0.8571429
 5:             moment                  NA                       NA   0.8730159
 6:               <NA>                  NA                       NA   0.8730159
 7:                mve                  NA                       NA   0.8888889
 8:             moment                  NA                       NA   0.9682540
 9:               <NA>                  NA                       NA   0.9682540
10:               <NA>                  NA                       NA   0.9841270
11:                mve                  NA                       NA   0.9841270
12:             moment                  NA                       NA   0.9841270
13:               <NA>                 466                     1500   1.0000000
14:               <NA>                1000                     1500   1.0000000
15:               <NA>                 200                     1000   1.0000000
16:               <NA>                 733                      500   1.0000000
17:               <NA>                1000                     1000   1.0000000
18:               <NA>                 200                      500   1.0000000
19:               <NA>                 466                      500   1.0000000
20:               <NA>                 200                     1500   1.0000000
21:               <NA>                 733                     2000   1.0000000
22:               <NA>                1000                      500   1.0000000
23:               <NA>                 733                     1000   1.0000000
24:               <NA>                 466                     1000   1.0000000
25:               <NA>                 200                     2000   1.0000000
26:               <NA>                 466                     2000   1.0000000
27:               <NA>                 733                     1500   1.0000000
28:               <NA>                1000                     2000   1.0000000
    classif.lda.method classif.ranger.mtry classif.ranger.num.trees classif.acc
                                   uhash  x_domain           timestamp batch_nr
 1: 31ef672b-0d06-448e-a39a-4d239cd1bee7 <list[4]> 2021-04-17 05:02:15        3
 2: 6f1f7c3d-1355-48b9-92b4-db053646b2bb <list[3]> 2021-04-17 05:15:48       24
 3: 4eb06862-43ee-4cbe-99c6-2ce6265462dd <list[4]> 2021-04-17 05:09:15       15
 4: 344ff380-6fa1-4711-817e-66f6dfda7d17 <list[4]> 2021-04-17 05:12:47       20
 5: c90cb73c-78ec-408c-9b43-e740b4ef9328 <list[4]> 2021-04-17 04:59:34        2
 6: 2ff7382c-7554-4304-8477-55fcd1f3fda4 <list[3]> 2021-04-17 05:03:08       10
 7: 5fbeb2e4-0266-4e5b-8f49-e057b95775b7 <list[4]> 2021-04-17 05:15:31       21
 8: 281cd01d-9437-46c9-90a2-194a0656b09b <list[4]> 2021-04-17 04:56:59        1
 9: 7523bc9e-3b43-4113-a872-86709a82de3d <list[3]> 2021-04-17 05:03:12       11
10: da0ea631-19b0-4823-90d2-6382ef63de6f <list[3]> 2021-04-17 05:02:47        7
11: e701795f-2f55-45d4-bb52-5c8b7ebd5b96 <list[4]> 2021-04-17 05:06:08       12
12: 9ac6e3d1-6224-4009-8a97-c8296cd04d98 <list[4]> 2021-04-17 05:18:41       26
13: bb5241fd-51e1-4336-9057-f706b36f74c5 <list[3]> 2021-04-17 05:02:24        4
14: 399bd877-6428-434b-bdb7-9f31b887fd99 <list[3]> 2021-04-17 05:02:38        5
15: 474f2dae-bfaf-495d-96e2-95cb52357728 <list[3]> 2021-04-17 05:02:44        6
16: aa7cb84b-9103-42b2-8fd2-375e952cb990 <list[3]> 2021-04-17 05:02:54        8
17: e856077d-5ad6-45d7-944a-81cd6cb7632b <list[3]> 2021-04-17 05:03:05        9
18: e1a047eb-b182-42de-9c27-379b850c51c6 <list[3]> 2021-04-17 05:06:12       13
19: c8e0665c-7a12-46b2-8eb1-affce47d9b40 <list[3]> 2021-04-17 05:06:19       14
20: 6e3cace4-27d5-4dfa-a10f-a97556aa41ca <list[3]> 2021-04-17 05:09:21       16
21: 7fe650dc-ca0a-4915-baea-6dc4d38977dd <list[3]> 2021-04-17 05:09:36       17
22: 597aa1de-9232-4e53-b901-bf0075945fb2 <list[3]> 2021-04-17 05:09:44       18
23: 46aa62cd-3575-48c6-aa81-c4c5dcc27269 <list[3]> 2021-04-17 05:09:53       19
24: f421df2b-3c48-4ada-a5cc-4077b61d7acb <list[3]> 2021-04-17 05:15:38       22
25: 0096bd1f-0f06-4efb-b293-651cbf55cce3 <list[3]> 2021-04-17 05:15:45       23
26: e3a3fc3b-6f8c-4009-94e7-b2695371611a <list[3]> 2021-04-17 05:15:59       25
27: 7c2003a0-a224-4af8-862e-d351d03712e7 <list[3]> 2021-04-17 05:18:52       27
28: 79c48172-231c-4643-bba9-986f590c9c22 <list[3]> 2021-04-17 05:19:09       28
                                   uhash  x_domain           timestamp batch_nr

We achieve very good accuracy using ranger: every ranger configuration reaches an accuracy of 1, regardless of how mtry and num.trees are set. However, the LDA also shows very good accuracy (0.984) when combined with PCA or ICA retaining 30 components.
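The comparison above can be condensed into the best accuracy per path. The sketch below copies the accuracies from the archive printed above into a plain data.frame (the column names are our choice) and aggregates them with base R.

```r
# accuracies copied from the archive printed above
res = data.frame(
  branch = c(rep("pca", 4), rep("ica", 8), rep("ranger", 16)),
  n_comp = c(1, 10, 20, 30, 1, 1, 10, 10, 20, 20, 30, 30, rep(NA, 16)),
  classif.acc = c(
    0.2380952, 0.8730159, 0.9682540, 0.9841270,
    0.2380952, 0.2698413, 0.8571429, 0.8730159,
    0.8888889, 0.9682540, 0.9841270, 0.9841270,
    rep(1, 16)))

# best cross-validated accuracy per path
best = aggregate(classif.acc ~ branch, data = res, FUN = max)
best
```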

Since all ranger configurations are tied, the choice among them is somewhat arbitrary; for now, we use ranger with mtry set to 200 and num.trees set to 1000.

Setting these parameters manually in our graph, then training on the training task and predicting on the test task yields an accuracy of:

graph1$param_set$values$branch_learner.selection = "ranger"
graph1$param_set$values$classif.ranger.mtry = 200
graph1$param_set$values$classif.ranger.num.trees = 1000
graph1$train(task_train)
$unbranch_learner.output
NULL
graph1$predict(task_test)[[1L]]$score(msr("classif.acc"))
classif.acc 
          1 

Note that we could also have wrapped our graph in a GraphLearner and used it as a learner in an AutoTuner.

Proxy

Instead of using branches to split our graph with respect to the learner and preprocessing options, we can also use PipeOpProxy. PipeOpProxy accepts a single content parameter that can contain any other PipeOp or Graph. This is extremely flexible in the sense that we do not have to specify our options during construction. However, the parameters of the contained PipeOp or Graph are no longer directly contained in the ParamSet of the resulting graph. Therefore, when tuning the graph, we have to make use of a trafo function.

graph2 =
  po("colapply", applicator = function(x) log(x, base = 10)) %>>%
  po("scale") %>>%
  po("proxy")

This graph now looks like the following:

graph2$plot()

At first glance, this may look like a linear graph. However, as the content parameter of PipeOpProxy can be tuned and set to contain any other PipeOp or Graph, this allows for a non-linear graph similar to the one obtained with branching.

graph2$param_set$ids()
[1] "colapply.applicator"     "colapply.affect_columns"
[3] "scale.center"            "scale.scale"            
[5] "scale.robust"            "scale.affect_columns"   
[7] "proxy.content"          

We can tune the graph by using the same search space as before. However, here the trafo function is of central importance to actually set our options and parameters:

tune_ps2 = tune_ps1$clone(deep = TRUE)

The trafo function does all the work, i.e., selecting either the PCA / ICA-LDA or ranger as the proxy.content as well as setting the parameters of the respective preprocessing PipeOps and Learners.

proxy_options = list(
  pca_ica_lda =
    ppl("branch", graphs = list(pca = po("pca"), ica = po("ica"))) %>>%
      lrn("classif.lda"),
  ranger = lrn("classif.ranger")
)

Above, we made use of the branch ppl, which allows us to easily construct a branching graph. Of course, we could also have used another nested PipeOpProxy to specify the preprocessing options ("pca" vs. "ica") within proxy_options if, for some reason, we do not want to use branching at all. The trafo function below selects one of the proxy_options from above and sets the respective parameters for the PCA, ICA, LDA and ranger. Here, the argument x is a list containing the sampled / selected parameter values from our ParamSet (in our case, tune_ps2). The return value is a list containing only the appropriate proxy.content parameter. In each tuning iteration, the proxy.content parameter of our graph will be set to this value.

tune_ps2$trafo = function(x, param_set) {
  # copy the selected option: Learners and Graphs are R6 objects, so without
  # cloning every iteration would modify (and share) the object in proxy_options
  proxy.content = proxy_options[[x$branch_learner.selection]]$clone(deep = TRUE)
  if (x$branch_learner.selection == "pca_ica_lda") {
    # pca_ica_lda: select pca or ica and set the corresponding parameters
    proxy.content$param_set$values$branch.selection = x$branch_preproc_lda.selection
    if (x$branch_preproc_lda.selection == "pca") {
      proxy.content$param_set$values$pca.rank. = x$pca.rank.
    } else {
      proxy.content$param_set$values$ica.n.comp = x$ica.n.comp
    }
    proxy.content$param_set$values$classif.lda.method = x$classif.lda.method
  } else {
    # ranger: within the learner itself, the ids carry no "classif.ranger." prefix
    proxy.content$param_set$values$mtry = x$classif.ranger.mtry
    proxy.content$param_set$values$num.trees = x$classif.ranger.num.trees
  }
  list(proxy.content = proxy.content)
}

For example, suppose that the following parameter values are selected from our ParamSet:

x = list(
  branch_learner.selection = "ranger",
  classif.ranger.mtry = 200,
  classif.ranger.num.trees = 500)

The trafo function will then return:

tune_ps2$trafo(x)
$proxy.content
<LearnerClassifRanger:classif.ranger>
* Model: -
* Parameters: num.threads=1, mtry=200, num.trees=500
* Packages: ranger
* Predict Type: response
* Feature types: logical, integer, numeric, character, factor, ordered
* Properties: importance, multiclass, oob_error, twoclass, weights
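One detail worth keeping in mind when writing such a trafo: Learners and Graphs are R6 objects and therefore have reference semantics, so proxy_options[[...]] returns the very object stored in the list, and setting its parameters changes that stored object. Base R environments behave the same way; the minimal illustration below uses hypothetical names (make_obj, options_env) and plain environments instead of mlr3 objects.

```r
# environments, like R6 objects, are modified by reference
make_obj = function() {
  obj = new.env()
  obj$values = list()
  obj
}
options_env = list(ranger = make_obj())

content = options_env[["ranger"]]  # no copy is made here
content$values$mtry = 200
options_env$ranger$values$mtry     # 200: the stored object changed as well

# an explicit copy leaves the stored object untouched
copy = list2env(as.list(options_env$ranger))
copy$values$mtry = 466
options_env$ranger$values$mtry     # still 200
```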

Tuning can be carried out analogously as done above:

set.seed(2409)
tune2 = TuningInstanceSingleCrit$new(
  task_train,
  learner = graph2,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc"),
  search_space = tune_ps2,
  terminator = trm("none")
)
tuner_gs$optimize(tune2)
tune2_results = tune2$archive$data
tune2_results[order(classif.acc), ]

References

Morais, Camilo LM, and Kássio MG Lima. 2018. “Principal Component Analysis with Linear and Quadratic Discriminant Analysis for Identification of Cancer Samples Based on Mass Spectrometry.” Journal of the Brazilian Chemical Society 29 (3): 472–81. https://doi.org/10.21577/0103-5053.20170159.

Citation

For attribution, please cite this work as

Schneider (2021, Feb. 3). mlr3gallery: Tuning a Complex Graph. Retrieved from https://mlr3gallery.mlr-org.com/posts/2021-02-03-tuning-a-complex-graph/

BibTeX citation

@misc{schneider2021tuning,
  author = {Schneider, Lennart},
  title = {mlr3gallery: Tuning a Complex Graph},
  url = {https://mlr3gallery.mlr-org.com/posts/2021-02-03-tuning-a-complex-graph/},
  year = {2021}
}