We show how to tune a complex graph for a single task.

In this use case, we show how to tune a rather complex graph consisting of different preprocessing steps and different learners, where each preprocessing step and each learner has parameters that can be tuned. You will learn the following:

- Build a `Graph` that consists of two common preprocessing steps, then switches between two dimensionality reduction techniques followed by a `Learner` vs. no dimensionality reduction followed by another `Learner`
- Define the search space for tuning that handles inter-dependencies between pipeline steps and hyperparameters
- Run a `grid search` to find an optimal choice of preprocessing steps and hyperparameters.

Ideally, you have already had a look at how to tune over multiple learners.

First, we load the packages we will need:
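A minimal set of packages covering the functions used in this post (inferred from the code below rather than taken from the original): mlr3 for tasks, learners and benchmarking, mlr3learners for the `LDA` and `ranger` learners, mlr3pipelines for the graph, and mlr3tuning and paradox for tuning and the search space. The learners and `PipeOpICA` additionally rely on the MASS, ranger and fastICA packages being installed, and the data come from bst:

```
library(mlr3)          # tasks, learners, resampling, benchmark()
library(mlr3learners)  # classif.lda (via MASS) and classif.ranger (via ranger)
library(mlr3pipelines) # po(), gunion(), %>>%, ppl()
library(mlr3tuning)    # tuning instances, tnr(), trm()
library(paradox)       # ps(), p_int(), p_fct(), generate_design_grid()
library(data.table)    # as.data.table() on tuning archives
```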

We initialize the random number generator with a fixed seed for reproducibility and decrease the verbosity of the logger to keep the output concise. The `lgr` package is used for logging in all mlr3 packages. The mlr3 logger prints the logging messages from the base package, whereas the bbotk logger is responsible for logging messages from the optimization packages (e.g., mlr3tuning).

```
set.seed(7832)
lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
```

We are going to work with gene expression data included as a supplement in the bst package. The data set consists of 2308 gene profiles in 63 training and 20 test samples. The following preprocessing steps are done analogously to `vignette("khan", package = "bst")`:

```
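# read the supplemental data shipped with bst and drop the first two columns
datafile = system.file("extdata", "supplemental_data", package = "bst")
dat0 = read.delim(datafile, header = TRUE, skip = 1)[, -(1:2)]
# transpose so that rows are samples and columns are genes
dat0 = t(dat0)
# drop the five test samples that are also excluded in the bst vignette
dat = data.frame(dat0[!(rownames(dat0) %in%
  c("TEST.9", "TEST.13", "TEST.5", "TEST.3", "TEST.11")), ])
# class labels: the first two characters of the 63 training sample names,
# followed by the labels of the 20 remaining test samples
dat$class = as.factor(
  c(substr(rownames(dat)[1:63], start = 1, stop = 2),
    c("NB", "RM", "NB", "EW", "RM", "BL", "EW", "RM", "EW", "EW", "EW", "RM",
      "BL", "RM", "NB", "NB", "NB", "NB", "BL", "EW")
  )
)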
```

We then construct our training and test `Task`:

```
task = as_task_classif(dat, target = "class", id = "SRBCT")
# rows 1-63 are the training samples
task_train = task$clone(deep = TRUE)
task_train$filter(1:63)
# rows 64-83 are the test samples
task_test = task$clone(deep = TRUE)
task_test$filter(64:83)
```

Our graph will start with log transforming the features, followed by scaling them. Then, either a `PCA` or `ICA` is applied to extract principal / independent components followed by fitting an `LDA`, or a `ranger random forest` is fitted without any dimensionality reduction (the log transformation and scaling should most likely affect the `LDA` more than the `ranger random forest`). Regarding the `PCA` and `ICA`, the number of principal / independent components is a tuning parameter. Regarding the `LDA`, we can further choose different methods for estimating the mean and variance, and regarding the `ranger`, we want to tune the `mtry` and `num.trees` parameters. Note that the `PCA-LDA` combination has already been successfully applied in different cancer diagnostic contexts when the feature space is of high dimensionality (Morais and Lima 2018).

To allow for switching between the `PCA` / `ICA`-`LDA` and `ranger`, we can either use branching or proxy pipelines, i.e., `PipeOpBranch` and `PipeOpUnbranch` or `PipeOpProxy`. We will first cover branching in detail and later show how the same can be done using `PipeOpProxy`.

First, we have a look at the baseline `classification accuracy` of the `LDA` and `ranger` on the training task:

```
base = benchmark(benchmark_grid(
  task_train,
  learners = list(lrn("classif.lda"), lrn("classif.ranger")),
  resamplings = rsmp("cv", folds = 3)))
base$aggregate(measures = msr("classif.acc"))
```

```
nr resample_result task_id learner_id resampling_id iters classif.acc
1: 1 <ResampleResult[20]> SRBCT classif.lda cv 3 0.6666667
2: 2 <ResampleResult[20]> SRBCT classif.ranger cv 3 0.9206349
```

The out-of-the-box `ranger` appears to already have good performance on the training task. Regarding the `LDA`, we do get a warning message that some features are collinear. This strongly suggests reducing the dimensionality of the feature space. Let’s see if we can get better performance, at least for the `LDA`.
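Why the collinearity warning is expected here can be illustrated with a small base R sketch on simulated data (not the SRBCT data): with 3-fold cross-validation, each `LDA` is fitted on 42 observations from 4 classes, and the pooled within-class covariance matrix estimated from them has rank at most 42 - 4 = 38, far fewer than the number of features, so it is singular. The sketch uses 200 simulated features instead of 2308 to stay fast:

```
set.seed(1)
n = 42   # observations per training split (2/3 of 63)
p = 200  # number of features (2308 in the real data)
K = 4    # number of classes (EW, BL, NB, RM)
x = matrix(rnorm(n * p), nrow = n)
y = factor(rep(seq_len(K), length.out = n))

# subtract the class mean from every observation (within-class residuals)
resid = x - apply(x, 2, function(col) ave(col, y))

# pooled within-class covariance matrix, as estimated by LDA
S_w = crossprod(resid) / (n - K)

# numerical rank: bounded by n - K, far below p, so S_w is not invertible
rank_S_w = qr(S_w)$rank
rank_S_w
```

Reducing the features to a few principal or independent components first keeps the number of dimensions below the number of observations, which is exactly what the `PCA` / `ICA` step in the graph does.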

Our graph starts with log transforming the features using `PipeOpColApply` (we explicitly use base 10 only for better interpretability when inspecting the model later), followed by scaling the features using `PipeOpScale`. Then, the first branch allows for switching between the `PCA` / `ICA`-`LDA` and `ranger`, and within `PCA` / `ICA`-`LDA`, the second branch allows for switching between `PCA` and `ICA`:

```
graph1 =
  # log10-transform all features, then center and scale them
  po("colapply", applicator = function(x) log(x, base = 10)) %>>%
  po("scale") %>>%
  # first branch: pca / ica followed by lda vs. ranger
  po("branch", id = "branch_learner", options = c("pca_ica_lda", "ranger")) %>>%
  gunion(list(
    # second branch (only on the "pca_ica_lda" path): pca vs. ica
    po("branch", id = "branch_preproc_lda", options = c("pca", "ica")) %>>%
      gunion(list(
        po("pca"), po("ica")
      )) %>>%
      po("unbranch", id = "unbranch_preproc_lda") %>>%
      lrn("classif.lda"),
    lrn("classif.ranger")
  )) %>>%
  po("unbranch", id = "unbranch_learner")
```

Note that the names of the options within each branch are arbitrary, but ideally they describe what is happening. Therefore, we go with `"pca_ica_lda"` / `"ranger"` and `"pca"` / `"ica"`. Finally, we also could have used `ppl("branch")` to make branching easier (we will come back to this in the Proxy section). The graph looks like the following:

```
graph1$plot()
```

We can inspect the parameters of the `ParamSet` of the graph to see which parameters can be set:

```
graph1$param_set$ids()
```

```
[1] "colapply.applicator" "colapply.affect_columns"
[3] "scale.center" "scale.scale"
[5] "scale.robust" "scale.affect_columns"
[7] "branch_learner.selection" "branch_preproc_lda.selection"
[9] "pca.center" "pca.scale."
[11] "pca.rank." "pca.affect_columns"
[13] "ica.n.comp" "ica.alg.typ"
[15] "ica.fun" "ica.alpha"
[17] "ica.method" "ica.row.norm"
[19] "ica.maxit" "ica.tol"
[21] "ica.verbose" "ica.w.init"
[23] "ica.affect_columns" "classif.lda.dimen"
[25] "classif.lda.method" "classif.lda.nu"
[27] "classif.lda.predict.method" "classif.lda.predict.prior"
[29] "classif.lda.prior" "classif.lda.tol"
[31] "classif.ranger.alpha" "classif.ranger.always.split.variables"
[33] "classif.ranger.class.weights" "classif.ranger.holdout"
[35] "classif.ranger.importance" "classif.ranger.keep.inbag"
[37] "classif.ranger.max.depth" "classif.ranger.min.node.size"
[39] "classif.ranger.min.prop" "classif.ranger.minprop"
[41] "classif.ranger.mtry" "classif.ranger.mtry.ratio"
[43] "classif.ranger.num.random.splits" "classif.ranger.num.threads"
[45] "classif.ranger.num.trees" "classif.ranger.oob.error"
[47] "classif.ranger.regularization.factor" "classif.ranger.regularization.usedepth"
[49] "classif.ranger.replace" "classif.ranger.respect.unordered.factors"
[51] "classif.ranger.sample.fraction" "classif.ranger.save.memory"
[53] "classif.ranger.scale.permutation.importance" "classif.ranger.se.method"
[55] "classif.ranger.seed" "classif.ranger.split.select.weights"
[57] "classif.ranger.splitrule" "classif.ranger.verbose"
[59] "classif.ranger.write.forest"
```

The `id`s are prefixed by the respective `PipeOp` they belong to, e.g., `pca.rank.` refers to the `rank.` parameter of `PipeOpPCA`.

Our graph either fits an `LDA` after applying `PCA` or `ICA`, or alternatively a `ranger` without dimensionality reduction. These two **options** are selected via the `selection` parameters of the branching `PipeOp`s, which we can tune. Moreover, within the respective `PipeOp`s we want to tune the following parameters: `pca.rank.`, `ica.n.comp`, `classif.lda.method`, `classif.ranger.mtry`, and `classif.ranger.num.trees`. The first two parameters are integers that in principle could range from 1 to the number of features. However, for `ICA`, the upper bound must not exceed the number of observations, and as we will later use `3-fold cross-validation` as the resampling method for the tuning, we simply set the upper bound to 30 (and do the same for `PCA`). Regarding the `classif.lda.method`, we will only be interested in `"moment"` estimation vs. minimum volume ellipsoid covariance estimation (`"mve"`). Moreover, we set the lower bound of `classif.ranger.mtry` to 200 (which is around the number of features divided by 10) and the upper bound to 1000, and let `classif.ranger.num.trees` range from 500 to 2000.
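The arithmetic behind these bounds, as a short base R sketch (the comparison with `ranger`'s default `mtry`, the rounded-down square root of the number of features, is added here for context and is not part of the original reasoning):

```
n_train = 63  # observations in task_train
p = 2308      # number of gene expression features
folds = 3

# each model in 3-fold cross-validation is fitted on 2/3 of the observations
n_fit = n_train - n_train / folds
n_fit           # 42: an upper bound of 30 components stays below this

# lower bound for mtry: roughly the number of features divided by 10
p / 10          # 230.8, i.e., around the lower bound of 200
# ranger's default mtry for comparison
floor(sqrt(p))  # 48
```

So the tuned `mtry` range deliberately samples far more features per split than `ranger` would by default.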

```
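tune_ps1 = ps(
  # first branch: pca / ica + lda vs. ranger
  branch_learner.selection =
    p_fct(c("pca_ica_lda", "ranger")),
  # second branch: only active if the pca / ica + lda path is selected
  branch_preproc_lda.selection =
    p_fct(c("pca", "ica"), depends = branch_learner.selection == "pca_ica_lda"),
  # number of components, only active for the respective preprocessing step
  pca.rank. =
    p_int(1, 30, depends = branch_preproc_lda.selection == "pca"),
  ica.n.comp =
    p_int(1, 30, depends = branch_preproc_lda.selection == "ica"),
  # lda estimation method (as specified here, only tuned when ica is selected)
  classif.lda.method =
    p_fct(c("moment", "mve"), depends = branch_preproc_lda.selection == "ica"),
  # ranger hyperparameters, only active on the ranger path
  classif.ranger.mtry =
    p_int(200, 1000, depends = branch_learner.selection == "ranger"),
  classif.ranger.num.trees =
    p_int(500, 2000, depends = branch_learner.selection == "ranger"))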
```

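The parameter `branch_learner.selection` defines whether we go down the left (`PCA` / `ICA` followed by `LDA`) or the right branch (`ranger`). The parameter `branch_preproc_lda.selection` defines whether a `PCA` or `ICA` will be applied prior to the `LDA`. The other parameters directly belong to the `ParamSet` of the `PCA` / `ICA` / `LDA` / `ranger`. Note that it only makes sense to switch between `PCA` / `ICA` if the `"pca_ica_lda"` branch was selected beforehand. We have to specify this via the `depends` parameter. Also note that, as specified above, `classif.lda.method` depends on `branch_preproc_lda.selection == "ica"`, so it is only tuned when `ICA` is selected; with `PCA`, the `LDA` uses its default method (`"moment"`). To tune it for both preprocessing options, its `depends` condition would have to be `branch_learner.selection == "pca_ica_lda"` instead.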

Finally, we also could have tuned the numeric parameters on a log scale. For example, for `pca.rank.` the performance difference between rank 1 and 2 is probably much larger than between rank 29 and rank 30. The mlr3tuning tutorial covers such transformations.
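To see what a log scale would change, here is a base R sketch comparing four evenly spaced values for `pca.rank.` on the original scale with four values evenly spaced on the log scale and then rounded back to integers (this only illustrates the idea; it is not how mlr3tuning or paradox implement transformations internally):

```
lower = 1
upper = 30
resolution = 4

# evenly spaced on the original scale
linear_grid = seq(lower, upper, length.out = resolution)
# evenly spaced on the log scale, then transformed back and rounded
log_grid = round(exp(seq(log(lower), log(upper), length.out = resolution)))

linear_grid  # approx. 1, 10.67, 20.33, 30
log_grid     # 1 3 10 30: more candidate values at the low end
```

On the log scale, two of the four candidates fall below 10 components, where the accuracy is expected to change fastest.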

We can now tune the parameters of our graph as defined in the search space with respect to a measure. We will use the `classification accuracy`. As a resampling method, we use `3-fold cross-validation`. We will use the `TerminatorNone` (i.e., no early termination) for terminating the tuning, because we apply a `grid search`, which stops on its own once all grid points have been evaluated. We use a `grid search` because it gives nicely plottable and understandable results, but if there were many more parameters, `random search` or more intelligent optimization methods would be preferable to a `grid search`:

```
tune1 = TuningInstanceSingleCrit$new(
  task_train,
  learner = graph1,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc"),
  search_space = tune_ps1,
  terminator = trm("none")
)
```

We then perform a `grid search` using a resolution of 4 for the numeric parameters. The grid being used will look like the following (note that the dependencies we specified above are handled automatically):

```
generate_design_grid(tune_ps1, resolution = 4)
```

| branch_learner.selection | branch_preproc_lda.selection | pca.rank. | ica.n.comp | classif.lda.method | classif.ranger.mtry | classif.ranger.num.trees |
|---|---|---|---|---|---|---|
| pca_ica_lda | pca | 1 | NA | NA | NA | NA |
| pca_ica_lda | pca | 10 | NA | NA | NA | NA |
| pca_ica_lda | pca | 20 | NA | NA | NA | NA |
| pca_ica_lda | pca | 30 | NA | NA | NA | NA |
| pca_ica_lda | ica | NA | 1 | moment | NA | NA |
| pca_ica_lda | ica | NA | 1 | mve | NA | NA |
| pca_ica_lda | ica | NA | 10 | moment | NA | NA |
| pca_ica_lda | ica | NA | 10 | mve | NA | NA |
| pca_ica_lda | ica | NA | 20 | moment | NA | NA |
| pca_ica_lda | ica | NA | 20 | mve | NA | NA |
| pca_ica_lda | ica | NA | 30 | moment | NA | NA |
| pca_ica_lda | ica | NA | 30 | mve | NA | NA |
| ranger | NA | NA | NA | NA | 200 | 500 |
| ranger | NA | NA | NA | NA | 200 | 1000 |
| ranger | NA | NA | NA | NA | 200 | 1500 |
| ranger | NA | NA | NA | NA | 200 | 2000 |
| ranger | NA | NA | NA | NA | 466 | 500 |
| ranger | NA | NA | NA | NA | 466 | 1000 |
| ranger | NA | NA | NA | NA | 466 | 1500 |
| ranger | NA | NA | NA | NA | 466 | 2000 |
| ranger | NA | NA | NA | NA | 733 | 500 |
| ranger | NA | NA | NA | NA | 733 | 1000 |
| ranger | NA | NA | NA | NA | 733 | 1500 |
| ranger | NA | NA | NA | NA | 733 | 2000 |
| ranger | NA | NA | NA | NA | 1000 | 500 |
| ranger | NA | NA | NA | NA | 1000 | 1000 |
| ranger | NA | NA | NA | NA | 1000 | 1500 |
| ranger | NA | NA | NA | NA | 1000 | 2000 |
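The size of this grid follows directly from the dependencies: with a resolution of 4, each active numeric parameter contributes 4 values and each factor all of its levels, while inactive parameters contribute nothing. A short base R sketch of the count, and of how many batches are needed when the grid search evaluates 10 configurations at a time (the `batch_size = 10` used for the tuning, consistent with the `batch_nr` values 1 to 3 in the tuning archive):

```
resolution = 4

# pca path: 4 values for pca.rank.
n_pca = resolution
# ica path: 4 values for ica.n.comp times 2 lda methods
n_ica = resolution * 2
# ranger path: 4 values for mtry times 4 values for num.trees
n_ranger = resolution * resolution

n_configs = n_pca + n_ica + n_ranger
n_configs   # 28, the number of rows in the grid

batch_size = 10
n_batches = ceiling(n_configs / batch_size)
n_batches   # 3
```

Without the `depends` conditions, the grid would instead be the full Cartesian product of all seven parameters, most of whose combinations would set parameters of steps that are never executed.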

We trigger the tuning.

```
tuner_gs = tnr("grid_search", resolution = 4, batch_size = 10)
tuner_gs$optimize(tune1)
```

Now, we can inspect the results ordered by the `classification accuracy` (in ascending order):

```
as.data.table(tune1$archive)[order(classif.acc), ]
```

| branch_learner.selection | branch_preproc_lda.selection | pca.rank. | ica.n.comp | classif.lda.method | classif.ranger.mtry | classif.ranger.num.trees | classif.acc | runtime_learners | timestamp | batch_nr |
|---|---|---|---|---|---|---|---|---|---|---|
| pca_ica_lda | pca | 1 | NA | NA | NA | NA | 0.2380952 | 4.443 | 2021-06-13 16:01:32 | 3 |
| pca_ica_lda | ica | NA | 1 | moment | NA | NA | 0.2380952 | 219.277 | 2021-06-13 16:01:32 | 3 |
| pca_ica_lda | ica | NA | 1 | mve | NA | NA | 0.2698413 | 220.355 | 2021-06-13 16:01:32 | 3 |
| pca_ica_lda | pca | 10 | NA | NA | NA | NA | 0.8730159 | 8.301 | 2021-06-13 15:58:44 | 1 |
| pca_ica_lda | ica | NA | 10 | moment | NA | NA | 0.8730159 | 220.568 | 2021-06-13 16:00:07 | 2 |
| pca_ica_lda | ica | NA | 10 | mve | NA | NA | 0.8730159 | 217.173 | 2021-06-13 16:00:07 | 2 |
| pca_ica_lda | ica | NA | 20 | mve | NA | NA | 0.9365079 | 222.402 | 2021-06-13 16:01:32 | 3 |
| pca_ica_lda | ica | NA | 30 | mve | NA | NA | 0.9365079 | 222.528 | 2021-06-13 16:01:32 | 3 |
| pca_ica_lda | ica | NA | 20 | moment | NA | NA | 0.9682540 | 208.040 | 2021-06-13 15:58:44 | 1 |
| pca_ica_lda | pca | 20 | NA | NA | NA | NA | 0.9682540 | 3.685 | 2021-06-13 16:00:07 | 2 |
| pca_ica_lda | pca | 30 | NA | NA | NA | NA | 0.9841270 | 7.091 | 2021-06-13 15:58:44 | 1 |
| pca_ica_lda | ica | NA | 30 | moment | NA | NA | 0.9841270 | 217.277 | 2021-06-13 16:00:07 | 2 |
| ranger | NA | NA | NA | NA | 1000 | 500 | 0.9841270 | 7.548 | 2021-06-13 16:01:32 | 3 |
| ranger | NA | NA | NA | NA | 200 | 500 | 1.0000000 | 7.063 | 2021-06-13 15:58:44 | 1 |
| ranger | NA | NA | NA | NA | 200 | 1500 | 1.0000000 | 10.526 | 2021-06-13 15:58:44 | 1 |
| ranger | NA | NA | NA | NA | 466 | 1000 | 1.0000000 | 11.615 | 2021-06-13 15:58:44 | 1 |
| ranger | NA | NA | NA | NA | 466 | 1500 | 1.0000000 | 10.945 | 2021-06-13 15:58:44 | 1 |
| ranger | NA | NA | NA | NA | 466 | 2000 | 1.0000000 | 12.719 | 2021-06-13 15:58:44 | 1 |
| ranger | NA | NA | NA | NA | 733 | 1000 | 1.0000000 | 12.118 | 2021-06-13 15:58:44 | 1 |
| ranger | NA | NA | NA | NA | 1000 | 2000 | 1.0000000 | 18.799 | 2021-06-13 15:58:44 | 1 |
| ranger | NA | NA | NA | NA | 200 | 2000 | 1.0000000 | 7.391 | 2021-06-13 16:00:07 | 2 |
| ranger | NA | NA | NA | NA | 466 | 500 | 1.0000000 | 5.676 | 2021-06-13 16:00:07 | 2 |
| ranger | NA | NA | NA | NA | 733 | 500 | 1.0000000 | 6.293 | 2021-06-13 16:00:07 | 2 |
| ranger | NA | NA | NA | NA | 733 | 2000 | 1.0000000 | 13.095 | 2021-06-13 16:00:07 | 2 |
| ranger | NA | NA | NA | NA | 1000 | 1500 | 1.0000000 | 13.081 | 2021-06-13 16:00:07 | 2 |
| ranger | NA | NA | NA | NA | 200 | 1000 | 1.0000000 | 6.328 | 2021-06-13 16:01:32 | 3 |
| ranger | NA | NA | NA | NA | 733 | 1500 | 1.0000000 | 11.706 | 2021-06-13 16:01:32 | 3 |
| ranger | NA | NA | NA | NA | 1000 | 1000 | 1.0000000 | 10.622 | 2021-06-13 16:01:32 | 3 |

We achieve very good accuracy using `ranger`, more or less regardless of how `mtry` and `num.trees` are set. However, the `LDA` also shows very good accuracy when combined with `PCA` or `ICA` retaining 30 components.

For now, we decide to use `ranger` with `mtry` set to 200 and `num.trees` set to 1000.

Setting these parameters manually in our graph, then training on the training task and predicting on the test task yields an accuracy of:

```
graph1$param_set$values$branch_learner.selection = "ranger"
graph1$param_set$values$classif.ranger.mtry = 200
graph1$param_set$values$classif.ranger.num.trees = 1000
graph1$train(task_train)
```

```
$unbranch_learner.output
NULL
```

```
graph1$predict(task_test)[[1L]]$score(msr("classif.acc"))
```

```
classif.acc
1
```

Note that we also could have wrapped our graph in a `GraphLearner` and proceeded to use this as a learner in an `AutoTuner`.
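A minimal sketch of that alternative, assuming a recent mlr3 ecosystem: the graph is converted with `as_learner()` and wrapped in an `AutoTuner`, which tunes on its inner resampling and then fits the best configuration on all data it is trained on. To keep it fast and self-contained, the sketch uses a small stand-in graph (scaling, `PCA`, `LDA`) on the built-in iris task rather than `graph1` and the SRBCT data; for this post, one would pass `as_learner(graph1)` and `tune_ps1` instead:

```
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(mlr3tuning)
library(paradox)

# stand-in graph: scale the features, extract principal components, fit an lda
glrn = as_learner(po("scale") %>>% po("pca") %>>% lrn("classif.lda"))

at = AutoTuner$new(
  learner = glrn,
  resampling = rsmp("cv", folds = 3),        # inner resampling used for tuning
  measure = msr("classif.acc"),
  terminator = trm("none"),
  tuner = tnr("grid_search", resolution = 4),
  search_space = ps(pca.rank. = p_int(1, 4))  # iris has only 4 features
)

# nested resampling: tuning happens inside each outer training split
rr = resample(tsk("iris"), at, rsmp("holdout"))
acc = rr$aggregate(msr("classif.acc"))
acc
```

The advantage of this design is that the accuracy reported by `resample()` is not optimistically biased by the tuning, since the outer test data never influence the selected configuration.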

Instead of using branches to split our graph with respect to the learner and preprocessing options, we can also use `PipeOpProxy`. `PipeOpProxy` accepts a single `content` parameter that can contain any other `PipeOp` or `Graph`. This is extremely flexible in the sense that we do not have to specify our options during construction. However, the parameters of the contained `PipeOp` or `Graph` are no longer directly contained in the `ParamSet` of the resulting graph. Therefore, when tuning the graph, we have to make use of a `trafo` function.
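A sketch of how `graph2` and the list `proxy_options` that the `trafo` draws from can be constructed. It is reconstructed from the parameter ids of `graph2` and from the parameter names the `trafo` sets (`branch.selection`, `pca.rank.`, `ica.n.comp`, `classif.lda.method`, `mtry`, `num.trees`), so the original code may differ in details. `ppl("branch")` builds the `PCA` / `ICA` branching graph, taking the option names from the names of the list passed to it:

```
library(mlr3)
library(mlr3learners)
library(mlr3pipelines)

# same preprocessing as in graph1, followed by a single proxy PipeOp whose
# content (a PipeOp, Graph or Learner) is set during tuning
graph2 =
  po("colapply", applicator = function(x) log(x, base = 10)) %>>%
  po("scale") %>>%
  po("proxy")

# the options that can be plugged into proxy.content
proxy_options = list(
  # pca vs. ica (selected via branch.selection), followed by an lda
  pca_ica_lda =
    ppl("branch", graphs = list(pca = po("pca"), ica = po("ica"))) %>>%
    lrn("classif.lda"),
  # ranger without dimensionality reduction
  ranger = lrn("classif.ranger")
)
```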

The resulting graph, `graph2`, looks like the following:

```
graph2$plot()
```

At first, this may look like a linear graph. However, as the `content` parameter of `PipeOpProxy` can be tuned and set to contain any other `PipeOp` or `Graph`, this allows for a similar non-linear graph as when doing branching.

```
graph2$param_set$ids()
```

```
[1] "colapply.applicator" "colapply.affect_columns" "scale.center" "scale.scale"
[5] "scale.robust" "scale.affect_columns" "proxy.content"
```

We can tune the graph by using the same search space as before. However, here the `trafo` function is of central importance to actually set our options and parameters:

```
tune_ps2 = tune_ps1$clone(deep = TRUE)
```

The `trafo` function does all the work, i.e., selecting either the `PCA` / `ICA`-`LDA` or `ranger` as the `proxy.content`, as well as setting the parameters of the respective preprocessing `PipeOp`s and `Learner`s.

Above, we made use of `ppl("branch")`, allowing us to easily construct a branching graph. Of course, we also could have used another nested `PipeOpProxy` to specify the preprocessing options (`"pca"` vs. `"ica"`) within `proxy_options` if for some reason we do not want to do branching at all. The `trafo` function below selects one of the `proxy_options` from above and sets the respective parameters for the `PCA`, `ICA`, `LDA` and `ranger`. Here, the argument `x` is a list that contains the sampled / selected parameters from our `ParamSet` (in our case, `tune_ps2`). The return value is a list including only the appropriate `proxy.content` parameter. In each tuning iteration, the `proxy.content` parameter of our graph will be set to this value.

```
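tune_ps2$trafo = function(x, param_set) {
  # deep-clone the selected option so that the shared object in proxy_options
  # is not modified in place (PipeOps, Graphs and Learners are R6 objects)
  proxy.content = proxy_options[[x$branch_learner.selection]]$clone(deep = TRUE)
  if (x$branch_learner.selection == "pca_ica_lda") {
    # pca_ica_lda: select pca vs. ica and set the number of components
    proxy.content$param_set$values$branch.selection = x$branch_preproc_lda.selection
    if (x$branch_preproc_lda.selection == "pca") {
      proxy.content$param_set$values$pca.rank. = x$pca.rank.
    } else {
      proxy.content$param_set$values$ica.n.comp = x$ica.n.comp
    }
    # NULL if classif.lda.method is inactive (pca), so the lda keeps its default
    proxy.content$param_set$values$classif.lda.method = x$classif.lda.method
  } else {
    # ranger: the learner's own parameter ids carry no prefix
    proxy.content$param_set$values$mtry = x$classif.ranger.mtry
    proxy.content$param_set$values$num.trees = x$classif.ranger.num.trees
  }
  list(proxy.content = proxy.content)
}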
```

For example, suppose that the following parameters are selected from our `ParamSet`:

```
x = list(
  branch_learner.selection = "ranger",
  classif.ranger.mtry = 200,
  classif.ranger.num.trees = 500)
```

The `trafo` function will then return:

```
tune_ps2$trafo(x)
```

```
$proxy.content
<LearnerClassifRanger:classif.ranger>
* Model: -
* Parameters: num.threads=1, mtry=200, num.trees=500
* Packages: ranger
* Predict Type: response
* Feature types: logical, integer, numeric, character, factor, ordered
* Properties: importance, multiclass, oob_error, twoclass, weights
```
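One detail is worth keeping in mind when writing such a `trafo`: `PipeOp`s, `Graph`s and `Learner`s are R6 objects, and R6 objects are environments with reference semantics. If the `trafo` modified the object taken from `proxy_options` in place instead of a copy (`$clone(deep = TRUE)`), every configuration would point to the same, last-modified object, which can matter when a tuner transforms a whole batch of configurations before evaluating them. A base R sketch with a plain environment standing in for an R6 object (the helper names are hypothetical) shows the effect:

```
# a plain environment as a stand-in for an R6 learner with a list of values
shared = new.env()
shared$values = list()

# hypothetical trafo that modifies the shared object in place
trafo_in_place = function(mtry) {
  content = shared
  content$values$mtry = mtry
  content
}

# hypothetical trafo that copies first (R6 objects would use $clone(deep = TRUE))
trafo_copy = function(mtry) {
  content = list2env(as.list(shared))
  content$values$mtry = mtry
  content
}

# transform a batch of two configurations before "evaluating" them
batch_in_place = lapply(c(200, 1000), trafo_in_place)
batch_copy = lapply(c(200, 1000), trafo_copy)

sapply(batch_in_place, function(e) e$values$mtry)  # 1000 1000
sapply(batch_copy, function(e) e$values$mtry)      # 200 1000
```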

Tuning can be carried out analogously to the above:

```
tune2 = TuningInstanceSingleCrit$new(
  task_train,
  learner = graph2,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc"),
  search_space = tune_ps2,
  terminator = trm("none")
)
tuner_gs$optimize(tune2)
```

```
as.data.table(tune2$archive)[order(classif.acc), ]
```

Morais, Camilo LM, and Kássio MG Lima. 2018. “Principal Component Analysis with Linear and Quadratic Discriminant Analysis for Identification of Cancer Samples Based on Mass Spectrometry.” *Journal of the Brazilian Chemical Society* 29 (3): 472–81. https://doi.org/10.21577/0103-5053.20170159.

For attribution, please cite this work as

Schneider (2021, Feb. 3). mlr3gallery: Tuning a Complex Graph. Retrieved from https://mlr3gallery.mlr-org.com/posts/2021-02-03-tuning-a-complex-graph/

BibTeX citation

@misc{schneider2021tuning, author = {Schneider, Lennart}, title = {mlr3gallery: Tuning a Complex Graph}, url = {https://mlr3gallery.mlr-org.com/posts/2021-02-03-tuning-a-complex-graph/}, year = {2021} }