Tuning a Complex Graph

mlr3tuning tuning optimization mlr3pipelines classification bst data set

We show how to tune a complex graph for a single task.

Lennart Schneider
02-03-2021

In this use case we show how to tune a rather complex graph consisting of different preprocessing steps and different learners, where each preprocessing step and each learner has parameters that can be tuned. We build the graph, define a search space that handles the dependencies between pipeline steps and hyperparameters, and run a grid search over it.

Ideally, you have already had a look at how to tune over multiple learners.

First, we load the packages we will need:
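The package-loading chunk itself is not shown here. Judging from the functions used below, a plausible set is the following. This is an assumption inferred from the calls in this post, not the author's original chunk: mlr3learners is assumed to provide classif.lda and classif.ranger, and paradox the ParamSet classes. The bst, MASS and ranger packages need to be installed as well.

```r
# Assumed package list, inferred from the functions used in this post
library(mlr3)           # tasks, learners, resampling, benchmark()
library(mlr3learners)   # classif.lda and classif.ranger
library(mlr3pipelines)  # po(), ppl(), gunion(), %>>%
library(mlr3tuning)     # TuningInstanceSingleCrit, tnr(), trm()
library(paradox)        # ParamSet, ParamInt, ParamFct, CondEqual
```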

Data and Task

We are going to work with some gene expression data included as a supplement in the bst package. The data consists of 2308 gene profiles in 63 training and 20 test samples. The following data preprocessing steps are analogous to those in vignette("khan", package = "bst"):

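# locate and read the supplemental data shipped with the bst package
datafile = system.file("extdata", "supplemental_data", package = "bst")
# skip the first line of the file and drop the first two columns
dat0 = read.delim(datafile, header = TRUE, skip = 1)[, -(1:2)]
# transpose so that rows are samples and columns are gene profiles
dat0 = t(dat0)
# remove five of the test samples, leaving 63 training and 20 test samples
dat = data.frame(dat0[!(rownames(dat0) %in%
  c("TEST.9", "TEST.13", "TEST.5", "TEST.3", "TEST.11")), ])
# class labels: taken from the row names for the training samples,
# listed explicitly for the 20 test samples
dat$class = as.factor(
  c(substr(rownames(dat)[1:63], start = 1, stop = 2),
    c("NB", "RM", "NB", "EW", "RM", "BL", "EW", "RM", "EW", "EW", "EW", "RM",
      "BL", "RM", "NB", "NB", "NB", "NB", "BL", "EW")
  )
)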

We then construct our training and test Task:

task = TaskClassif$new("SRBCT", backend = dat, target = "class")
task_train = task$clone(deep = TRUE)
task_train$filter(1:63)
task_test = task$clone(deep = TRUE)
task_test$filter(64:83)

Workflow

Our graph starts by log transforming the features and then scaling them. After that, there are two alternatives: either a PCA or an ICA is applied to extract principal / independent components, followed by fitting an LDA, or a ranger random forest is fitted without any further preprocessing (the log transformation and scaling should most likely affect the LDA more than the ranger random forest). For the PCA and ICA, the number of principal / independent components is a tuning parameter. For the LDA, we can further choose between different methods for estimating the mean and variance, and for the ranger we want to tune the mtry and num.trees parameters. Note that the PCA-LDA combination has already been successfully applied in different cancer diagnostic contexts when the feature space is of high dimensionality (Morais and Lima 2018).

To allow for switching between the PCA / ICA-LDA and ranger we can either use branching or proxy pipelines, i.e., PipeOpBranch and PipeOpUnbranch or PipeOpProxy. We will first cover branching in detail and later show how the same can be done using PipeOpProxy.

Baseline

First, we have a look at the baseline classification accuracy of the LDA and ranger on the training task:

set.seed(1290)
base = benchmark(benchmark_grid(task_train,
  learners = list(lrn("classif.lda"), lrn("classif.ranger")),
  resamplings = rsmp("cv", folds = 3)))
base$aggregate(measures = msr("classif.acc"))
   nr      resample_result task_id     learner_id resampling_id iters classif.acc
1:  1 <ResampleResult[21]>   SRBCT    classif.lda            cv     3   0.6190476
2:  2 <ResampleResult[21]>   SRBCT classif.ranger            cv     3   0.9682540

The out-of-the-box ranger already appears to perform well on the training task. For the LDA, we get a warning that some features are collinear, which strongly suggests reducing the dimensionality of the feature space. Let’s see if we can get better performance, at least for the LDA.

Branching

Our graph starts with log transforming the features (we explicitly use base 10 only for better interpretability when inspecting the model later), using PipeOpColApply, followed by scaling the features using PipeOpScale. Then, the first branch allows for switching between the PCA / ICA-LDA and ranger, and within PCA / ICA-LDA, the second branch allows for switching between PCA and ICA:

graph1 =
  po("colapply", applicator = function(x) log(x, base = 10)) %>>%
  po("scale") %>>%
  # pca / ica followed by lda vs. ranger
  po("branch", id = "branch_learner", options = c("pca_ica_lda", "ranger")) %>>%
  gunion(list(
    po("branch", id = "branch_preproc_lda", options = c("pca", "ica")) %>>%
      gunion(list(
        po("pca"), po("ica")
      )) %>>%
      po("unbranch", id = "unbranch_preproc_lda") %>>%
      lrn("classif.lda"),
    lrn("classif.ranger")
  )) %>>%
  po("unbranch", id = "unbranch_learner")

Note that the names of the options within each branch are arbitrary, but ideally they describe what is happening. Therefore we go with "pca_ica_lda" / "ranger" and "pca" / "ica". Finally, we also could have used the branch ppl to make branching easier (we will come back to this in the Proxy section). The graph looks like the following:

graph1$plot()

We can inspect the parameters of the ParamSet of the graph to see which parameters can be set:

graph1$param_set$ids()
 [1] "colapply.applicator"                         "colapply.affect_columns"                    
 [3] "scale.center"                                "scale.scale"                                
 [5] "scale.robust"                                "scale.affect_columns"                       
 [7] "branch_learner.selection"                    "branch_preproc_lda.selection"               
 [9] "pca.center"                                  "pca.scale."                                 
[11] "pca.rank."                                   "pca.affect_columns"                         
[13] "ica.n.comp"                                  "ica.alg.typ"                                
[15] "ica.fun"                                     "ica.alpha"                                  
[17] "ica.method"                                  "ica.row.norm"                               
[19] "ica.maxit"                                   "ica.tol"                                    
[21] "ica.verbose"                                 "ica.w.init"                                 
[23] "ica.affect_columns"                          "classif.lda.prior"                          
[25] "classif.lda.tol"                             "classif.lda.method"                         
[27] "classif.lda.nu"                              "classif.lda.predict.method"                 
[29] "classif.lda.predict.prior"                   "classif.lda.dimen"                          
[31] "classif.ranger.num.trees"                    "classif.ranger.mtry"                        
[33] "classif.ranger.importance"                   "classif.ranger.write.forest"                
[35] "classif.ranger.min.node.size"                "classif.ranger.replace"                     
[37] "classif.ranger.sample.fraction"              "classif.ranger.class.weights"               
[39] "classif.ranger.splitrule"                    "classif.ranger.num.random.splits"           
[41] "classif.ranger.split.select.weights"         "classif.ranger.always.split.variables"      
[43] "classif.ranger.respect.unordered.factors"    "classif.ranger.scale.permutation.importance"
[45] "classif.ranger.keep.inbag"                   "classif.ranger.holdout"                     
[47] "classif.ranger.num.threads"                  "classif.ranger.save.memory"                 
[49] "classif.ranger.verbose"                      "classif.ranger.oob.error"                   
[51] "classif.ranger.max.depth"                    "classif.ranger.alpha"                       
[53] "classif.ranger.min.prop"                     "classif.ranger.regularization.factor"       
[55] "classif.ranger.regularization.usedepth"      "classif.ranger.seed"                        
[57] "classif.ranger.minprop"                      "classif.ranger.se.method"                   

The ids are prefixed by the id of the PipeOp they belong to, e.g., pca.rank. refers to the rank. parameter of PipeOpPCA.

Search Space

Our graph either fits an LDA after applying PCA or ICA, or alternatively a ranger with no further preprocessing. Both choices are controlled by selection parameters that we can tune. Moreover, within the respective PipeOps we want to tune the following parameters: pca.rank., ica.n.comp, classif.lda.method, classif.ranger.mtry, and classif.ranger.num.trees. The first two parameters are integers that in principle could range from 1 to the number of features. However, for ICA, the upper bound must not exceed the number of observations, and since we will later use 3-fold cross-validation as the resampling method for the tuning (so each model is fitted on only part of the 63 training samples), we simply set the upper bound to 30 (and do the same for PCA). Regarding classif.lda.method, we will only be interested in "moment" estimation vs. minimum volume ellipsoid covariance estimation ("mve"). Moreover, we set the lower bound of classif.ranger.mtry to 200 (which is around the number of features divided by 10) and the upper bound to 1000, and let classif.ranger.num.trees range from 500 to 2000.
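The arithmetic behind these bounds can be checked in a few lines of base R. This is only a sketch of the reasoning; the variable names are made up for illustration.

```r
n_train    = 63    # training samples in task_train
n_features = 2308  # gene expression features
folds      = 3     # folds of the cross-validation used for tuning

# Each CV iteration holds out one fold and trains on the remaining ones
n_per_training_fold = n_train - n_train / folds
n_per_training_fold      # 42, so an upper bound of 30 components is safe

# Rough guide for the lower bound of mtry: number of features / 10
round(n_features / 10)   # 231; the search space uses 200
```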

tune_ps1 = ParamSet$new(list(
  ParamFct$new("branch_learner.selection", levels = c("pca_ica_lda", "ranger")),
  ParamFct$new("branch_preproc_lda.selection", levels = c("pca", "ica")),
  ParamInt$new("pca.rank.", lower = 1, upper = 30),
  ParamInt$new("ica.n.comp", lower = 1, upper = 30),
  ParamFct$new("classif.lda.method", levels = c("moment", "mve")),
  ParamInt$new("classif.ranger.mtry", lower = 200, upper = 1000),
  ParamInt$new("classif.ranger.num.trees", lower = 500, upper = 2000))
)

The parameter branch_learner.selection defines whether we go down the left (PCA / ICA followed by LDA) or the right branch (ranger). The parameter branch_preproc_lda.selection defines whether a PCA or ICA will be applied prior to the LDA. The other parameters directly belong to the ParamSet of the PCA / ICA / LDA / ranger. Note that it only makes sense to switch between PCA / ICA if the "pca_ica_lda" branch was selected beforehand. We have to specify this via:

tune_ps1$add_dep("branch_preproc_lda.selection",
  on = "branch_learner.selection",
  cond = CondEqual$new("pca_ica_lda"))

Likewise, pca.rank. only has an effect if "pca" was selected in branch_preproc_lda.selection, which we again have to specify explicitly:

tune_ps1$add_dep("pca.rank.",
  on = "branch_preproc_lda.selection",
  cond = CondEqual$new("pca"))

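The same holds analogously for the remaining parameters. Note that, as written, classif.lda.method depends on "ica" being selected, so it is only varied on the ICA path and the LDA keeps its default method after a PCA (this is what the grid further below reflects). To tune it on both paths, the dependency would instead have to be placed on branch_learner.selection being "pca_ica_lda":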

tune_ps1$add_dep("ica.n.comp",
  on = "branch_preproc_lda.selection",
  cond = CondEqual$new("ica"))
tune_ps1$add_dep("classif.lda.method",
  on = "branch_preproc_lda.selection",
  cond = CondEqual$new("ica"))
tune_ps1$add_dep("classif.ranger.mtry",
  on = "branch_learner.selection",
  cond = CondEqual$new("ranger"))
tune_ps1$add_dep("classif.ranger.num.trees",
  on = "branch_learner.selection",
  cond = CondEqual$new("ranger"))

Finally, we could also have tuned the numeric parameters on a log scale. For example, for pca.rank. the performance difference between rank 1 and rank 2 is probably much larger than between rank 29 and rank 30. The mlr3tuning Tutorial covers such transformations.
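To illustrate the idea without committing to a particular paradox API, here is a minimal base R sketch. It assumes we would tune pca.rank. on the log scale between log(1) and log(30) and map each value back to an integer rank; the function name rank_trafo is hypothetical.

```r
# Hypothetical trafo: map a value on the log scale back to an integer rank
rank_trafo = function(log_rank) as.integer(round(exp(log_rank)))

# An equally spaced grid of resolution 4 on the log scale ...
log_grid = seq(log(1), log(30), length.out = 4)

# ... puts more grid points at small ranks after the transformation
rank_trafo(log_grid)  # 1 3 10 30

# For comparison: the ranks evaluated by the grid search in this post
c(1, 10, 20, 30)
```

In a search space, the same function would be applied to the sampled value inside the trafo, so that the tuner works on the log scale while the graph still receives a valid integer rank.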

Tuning

We can now tune the parameters of our graph as defined in the search space with respect to a measure; we use the classification accuracy. As the resampling method we use 3-fold cross-validation. Because we apply a grid search, we use TerminatorNone (i.e., no early termination), so the tuning simply stops once the whole grid has been evaluated. We use a grid search because it gives nicely plottable and understandable results; with many more parameters, random search or more intelligent optimization methods would be preferable to a grid search:

set.seed(2409)
tune1 = TuningInstanceSingleCrit$new(
  task_train,
  learner = graph1,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc"),
  search_space = tune_ps1,
  terminator = trm("none")
)

We then perform a grid search using a resolution of 4 for the numeric parameters. The grid being used will look like the following (note that the dependencies we specified above are handled automatically):

generate_design_grid(tune_ps1, resolution = 4)
branch_learner.selection branch_preproc_lda.selection pca.rank. ica.n.comp classif.lda.method classif.ranger.mtry classif.ranger.num.trees
             pca_ica_lda                          pca         1         NA                 NA                  NA                       NA
             pca_ica_lda                          pca        10         NA                 NA                  NA                       NA
             pca_ica_lda                          pca        20         NA                 NA                  NA                       NA
             pca_ica_lda                          pca        30         NA                 NA                  NA                       NA
             pca_ica_lda                          ica        NA          1             moment                  NA                       NA
             pca_ica_lda                          ica        NA          1                mve                  NA                       NA
             pca_ica_lda                          ica        NA         10             moment                  NA                       NA
             pca_ica_lda                          ica        NA         10                mve                  NA                       NA
             pca_ica_lda                          ica        NA         20             moment                  NA                       NA
             pca_ica_lda                          ica        NA         20                mve                  NA                       NA
             pca_ica_lda                          ica        NA         30             moment                  NA                       NA
             pca_ica_lda                          ica        NA         30                mve                  NA                       NA
                  ranger                           NA        NA         NA                 NA                 200                      500
                  ranger                           NA        NA         NA                 NA                 200                     1000
                  ranger                           NA        NA         NA                 NA                 200                     1500
                  ranger                           NA        NA         NA                 NA                 200                     2000
                  ranger                           NA        NA         NA                 NA                 466                      500
                  ranger                           NA        NA         NA                 NA                 466                     1000
                  ranger                           NA        NA         NA                 NA                 466                     1500
                  ranger                           NA        NA         NA                 NA                 466                     2000
                  ranger                           NA        NA         NA                 NA                 733                      500
                  ranger                           NA        NA         NA                 NA                 733                     1000
                  ranger                           NA        NA         NA                 NA                 733                     1500
                  ranger                           NA        NA         NA                 NA                 733                     2000
                  ranger                           NA        NA         NA                 NA                1000                      500
                  ranger                           NA        NA         NA                 NA                1000                     1000
                  ranger                           NA        NA         NA                 NA                1000                     1500
                  ranger                           NA        NA         NA                 NA                1000                     2000
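The number of rows in this design follows directly from the dependencies: parameters that are inactive on a given path are not expanded. A small base R sketch of the count (variable names are made up for illustration):

```r
resolution = 4

# Grid size with dependencies: only active parameters are expanded
n_pca    = resolution               # pca.rank.
n_ica    = resolution * 2           # ica.n.comp x classif.lda.method
n_ranger = resolution * resolution  # mtry x num.trees
n_pca + n_ica + n_ranger            # 28 rows, as in the design above

# Size of the full Cartesian product if the dependencies were ignored
2 * 2 * resolution * resolution * 2 * resolution * resolution  # 2048
```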

Before starting the tuning we set some logging thresholds (i.e., only print warnings on the console):

lgr::get_logger("mlr3")$set_threshold("warn")
lgr::get_logger("bbotk")$set_threshold("warn")
# grid search with a resolution of 4, evaluated in batches of 10 configurations
tuner_gs = tnr("grid_search", resolution = 4, batch_size = 10)
tuner_gs$optimize(tune1)

Now, we can inspect the results ordered by the classification accuracy:

tune1_results = tune1$archive$data
tune1_results[order(classif.acc), ]
branch_learner.selection branch_preproc_lda.selection pca.rank. ica.n.comp classif.lda.method classif.ranger.mtry classif.ranger.num.trees classif.acc uhash x_domain timestamp batch_nr
pca_ica_lda pca 1 NA NA NA NA 0.2380952 91642aac-41db-4aa6-a937-3749904eae7f pca_ica_lda, pca , 1 2021-06-12 13:55:20 3
pca_ica_lda ica NA 1 moment NA NA 0.2380952 077d1cd9-9e69-4512-9d06-8c23e854d59f pca_ica_lda, ica , 1 , moment 2021-06-12 13:55:20 3
pca_ica_lda ica NA 1 mve NA NA 0.2698413 b561ed8b-b91c-4549-b412-16a9bda12b74 pca_ica_lda, ica , 1 , mve 2021-06-12 13:55:20 3
pca_ica_lda pca 10 NA NA NA NA 0.8730159 86b6fc9d-1a67-496b-bd90-8cdcabc1c1fd pca_ica_lda, pca , 10 2021-06-12 13:52:35 1
pca_ica_lda ica NA 10 moment NA NA 0.8730159 dc5de8ad-aecc-4383-8e77-22b77e147dba pca_ica_lda, ica , 10 , moment 2021-06-12 13:53:55 2
pca_ica_lda ica NA 10 mve NA NA 0.8730159 7a4aa629-d371-4510-9109-344d3625a237 pca_ica_lda, ica , 10 , mve 2021-06-12 13:53:55 2
pca_ica_lda ica NA 20 mve NA NA 0.9365079 c1669de9-d9d8-4305-8c2f-a978ce0d6684 pca_ica_lda, ica , 20 , mve 2021-06-12 13:55:20 3
pca_ica_lda ica NA 30 mve NA NA 0.9365079 79c2e9db-4588-41f5-9e97-87faefd56896 pca_ica_lda, ica , 30 , mve 2021-06-12 13:55:20 3
pca_ica_lda ica NA 20 moment NA NA 0.9682540 3f982a35-63f4-4957-aebb-558699f2d28b pca_ica_lda, ica , 20 , moment 2021-06-12 13:52:35 1
pca_ica_lda pca 20 NA NA NA NA 0.9682540 84123ac6-2251-4ea5-bb6f-5b45fdd3f0bb pca_ica_lda, pca , 20 2021-06-12 13:53:55 2
pca_ica_lda pca 30 NA NA NA NA 0.9841270 a434901f-20ee-466a-8159-ad4202171289 pca_ica_lda, pca , 30 2021-06-12 13:52:35 1
pca_ica_lda ica NA 30 moment NA NA 0.9841270 0bad930f-bb8d-440b-8ae4-cbd12eba53c2 pca_ica_lda, ica , 30 , moment 2021-06-12 13:53:55 2
ranger NA NA NA NA 1000 500 0.9841270 2e903d89-46f5-4da5-943a-62694bf542bc ranger, 1000 , 500 2021-06-12 13:55:20 3
ranger NA NA NA NA 200 500 1.0000000 d6c90463-3720-47c6-bdf3-e13a1db04621 ranger, 200 , 500 2021-06-12 13:52:35 1
ranger NA NA NA NA 200 1500 1.0000000 9d366f57-7f35-433b-9b57-2cd4af6f7640 ranger, 200 , 1500 2021-06-12 13:52:35 1
ranger NA NA NA NA 466 1000 1.0000000 35cf6d9b-b327-4eec-a56c-ebeb3e0a1cb4 ranger, 466 , 1000 2021-06-12 13:52:35 1
ranger NA NA NA NA 466 1500 1.0000000 88081c06-3b44-46bb-8af0-f21b2abf0c51 ranger, 466 , 1500 2021-06-12 13:52:35 1
ranger NA NA NA NA 466 2000 1.0000000 14d8f251-8e3f-4ad8-b58b-5c09c44c99e0 ranger, 466 , 2000 2021-06-12 13:52:35 1
ranger NA NA NA NA 733 1000 1.0000000 f6eb7873-4320-4d55-a816-83448b6f9935 ranger, 733 , 1000 2021-06-12 13:52:35 1
ranger NA NA NA NA 1000 2000 1.0000000 913a1e1c-883d-4c31-9c69-081181885c3b ranger, 1000 , 2000 2021-06-12 13:52:35 1
ranger NA NA NA NA 200 2000 1.0000000 b5b740f1-e483-4f0b-8be5-dda07561a7bf ranger, 200 , 2000 2021-06-12 13:53:55 2
ranger NA NA NA NA 466 500 1.0000000 eefbd0d1-9f4b-4bb8-9570-50d75b6c0f4d ranger, 466 , 500 2021-06-12 13:53:55 2
ranger NA NA NA NA 733 500 1.0000000 ee800410-f5e0-44d5-8f80-31056ebe6270 ranger, 733 , 500 2021-06-12 13:53:55 2
ranger NA NA NA NA 733 2000 1.0000000 794f3c10-d675-4042-b590-59a3c7e47b5b ranger, 733 , 2000 2021-06-12 13:53:55 2
ranger NA NA NA NA 1000 1500 1.0000000 862a0afa-862b-475b-94ad-f7f89fe23e57 ranger, 1000 , 1500 2021-06-12 13:53:55 2
ranger NA NA NA NA 200 1000 1.0000000 f0dbf08b-7ed4-4d41-bf62-e2f010fc2f3e ranger, 200 , 1000 2021-06-12 13:55:20 3
ranger NA NA NA NA 733 1500 1.0000000 f3f68af3-2aa6-42c4-bb57-988e303d8f34 ranger, 733 , 1500 2021-06-12 13:55:20 3
ranger NA NA NA NA 1000 1000 1.0000000 25069eb5-af8b-4165-99c2-164254c5c56e ranger, 1000 , 1000 2021-06-12 13:55:20 3

We achieve very good accuracy using ranger, more or less regardless of how mtry and num.trees are set. However, the LDA also shows very good accuracy when combined with a PCA or ICA retaining 30 components.

For now, we decide to use ranger with mtry set to 200 and num.trees set to 1000.

Setting these parameters manually in our graph, then training on the training task and predicting on the test task yields an accuracy of:

graph1$param_set$values$branch_learner.selection = "ranger"
graph1$param_set$values$classif.ranger.mtry = 200
graph1$param_set$values$classif.ranger.num.trees = 1000
graph1$train(task_train)
$unbranch_learner.output
NULL
graph1$predict(task_test)[[1L]]$score(msr("classif.acc"))
classif.acc 
          1 

Note that we also could have wrapped our graph in a GraphLearner and proceeded to use this as a learner in an AutoTuner.
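As a rough sketch of that alternative, the following self-contained toy example wraps a small graph in a GraphLearner and tunes it with an AutoTuner. The iris task, the rpart learner and the one-parameter search space are stand-ins chosen for speed (assumptions, not part of this use case), and ps() / p_int() assume a paradox version that provides these helpers. With the objects from above, one would pass graph1 and tune_ps1 instead.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3tuning)
library(paradox)

# Toy stand-ins for graph1 and tune_ps1 (assumptions for illustration only)
toy_graph = po("pca") %>>% lrn("classif.rpart")
toy_space = ps(pca.rank. = p_int(lower = 1, upper = 4))

# Wrap the graph so that it behaves like any other learner
toy_learner = GraphLearner$new(toy_graph)

# The AutoTuner runs the tuning on an inner resampling whenever it is trained
at = AutoTuner$new(
  learner = toy_learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc"),
  search_space = toy_space,
  terminator = trm("none"),
  tuner = tnr("grid_search", resolution = 4)
)

at$train(tsk("iris"))
at$tuning_result  # best configuration found on the inner resampling
```

Because the AutoTuner is itself a learner, it can be passed to resample() or benchmark() to obtain a nested resampling estimate of the tuned pipeline.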

Proxy

Instead of using branches to split our graph with respect to the learner and preprocessing options, we can also use PipeOpProxy. PipeOpProxy accepts a single content parameter that can contain any other PipeOp or Graph. This is extremely flexible in the sense that we do not have to specify our options during construction. However, the parameters of the contained PipeOp or Graph are no longer directly contained in the ParamSet of the resulting graph. Therefore, when tuning the graph, we do have to make use of a trafo function.

graph2 =
  po("colapply", applicator = function(x) log(x, base = 10)) %>>%
  po("scale") %>>%
  po("proxy")

This graph now looks like the following:

graph2$plot()

At first, this may look like a linear graph. However, as the content parameter of PipeOpProxy can be tuned and set to contain any other PipeOp or Graph, this will allow for a similar non-linear graph as when doing branching.

graph2$param_set$ids()
[1] "colapply.applicator"     "colapply.affect_columns" "scale.center"            "scale.scale"            
[5] "scale.robust"            "scale.affect_columns"    "proxy.content"          
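To make the role of proxy.content concrete, the following toy sketch sets it by hand and swaps the contained learner without rebuilding the graph. The iris task and the rpart and featureless learners are stand-ins (assumptions, chosen because they ship with mlr3), not part of this use case.

```r
library(mlr3)
library(mlr3pipelines)

toy_task = tsk("iris")
toy_graph = po("scale") %>>% po("proxy")

# Put a decision tree into the proxy, then train and predict
toy_graph$param_set$values$proxy.content = lrn("classif.rpart")
toy_graph$train(toy_task)
pred_rpart = toy_graph$predict(toy_task)[[1L]]

# Swap the content for a different learner; the graph itself stays the same
toy_graph$param_set$values$proxy.content = lrn("classif.featureless")
toy_graph$train(toy_task)
pred_featureless = toy_graph$predict(toy_task)[[1L]]

pred_rpart$score(msr("classif.acc"))
pred_featureless$score(msr("classif.acc"))
```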

We can tune the graph by using the same search space as before. However, here the trafo function is of central importance to actually set our options and parameters:

tune_ps2 = tune_ps1$clone(deep = TRUE)

The trafo function does all the work, i.e., selecting either the PCA / ICA-LDA or ranger as the proxy.content as well as setting the parameters of the respective preprocessing PipeOps and Learners.

proxy_options = list(
  pca_ica_lda =
    ppl("branch", graphs = list(pca = po("pca"), ica = po("ica"))) %>>%
      lrn("classif.lda"),
  ranger = lrn("classif.ranger")
)

Above, we made use of the branch ppl, which allows us to easily construct a branching graph. Of course, we could also have used another nested PipeOpProxy to specify the preprocessing options ("pca" vs. "ica") within proxy_options if for some reason we did not want to do branching at all. The trafo function below selects one of the proxy_options from above and sets the respective parameters for the PCA, ICA, LDA and ranger. Here, the argument x is a list which will contain the sampled / selected parameters from our ParamSet (in our case, tune_ps2). The return value is a list containing only the appropriate proxy.content parameter. In each tuning iteration, the proxy.content parameter of our graph will be set to this value.

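tune_ps2$trafo = function(x, param_set) {
  # R6 objects have reference semantics: work on a deep clone so that each
  # configuration gets its own object and proxy_options is not modified in place
  proxy.content = proxy_options[[x$branch_learner.selection]]$clone(deep = TRUE)
  if (x$branch_learner.selection == "pca_ica_lda") {
    # pca_ica_lda: select the preprocessing branch and set its parameter
    proxy.content$param_set$values$branch.selection = x$branch_preproc_lda.selection
    if (x$branch_preproc_lda.selection == "pca") {
      proxy.content$param_set$values$pca.rank. = x$pca.rank.
    } else {
      proxy.content$param_set$values$ica.n.comp = x$ica.n.comp
    }
    # NULL (parameter inactive) leaves the LDA method at its default
    proxy.content$param_set$values$classif.lda.method = x$classif.lda.method
  } else {
    # ranger: the learner is used directly, so its parameter ids are not prefixed
    proxy.content$param_set$values$mtry = x$classif.ranger.mtry
    proxy.content$param_set$values$num.trees = x$classif.ranger.num.trees
  }
  # the graph only has a single parameter that is set during tuning
  list(proxy.content = proxy.content)
}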

For example, suppose that the following parameters are selected from our ParamSet:

x = list(
  branch_learner.selection = "ranger",
  classif.ranger.mtry = 200,
  classif.ranger.num.trees = 500)

The trafo function will then return:

tune_ps2$trafo(x)
$proxy.content
<LearnerClassifRanger:classif.ranger>
* Model: -
* Parameters: num.threads=1, mtry=200, num.trees=500
* Packages: ranger
* Predict Type: response
* Feature types: logical, integer, numeric, character, factor, ordered
* Properties: importance, multiclass, oob_error, twoclass, weights

Tuning can be carried out analogously as done above:

set.seed(2409)
tune2 = TuningInstanceSingleCrit$new(
  task_train,
  learner = graph2,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.acc"),
  search_space = tune_ps2,
  terminator = trm("none")
)
tuner_gs$optimize(tune2)
tune2_results = tune2$archive$data
tune2_results[order(classif.acc), ]

References

Morais, Camilo LM, and Kássio MG Lima. 2018. “Principal Component Analysis with Linear and Quadratic Discriminant Analysis for Identification of Cancer Samples Based on Mass Spectrometry.” Journal of the Brazilian Chemical Society 29 (3): 472–81. https://doi.org/10.21577/0103-5053.20170159.

Citation

For attribution, please cite this work as

Schneider (2021, Feb. 3). mlr3gallery: Tuning a Complex Graph. Retrieved from https://mlr3gallery.mlr-org.com/posts/2021-02-03-tuning-a-complex-graph/

BibTeX citation

@misc{schneider2021tuning,
  author = {Schneider, Lennart},
  title = {mlr3gallery: Tuning a Complex Graph},
  url = {https://mlr3gallery.mlr-org.com/posts/2021-02-03-tuning-a-complex-graph/},
  year = {2021}
}