Tuning a stacked learner

This tutorial explains how to create and tune a multilevel stacking model using the mlr3pipelines package.

Milan Dragicevic, Giuseppe Casalicchio
04-27-2020

Intro

Multilevel stacking is an ensemble technique in which the predictions of several learners are added as new features to extend the original data, level by level. On each level, the extended data is used to train a new set of learners. This can be repeated over several levels until a final learner is trained. To avoid overfitting, it is advisable to use test set (out-of-bag) predictions on each level.

In this post, a multilevel stacking example will be created using mlr3pipelines and tuned using mlr3tuning. A similar example is available in the mlr3book. However, we additionally explain how to tune the hyperparameters of the whole ensemble and each underlying learner jointly.

In our stacking example, we proceed as follows:

  1. Level 0: Based on the input data, we train three learners (rpart, glmnet and lda), each on a sparser feature space obtained with a different feature filter method from mlr3filters, so that their predictions are slightly decorrelated. The test set predictions of these learners are attached to the original data (used in level 0) and serve as input for the learners in level 1.
  2. Level 1: We transform this extended data using PCA and train three additional learners (rpart, glmnet and lda) on the principal components. The test set predictions of the level 1 learners are attached to the input data used in level 1.
  3. Finally, we train a final ranger learner on the data extended by level 1. Note that the number of features selected by the feature filter methods in level 0 and the number of principal components retained in level 1 will be jointly tuned with some other hyperparameters of the learners on each level.

Prerequisites


library(mlr3)
library(mlr3learners)
library(mlr3pipelines)
library(mlr3filters)
library(mlr3tuning)
library(paradox)
library(glmnet)

For the stacking example, we use the sonar classification task:


sonar_task = tsk("sonar")
sonar_task$col_roles$stratum = sonar_task$target_names #stratification
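
Stratification ensures that the class proportions of the target are preserved in each resampling fold. As a quick check, we can look at the class balance of the task:


table(sonar_task$truth())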

Pipeline creation

Level 0

As mentioned, the level 0 learners are rpart, glmnet and lda:


rprt_lrn = lrn("classif.rpart", predict_type = "prob")
glmnet_lrn = lrn("classif.glmnet", predict_type = "prob")
lda_lrn = lrn("classif.lda", predict_type = "prob")

To create the out-of-bag predictions of each learner, we wrap it in PipeOpLearnerCV():


rprt_cv1 = po("learner_cv", rprt_lrn, id = "rprt_1")
glmnet_cv1 = po("learner_cv", glmnet_lrn, id = "glmnet_1")
lda_cv1 = po("learner_cv", lda_lrn, id = "lda_1")
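
By default, PipeOpLearnerCV produces the out-of-bag predictions with 3-fold cross-validation. If desired, this can be changed via the pipeop's resampling parameters; as a sketch (the variant rprt_cv5 is only for illustration and not used below):


# e.g. a variant using 5 instead of the default 3 folds
rprt_cv5 = po("learner_cv", rprt_lrn, id = "rprt_1",
  param_vals = list(resampling.folds = 5))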

A sparser representation of the input data in level 0 is obtained using the following filters:


anova = po("filter", flt("anova"), id = "filt1")
mrmr = po("filter", flt("mrmr"), id = "filt2")
find_cor = po("filter", flt("find_correlation"), id = "filt3")
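
Each filter assigns a score to every feature; the nfeat parameter tuned below controls how many of the top-scoring features are kept. To get a feeling for the scores, a filter can also be computed on its own:


# compute and inspect the ANOVA filter scores on the sonar task
anova_filter = flt("anova")
anova_filter$calculate(sonar_task)
head(as.data.table(anova_filter))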

To assemble these steps into level 0, we use the gunion() function to run the three filter-learner branches in parallel. The out-of-bag predictions of all level 0 learners are attached to the original data, which is passed along via PipeOpNOP(), using PipeOpFeatureUnion:


level0 = gunion(list(
  anova %>>% rprt_cv1,
  mrmr %>>% glmnet_cv1,
  find_cor %>>% lda_cv1,
  po("nop", id = "nop1")))  %>>%
  po("featureunion", id = "union1")

We can have a look at the graph from level 0:


level0$plot(html = FALSE)
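
To see what level 0 produces, we can train this subgraph on the task; the resulting task contains the original sonar features plus the cross-validated probability predictions of the three learners (columns prefixed with the respective pipeop ids):


# train the level 0 subgraph and inspect the extended feature set
level0_out = level0$train(sonar_task)[[1]]
head(level0_out$feature_names)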

Level 1

Now, we create the level 1 learners:


rprt_cv2 = po("learner_cv", rprt_lrn , id = "rprt_2")
glmnet_cv2 = po("learner_cv", glmnet_lrn, id = "glmnet_2")
lda_cv2 = po("learner_cv", lda_lrn, id = "lda_2")

All level 1 learners will use PCA-transformed data as input. Since the output of level 0 feeds into three PCA steps and one PipeOpNOP(), we duplicate it four times using po("copy", 4):


level1 = level0 %>>%
  po("copy", 4) %>>%
  gunion(list(
    po("pca", id = "pca2_1", param_vals = list(scale. = TRUE)) %>>% rprt_cv2,
    po("pca", id = "pca2_2", param_vals = list(scale. = TRUE)) %>>% glmnet_cv2,
    po("pca", id = "pca2_3", param_vals = list(scale. = TRUE)) %>>% lda_cv2,
    po("nop", id = "nop2"))
  )  %>>%
  po("featureunion", id = "union2")

We can have a look at the graph from level 1:


level1$plot(html = FALSE)

The out-of-bag predictions of the level 1 learners are attached to the input data of level 1, and a final ranger learner is trained on the result:


ranger_lrn = lrn("classif.ranger", predict_type = "prob")

ensemble = level1 %>>% ranger_lrn
ensemble$plot(html = FALSE)
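
The ids of the pipeops in the graph determine the prefixes of the hyperparameter names we tune in the next section. They can be listed via:


# list the pipeop ids of the assembled ensemble graph
ensemble$ids()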

Defining the tuning space

In order to tune the ensemble's hyperparameters jointly, we define the search space using ParamSet from the paradox package. The parameter names are prefixed with the ids of the pipeops they belong to:


ps_ens = ParamSet$new(
  list(
    ParamInt$new("filt1.filter.nfeat", 5, 50),
    ParamInt$new("filt2.filter.nfeat", 5, 50),
    ParamInt$new("filt3.filter.nfeat", 5, 50),
    ParamInt$new("pca2_1.rank.", 3, 50),
    ParamInt$new("pca2_2.rank.", 3, 50),
    ParamInt$new("pca2_3.rank.", 3, 20),
    ParamDbl$new("rprt_1.cp", 0.001, 0.1),
    ParamInt$new("rprt_1.minbucket", 1, 10),
    ParamDbl$new("glmnet_1.alpha", 0, 1),
    ParamDbl$new("rprt_2.cp", 0.001, 0.1),
    ParamInt$new("rprt_2.minbucket", 1, 10),
    ParamDbl$new("glmnet_2.alpha", 0, 1),
    ParamInt$new("classif.ranger.mtry", lower = 1L, upper = 10L),
    ParamDbl$new("classif.ranger.sample.fraction", lower = 0.5, upper = 1),
    ParamInt$new("classif.ranger.num.trees", lower = 50L, upper = 200L)
  ))
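
As an aside, more recent versions of paradox also offer the ps() shorthand; under that syntax, the first few parameters above could be written as follows (a sketch, not needed for the rest of this post):


# equivalent shorthand available in newer paradox versions
ps_ens_alt = ps(
  filt1.filter.nfeat = p_int(5, 50),
  pca2_1.rank. = p_int(3, 50),
  rprt_1.cp = p_dbl(0.001, 0.1)
)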

Performance comparison

Even with a simple ensemble, there are quite a few things to set up. We compare the performance of the ensemble with that of a simple tuned ranger learner.

To proceed, we convert the ensemble pipeline into a GraphLearner:


ens_lrn = GraphLearner$new(ensemble)
ens_lrn$predict_type = "prob"
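
The GraphLearner exposes the hyperparameters of all underlying pipeops, prefixed with the pipeop ids as used in the search space above:


# inspect the available (prefixed) hyperparameter names
head(ens_lrn$param_set$ids())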

We define the search space for the simple ranger learner:


ps_ranger = ParamSet$new(
  list(
    ParamInt$new("mtry", lower = 1L, upper = 10L),
    ParamDbl$new("sample.fraction", lower = 0.5, upper = 1),
    ParamInt$new("num.trees", lower = 50L, upper = 200L)
  ))

For the performance comparison, we use the benchmark() function, which requires a design consisting of a list of learners and a list of tasks. Here, we have two learners (the simple ranger learner and the ensemble) and one task. Since we want to tune the simple ranger learner as well as the whole ensemble learner, we create an AutoTuner for each learner to be compared. To do so, we define a resampling strategy for the tuning in the inner loop (we use cv3); for the final evaluation in the outer loop, we use a holdout split (outer_hold):


cv3 = rsmp("cv", folds = 3)

# AutoTuner for the ensemble learner
auto1 = AutoTuner$new(
    learner = ens_lrn,
    resampling = cv3,
    measure = msr("classif.auc"),
    search_space = ps_ens,
    terminator = trm("evals", n_evals = 3), # to limit running time
    tuner = tnr("random_search")
  )

# AutoTuner for the simple ranger learner
auto2 = AutoTuner$new(
    learner = ranger_lrn,
    resampling = cv3,
    measure = msr("classif.auc"),
    search_space = ps_ranger,
    terminator = trm("evals", n_evals = 3), # to limit running time
    tuner = tnr("random_search")
  )

# Define the list of learners
learns = list(auto1, auto2)

# For benchmarking, we use a simple holdout
set.seed(321)
outer_hold = rsmp("holdout")
outer_hold$instantiate(sonar_task)

design = benchmark_grid(
  tasks = sonar_task,
  learners = learns,
  resamplings = outer_hold
)

bmr = benchmark(design, store_models = TRUE)
bmr$aggregate(msr("classif.auc"))

   nr      resample_result task_id
1:  1 <ResampleResult[18]>   sonar
2:  2 <ResampleResult[18]>   sonar
                                                                                                                             learner_id
1: filt1.filt2.filt3.nop1.rprt_1.glmnet_1.lda_1.union1.copy.pca2_1.pca2_2.pca2_3.nop2.rprt_2.glmnet_2.lda_2.union2.classif.ranger.tuned
2:                                                                                                                 classif.ranger.tuned
   resampling_id iters classif.auc
1:       holdout     1   0.9054054
2:       holdout     1   0.8969595

For a more reliable comparison, the number of evaluations of the random search should be increased.
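
For example, a larger budget can be passed to the AutoTuner via its terminator; since this quickly becomes expensive, the resampling iterations can be parallelized with the future package, which mlr3 supports out of the box:


# a larger tuning budget (adjust n_evals to the available compute)
larger_budget = trm("evals", n_evals = 50)

# optional: run resampling iterations in parallel
future::plan("multisession")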

Conclusion

This example shows the versatility of mlr3pipelines. With more learners, varied representations of the data set, and additional levels, a powerful yet compute-hungry pipeline can be created. Note that care must be taken to avoid name clashes between pipeline objects; this is why each PipeOp above received an explicit id.

Citation

For attribution, please cite this work as

Dragicevic & Casalicchio (2020, April 27). mlr3gallery: Tuning a stacked learner. Retrieved from https://mlr3gallery.mlr-org.com/posts/2020-04-27-tuning-stacking/

BibTeX citation

@misc{dragicevic2020tuning,
  author = {Dragicevic, Milan and Casalicchio, Giuseppe},
  title = {mlr3gallery: Tuning a stacked learner},
  url = {https://mlr3gallery.mlr-org.com/posts/2020-04-27-tuning-stacking/},
  year = {2020}
}