# Comparison of Decision Boundaries of Classification Learners

Visualizes the decision boundaries of multiple classification learners on some artificial data sets.

Michel Lang
08-14-2020

Visualizing decision boundaries helps to understand the pros and cons of individual classification learners. This post demonstrates how to create such plots.

## Artificial Data Sets

The three artificial data sets are created with task generators implemented in mlr3:

    library("mlr3")

    N = 200
    tasks = list(
      tgen("xor")$generate(N),
      tgen("moons")$generate(N),
      tgen("circle")$generate(N)
    )

### XOR

Points are distributed on a 2-dimensional cube with corners $$(\pm 1, \pm 1)$$. The class is "red" if $$x$$ and $$y$$ have the same sign, and "black" otherwise.

    plot(tgen("xor"))

### Circle

Two circles with the same center but different radii. Points in the smaller circle are "black", points only in the larger circle are "red".

    plot(tgen("circle"))

### Moons

Two interleaving half circles (“moons”).

    plot(tgen("moons"))

## Learners

We consider the following learners:

    library("mlr3learners")

    learners = list(
      # k-nearest neighbours classifier
      lrn("classif.kknn", id = "kkn", predict_type = "prob", k = 3),

      # linear svm
      lrn("classif.svm", id = "lin. svm", predict_type = "prob", kernel = "linear"),

      # radial-basis function svm
      lrn("classif.svm", id = "rbf svm", predict_type = "prob", kernel = "radial",
        gamma = 2, cost = 1, type = "C-classification"),

      # naive bayes
      lrn("classif.naive_bayes", id = "naive bayes", predict_type = "prob"),

      # single decision tree
      lrn("classif.rpart", id = "tree", predict_type = "prob", cp = 0, maxdepth = 5),

      # random forest
      lrn("classif.ranger", id = "random forest", predict_type = "prob")
    )

The hyperparameters are chosen so that the decision boundaries look “typical” for the respective classifier. With different hyperparameters, the results may look very different.

## Fitting the Models

To apply each learner to each task, we first build an exhaustive grid design of experiments with benchmark_grid() and then pass it to benchmark() to do the actual work. A simple holdout resampling is used here:

    design = benchmark_grid(
      tasks = tasks,
      learners = learners,
      resamplings = rsmp("holdout")
    )

    set.seed(123)
    bmr = benchmark(design, store_models = TRUE)

A quick look at the performance values:

    perf = bmr$aggregate(msr("classif.acc"))[, c("task_id", "learner_id", "classif.acc")]
    knitr::kable(perf)

| task_id    | learner_id    | classif.acc |
|:-----------|:--------------|------------:|
| xor_200    | kkn           |   0.9104478 |
| xor_200    | lin. svm      |   0.4925373 |
| xor_200    | rbf svm       |   0.9104478 |
| xor_200    | naive bayes   |   0.3880597 |
| xor_200    | tree          |   0.8805970 |
| xor_200    | random forest |   0.9552239 |
| moons_200  | kkn           |   0.9850746 |
| moons_200  | lin. svm      |   0.9104478 |
| moons_200  | rbf svm       |   0.9850746 |
| moons_200  | naive bayes   |   0.9104478 |
| moons_200  | tree          |   0.9104478 |
| moons_200  | random forest |   0.9701493 |
| circle_200 | kkn           |   0.8955224 |
| circle_200 | lin. svm      |   0.5074627 |
| circle_200 | rbf svm       |   0.9104478 |
| circle_200 | naive bayes   |   0.7910448 |
| circle_200 | tree          |   0.8656716 |
| circle_200 | random forest |   0.8507463 |
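
The long table is easier to compare across tasks once it is reshaped to one row per learner. The following base R sketch does this with the accuracies copied from the table above; the names `wide` and `best` are illustrative and not part of the original post. Keep in mind that the values come from a single holdout split of a small data set, so small differences should not be over-interpreted.

```r
# Accuracy values copied from the performance table above
perf = data.frame(
  task_id = rep(c("xor_200", "moons_200", "circle_200"), each = 6),
  learner_id = rep(c("kkn", "lin. svm", "rbf svm", "naive bayes", "tree", "random forest"), times = 3),
  classif.acc = c(
    0.9104478, 0.4925373, 0.9104478, 0.3880597, 0.8805970, 0.9552239,
    0.9850746, 0.9104478, 0.9850746, 0.9104478, 0.9104478, 0.9701493,
    0.8955224, 0.5074627, 0.9104478, 0.7910448, 0.8656716, 0.8507463
  ),
  stringsAsFactors = FALSE
)

# Reshape: one row per learner, one column per task
# (each cell holds exactly one value, so mean() just returns it)
wide = with(perf, tapply(classif.acc, list(learner_id, task_id), mean))
print(round(wide, 3))

# Best learner per task; ties are resolved in favour of the first entry
best = sapply(split(perf, perf$task_id), function(d) d$learner_id[which.max(d$classif.acc)])
print(best)
```

With the `perf` object from the benchmark, the two reshaping lines should work unchanged, since they only rely on the three selected columns.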

## Plotting

To generate the plots, we iterate over the individual ResampleResult objects stored in the BenchmarkResult and, in each iteration, store the plot of the learner's predictions generated by the mlr3viz package.

    library("mlr3viz")

    n = bmr$n_resample_results
    plots = vector("list", n)
    for (i in seq_len(n)) {
      rr = bmr$resample_result(i)
      plots[[i]] = autoplot(rr, type = "prediction")
    }


We now have a list of plots. Each one can be printed individually, for example the first:

    print(plots[[1]])

Note that only observations from the test data are plotted as points.
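
Conceptually, a decision boundary plot can be obtained by evaluating a fitted model on a dense grid over the feature space and drawing the predicted probabilities. The following base R sketch illustrates this idea on XOR-like data with a logistic regression as a stand-in linear classifier. It is an assumption-laden illustration of the principle, not the implementation used by mlr3viz, and the data are simulated here rather than taken from tgen("xor").

```r
set.seed(1)
n = 200

# XOR-like data: class is "red" if x and y share a sign, "black" otherwise
d = data.frame(x = runif(n, -1, 1), y = runif(n, -1, 1))
d$class = factor(ifelse(sign(d$x) == sign(d$y), "red", "black"))

# Stand-in model: logistic regression, i.e. a linear classifier
fit = glm(class ~ x + y, data = d, family = binomial())

# Evaluate the model on a dense grid covering the feature space
xs = seq(-1, 1, length.out = 50)
ys = seq(-1, 1, length.out = 50)
grid = expand.grid(x = xs, y = ys)
grid$prob = predict(fit, newdata = grid, type = "response")  # P(class = "red")

# expand.grid varies x fastest, so the matrix has x in rows and y in columns
z = matrix(grid$prob, nrow = length(xs))

# Probability surface, the 0.5 contour as decision boundary, and the data points
image(xs, ys, z, col = gray.colors(20, start = 0.3, end = 0.95), xlab = "x", ylab = "y")
contour(xs, ys, z, levels = 0.5, add = TRUE)
points(d$x, d$y, col = as.character(d$class), pch = 19)
```

Because the model is linear, the boundary can only be a straight line, which matches the poor accuracy of the linear SVM on the XOR task reported above.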

To get a nice annotated overview, we arranged all plots together in a single PDF file. The number in the upper right is the respective accuracy on the test set.
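
The code that produced the combined file is not shown here. One possible way to arrange a list of ggplot objects on a single PDF page is a grid layout from the grid package, sketched below. The stand-in plots, the layout dimensions, and the file name `boundaries.pdf` are assumptions for illustration; in the post, `plots` would be the list created in the loop above, and the accuracy annotations are not reproduced.

```r
library("ggplot2")
library("grid")

# Stand-in plots so the sketch is self-contained
plots = lapply(1:6, function(i) {
  ggplot(data.frame(x = c(-1, 1), y = c(-1, 1)), aes(x, y)) +
    geom_point() +
    ggtitle(paste("plot", i))
})

n_col = 3
n_row = ceiling(length(plots) / n_col)
out = file.path(tempdir(), "boundaries.pdf")  # hypothetical output file

# Open a PDF device sized to the layout and draw each plot into its own cell
pdf(out, width = 4 * n_col, height = 4 * n_row)
grid.newpage()
pushViewport(viewport(layout = grid.layout(n_row, n_col)))
for (i in seq_along(plots)) {
  row = (i - 1) %/% n_col + 1
  col = (i - 1) %% n_col + 1
  print(plots[[i]], vp = viewport(layout.pos.row = row, layout.pos.col = col))
}
dev.off()
```

For the 18 plots of the benchmark, a layout with one row per task and one column per learner (3 x 6) would be a natural choice.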

As you can see, the decision boundaries look very different. Some are linear, others run parallel to the axes, and yet others are highly non-linear. Some boundaries are very smooth, with a slow transition of probabilities, while others are very abrupt. All of these properties matter for model selection and should be considered for the problem at hand.

### Citation

Lang (2020, Aug. 14). mlr3gallery: Comparison of Decision Boundaries of Classification Learners. Retrieved from https://mlr3gallery.mlr-org.com/posts/2020-08-14-comparison-of-decision-boundaries/
@misc{lang2020comparison,
}