Testing Super Learner’s Coverage – A Note To Myself

Wait 5 sec.

[This article was first published on r on Everyday Is A School Day, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.Testing Super Learner with TMLE showed some interesting patterns XGBoost + random forest only hit ~54% coverage, but tuned xgboost + GLM reached ~90%. Seems like pairing flexible learners with stable (even misspecified) models helps? Need to explore this more with different setups Motivations: We have previously looked at TMLE without Super Learner and observed the poor coverage of TMLE with unspecified xgboost when compared to a correctly specified logistic regression. Now since we’ve learnt NNLS the default method behind Super Learner, why dont we test it out and take a look at the coverage! Let’s take a look at our previous coverage plot.We saw that with xgboost and TMLE, our coverage is about 71.1-87.6% and the lower and upper tails are asymmetrical. What if we use SuperLearner, can we improve these coverage?Objectives Let’s CodeWhat’s The result?Opportunities for improvementLessons LearntLet’s Code We will be using the same data generating mechanism as before, but this time we will be using Super Learner to estimate the propensity score and outcome regression.Let’s write a code where we can use multicore and also a cheap 2nd hand lenovo with quad-core and Ubuntu to run and store results. Part of my learning goal for 2026 is to experiment more with parallel computing and see if we can explore mathematical platonic space more using simulation! Stay tuned on that, more to come on the simple parallel computing cluster set up!codelibrary(dplyr)library(SuperLearner)library(future)library(future.apply)# Set up parallel processingplan(multicore, workers = 4)set.seed(1)n