Choose Your Fighter: data-driven selection of the best marathon

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers.]

Running a marathon is a big deal. It takes a lot of time to train to run a good time, and it takes a while to recover. So, if you’re chasing a marathon PB (personal best), you need to choose wisely which marathon to target. How can we use data to help our decision? Let’s use R to find out!

For the impatient: skip ahead to “The marathon data” or, if you want to see how to code this up, to “The code”.

Let’s leave aside the fact that, for the most popular marathons, it might not be your choice whether you can register. What factors do we need to consider to pick the best one?

- Flat course
- Favourable weather
- Travel considerations

The flattest course is ideal. Any elevation gain will slow us down. We could look at finish times to see how “fast” the course is in practice; however, finish times depend heavily on who is running and how many participants there are. It’s also somewhat circular, since the bigger marathons attract faster runners. To keep things simple, I didn’t use timings and went solely with elevation data.

Most people would agree that running in cool, ideally dry, conditions is best. This is why marathons tend to be organised in the spring and autumn. So, we need to have an idea of the likely conditions on the day.

The ideal marathon would also be easy to get to. Since I am based in the UK, I made a list of popular marathons in the UK and then added the World Marathon Majors for comparison, as well as a few others from Europe that people I know have run. For each one, I grabbed a GPX file of the route from Garmin (more on this below), and made a note of the dates of the last three editions (for the weather data).
Using these things, and making use of a few R libraries, I could generate graphics to compare the marathon routes.

The marathon data

[Images: one graphic per marathon.]

The course profile for each race is shown on the same scale to give a feel for how challenging it is. Here is the key data organised into a table, listed by date.

| Marathon | Date (d/m/yy) | Elevation gain (m) | Typical max temp (°C) |
|---|---|---|---|
| Tokyo | 1/3/26 | 150 | 14.8 |
| Great Welsh | 8/3/26 | 118 | 11.4 |
| Cambridge Boundary | 15/3/26 | 158 | 10.4 |
| Boston Lincs. | 12/4/26 | 46 | 13.4 |
| Brighton | 12/4/26 | 160 | 13.1 |
| Paris | 12/4/26 | 194 | 14.6 |
| Manchester | 19/4/26 | 121 | 14.2 |
| Newport | 19/4/26 | 77 | 13.6 |
| Boston | 20/4/26 | 234 | 17.9 |
| Blackpool | 26/4/26 | 172 | 13.1 |
| London | 26/4/26 | 162 | 14.6 |
| Stratford-upon-Avon | 26/4/26 | 195 | 14.2 |
| Milton Keynes | 4/5/26 | 205 | 14.8 |
| Leeds Rob Burrow | 10/5/26 | 400 | 19.4 |
| Worcester | 17/5/26 | 296 | 19.6 |
| Edinburgh | 24/5/26 | 113 | 14.4 |
| Sydney | 30/8/26 | 369 | 21.8 |
| Berlin | 27/9/26 | 101 | 19.7 |
| Chester | 4/10/26 | 213 | 17 |
| Chicago | 11/10/26 | 105 | 16.2 |
| Abingdon | 18/10/26 | 97 | 15.6 |
| Yorkshire | 18/10/26 | 148 | 14.1 |
| Amsterdam | 18/10/26 | 174 | 13.8 |
| Frankfurt | 25/10/26 | 142 | 13 |
| New York | 1/11/26 | 179 | 15.2 |
| Valencia | 6/12/26 | 144 | 17.2 |

[Figure: a graphical look at the same data.]

Breakdown

Let’s face it, most marathons market themselves as flat and fast. Which ones can really make that claim?

The three flattest on our list are Boston (Lincs.), Newport and Abingdon. The following marathons all have less than 150 m of gain and are therefore pretty flat: Great Welsh, Manchester, Edinburgh, Berlin, Chicago, Yorkshire, Frankfurt, Valencia. Between 150 and 200 m, which is still fairly flat, we have Tokyo, Cambridge Boundary, Brighton, Paris, Blackpool, London, Stratford-upon-Avon, Amsterdam and New York. Beyond this, we are into rolling territory. Marathons with more than 200 m of elevation gain are Boston, Milton Keynes, Leeds, Worcester, Sydney and Chester.

Of the flattest marathons on our list, the coolest temperatures are likely to be at Great Welsh, Frankfurt, Boston (Lincolnshire) and Newport, whereas Berlin, Valencia and Chicago are probably the warmest.
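The banding used in this breakdown is easy to reproduce. Here is a minimal base R sketch using a handful of rows from the table; the function name `band_gain` and the column names are made up for illustration and are not taken from the analysis script.

```r
# Elevation gain (m) for a few of the marathons in the table
marathons <- data.frame(
  name   = c("Boston Lincs.", "Berlin", "Tokyo", "London", "Boston", "Leeds Rob Burrow"),
  gain_m = c(46, 101, 150, 162, 234, 400)
)

# Bands follow the text: under 150 m, 150 to 200 m, over 200 m.
# band_gain() is a hypothetical helper, not part of the original code.
band_gain <- function(gain_m) {
  ifelse(gain_m < 150, "pretty flat",
         ifelse(gain_m <= 200, "fairly flat", "rolling"))
}

marathons$band <- band_gain(marathons$gain_m)
print(marathons)
```

Nested `ifelse()` is used rather than `cut()` so that the boundaries match the wording in the text exactly (150 m counts as “fairly flat”).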
So, this gives us an idea of where the best performances can be unlocked.

Data accuracy

Getting the total elevation gain is difficult. I used a single data source (Garmin Connect) for the GPS data to reduce variation, but even with this single source, the calculated total gain varied a lot.

The elevation data for a GPS location obviously needs to be correct. This is not necessarily true if the data is taken from a watch, where the barometer could be inaccurate or where tall buildings interfere with the location (which is a problem for city marathons).

Even if the data is correct, the calculation can still be inaccurate due to sampling frequency. If we add up all the elevation gains for a track sampled every 10 metres, versus one sampled every 50 metres, we will get a different answer because the latter is smoother than the former. To deal with this, I resampled the elevation data on a uniform distance scale to get the most accurate elevation gain I could from the data I had. This caveat applies to whatever marathon data you find online. So our comparison here allows us to say that one marathon has more or less elevation gain than another, but it doesn’t allow us to compare elevation gain with data on another site.

The weather “forecast” is based on the weather at the last three editions, with the exception of Valencia, where the 2025 edition has not yet happened. I used the average of the max temperature on those editions. A more accurate picture would come from taking several days either side of the event, because the weather on one or more of the editions could have been rather atypical.

Finally, I manually collated the data, so errors are possible. Apologies for any mistakes!

The code

If you came here for the R coding rather than the running, here is the bit where I show how the analysis works!
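As a warm-up, the resampling idea from the data accuracy section can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the code used for the analysis (which relies on {gpxtoolbox}): the function name `elevation_gain`, the linear interpolation via `approx()` and the default 10 m step are all my own choices for the sketch.

```r
# Resample an elevation profile onto a uniform distance grid, then sum the climbs.
# dist_m: cumulative distance along the track (m), strictly increasing
# ele_m:  elevation at each track point (m)
# step_m: spacing of the uniform grid (m); 10 m is an arbitrary default
elevation_gain <- function(dist_m, ele_m, step_m = 10) {
  # uniform grid; any remainder shorter than step_m at the end is dropped
  grid <- seq(min(dist_m), max(dist_m), by = step_m)
  # linear interpolation of elevation at the grid points
  ele_grid <- approx(x = dist_m, y = ele_m, xout = grid)$y
  steps <- diff(ele_grid)
  # total gain counts only the uphill steps
  sum(steps[steps > 0])
}

# Toy track: 200 m long, elevation wiggling between 0 m and 1 m every 10 m
toy_dist <- seq(0, 200, by = 10)
toy_ele  <- rep(c(0, 1), length.out = length(toy_dist))

gain_10 <- elevation_gain(toy_dist, toy_ele, step_m = 10)
gain_50 <- elevation_gain(toy_dist, toy_ele, step_m = 50)
print(c(step_10m = gain_10, step_50m = gain_50))
```

On this toy track the coarser grid skips most of the wiggles and reports a smaller gain, which is the sampling effect described above. It is also why gains are only comparable when every route is resampled the same way.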
Besides general R stuff – importing data, calculations, making plots – we need to do a few other things:

- read the GPX data and calculate the elevation data – we’ll use {gpxtoolbox} to help with this
- retrieve weather data – we’ll use {openmeteo} for this
- convert WMO codes into icons, load the icons and display them

We have two functions saved in a script that gets sourced by the main script. Its main purpose is to convert the WMO codes into icons. I found a gist that had the WMO codes and the corresponding URLs of the day or night versions of the icons. The first function converts this data (in JSON format) into a data frame that we can use in the main script. The second function converts the wind direction into a text arrow for display.

library(jsonlite)
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
library(tibble)
# Example: read the JSON into `lst`
# lst