Marathon Man: how to pace a marathon


[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers].

How does the average marathoner pace their race? In this post, we’ll use R to look at a large dataset of marathon times to try to answer this question.

The ideal strategy would be to “even split” the race, running at the same pace from kilometre 0 to the finish. Let’s set aside “negative splitting”, where you speed up through the race, usually by running at a constant pace for the first half or three-quarters and then increasing the pace. Negative splits are for the pros, not mere mortals! The difficulty with even-splitting is that it is very hard to know what pace you can maintain. The marathon gets hard for everyone after 30 km, so some slowing down is almost inevitable, and if you have started too fast you will certainly fade. This situation is known as “positive splitting”.

Why is it so hard to know what pace you can maintain? You can predict a pace from existing races, e.g. a half marathon, and there are various ways to do this, but it is difficult to tell whether you can hold that pace for the full marathon. It’s such a brutal event that training up to run one takes time, and recovering from one takes just as long, so experimentation is limited. Running a full marathon at pace in training is not advised. So determining an ideal pace involves quite a bit of guesswork.

Let’s take a look at a big dataset of marathon times, from the 2025 New York City Marathon, to see what it tells us about pacing. A dataset of chip times is available (so we don’t have to worry about dodgy GPS data), and the course has similar first- and second-half profiles, so we can use the half times to study negative, even and positive splitting.
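As a rough illustration of those three categories, here is a minimal R sketch that labels a runner’s split from their two half times. The function name classify_split() and the ±1 minute band treated as “even” are assumptions for illustration only; the post does not say what threshold its colour coding used.

```r
# Label each runner's split from first- and second-half times (minutes).
# ASSUMPTION: the "even" band of +/- 1 minute is hypothetical; the post
# does not state the threshold behind its colour coding.
classify_split <- function(first_half, second_half, even_band = 1) {
  diff <- second_half - first_half   # > 0 means the runner slowed down
  ifelse(diff < -even_band, "negative",
         ifelse(diff > even_band, "positive", "even"))
}

# Three made-up runners: sped up, held pace, faded badly
classify_split(first_half  = c(88, 90, 95),
               second_half = c(86, 90.5, 110))
# returns "negative" "even" "positive"
```

Because ifelse() is vectorised, the same call works on whole columns of half times at once.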
Let’s dive in. You can skip to the code to play along, or just see the analysis here.

First, histograms of the difference between the second-half and first-half times show that most runners positive-split the marathon. Very few runners run a negative split (blue bars, left of the dashed line). More runners even-split (yellow), but the majority run positive splits (red).

For marathoners finishing in under 3 h, the modal split is only +2 minutes. Over 21.1 km this is a loss of only about 6 s per km. For marathoners finishing in over three hours, this loss gets more severe: those finishing outside 5 h ship 20 minutes or more in the second half.

At first glance this looks like better pace management by the faster runners, but these positive splits could be proportional to the paces being run. In other words, a slower runner should ship more time in the second half simply because they’re running more slowly.

We can look at this data a different way and directly compare the first- and second-half times for each runner. Again, this highlights just how few runners negative- or even-split the marathon. Most are positive splitting and sit in the upper-left half of the plot. We can also see that the data veer away from the ideal even split (dashed line) at slower paces, and this veering looks linear (a straight line).

We can fit a line to this data and constrain it to pass through (1 h, 1 h), i.e. a 2 h marathoner even-splitting the race. In R, with the half times in minutes, we can use lm(formula = I(y - 60) ~ I(x - 60) + 0, data = fitting), which gives the coefficient for I(x - 60) as 1.24. This is essentially the fade coefficient for the average runner in the 2025 edition of this race.
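To see what the constrained fit does, here is a minimal sketch on simulated data. It assumes, as the lm() call suggests, a data frame called fitting with first-half (x) and second-half (y) times in minutes; the toy numbers are generated with a known slope of 1.24 and are not the NYC results. Subtracting 60 from both variables and dropping the intercept with + 0 forces the fitted line through (60, 60).

```r
# Toy demonstration of a regression line forced through (60, 60),
# i.e. a 60 min first half followed by a 60 min second half.
# ASSUMPTION: `fitting` has columns x (first half) and y (second half)
# in minutes. The data are SIMULATED, not the NYC 2025 chip times.
set.seed(42)
n <- 1000
x <- runif(n, min = 75, max = 180)              # first-half times (min)
y <- 60 + 1.24 * (x - 60) + rnorm(n, sd = 5)    # second halves, true slope 1.24
fitting <- data.frame(x = x, y = y)

# Shifting both variables by 60 and removing the intercept (+ 0) means
# the fitted line must pass through x = 60, y = 60.
fit <- lm(formula = I(y - 60) ~ I(x - 60) + 0, data = fitting)
fade <- unname(coef(fit)[1])
print(fade)                      # should land close to the simulated 1.24

# Predicted second half for a 90 min first half under the fitted model
print(60 + fade * (90 - 60))
```

On real data you would swap the simulated fitting data frame for the chip times and read the slope off coef(fit) in the same way.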
In practice, for a runner achieving a 90-minute first half, their second half would most likely be 60 + 1.239 * (90 - 60) = 97.17 minutes, giving a finish time of 3:07:10.

For anyone looking to run a 3 h New York Marathon, the average runner would therefore need to run 60 / 2.239 + 60 = 86.8 minutes for the first half to anticipate the fade (this comes from solving first half + 60 + 1.239 * (first half - 60) = 180). So that is 1:26:48 for the first half, and then 1:33:12 for the second half.

A simpler calculation is to take the mean of the ratio of the two half times for everyone in the dataset. This gives a fade coefficient of 1.13. The two coefficients differ because the ratio lacks the constraint used in the fit: a ratio above 1 predicts that a positive split is inevitable even for the fastest runners, which is probably not true. Anyhow, this puts the first half at 88 minutes for those looking to run 3 h. These fade coefficients are good predictors for a range of times, and I suspect they would be similar at other marathons with a similar profile. You can use them to calculate your ideal pace for a target finish time.

Finally, for the most accurate answer about sub-3 h pacing, we can look directly at runners finishing between 2:50:00 and 3:00:00 and see what they actually ran. The median first-half time was 86.3 min (IQR 84.4 to 87.87 min) and the median second-half time was 89.62 min (IQR 88.07 to 91.12 min). This gives a median finish time of 2:56:00. So running a 1:26:18 first half would give someone their best chance of finishing in under 3 h, allowing for the inevitable fade.

The takeaway message is: to finish within a goal time, do not assume even splits. If you want to run 3 hours 30 min and bank on 105 minutes per half (4:59/km), you will most likely miss the target. Build in a buffer of time to allow for the inevitable fade.
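The arithmetic above can be wrapped into a few helper functions. This is a sketch, assuming the constrained model second half = 60 + fade * (first half - 60) with the 1.239 coefficient quoted above; the function names are hypothetical and not from the original code. Solving first half + second half = target for the first half gives (target - 120) / (1 + fade) + 60, which is the 60 / 2.239 + 60 calculation for a 3 h target, and dividing a half time by 21.0975 km turns it into a pace.

```r
# Pacing helpers for the constrained fade model (times in minutes):
#   second_half = 60 + fade * (first_half - 60)
# fade = 1.239 is the NYC 2025 coefficient quoted in the post; the
# helper names below are hypothetical, not from the original code.
fade    <- 1.239
half_km <- 42.195 / 2            # 21.0975 km per half

predict_second_half <- function(first_half, fade) {
  60 + fade * (first_half - 60)
}

# Solve first_half + predict_second_half(first_half) = target
first_half_for_target <- function(target, fade) {
  (target - 120) / (1 + fade) + 60
}

# Format minutes as m:ss (for paces) or h:mm:ss (for race times)
fmt_min <- function(minutes, hours = FALSE) {
  secs <- round(minutes * 60)
  if (hours) {
    sprintf("%d:%02d:%02d", secs %/% 3600, (secs %% 3600) %/% 60, secs %% 60)
  } else {
    sprintf("%d:%02d", secs %/% 60, secs %% 60)
  }
}

# 90 min first half -> predicted finish
fmt_min(90 + predict_second_half(90, fade), hours = TRUE)   # "3:07:10"

# First half needed for a 3 h finish
fmt_min(first_half_for_target(180, fade), hours = TRUE)     # "1:26:48"

# Even-split pace vs fade-adjusted first-half pace for several targets
targets <- c(180, 210, 240, 270, 300, 360)   # finish times in minutes
pace_table <- data.frame(
  finish      = fmt_min(targets, hours = TRUE),
  even_split  = fmt_min(targets / 42.195),
  target_pace = fmt_min(first_half_for_target(targets, fade) / half_km),
  stringsAsFactors = FALSE
)
print(pace_table, row.names = FALSE)
```

With this coefficient, the 3:30 target works out to a first-half pace of 4:45/km rather than the even-split 4:59/km, and the other rows match the target paces in the post’s table.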
A pace of 4:45/km is a better target for a 3:30 finish (see below).

Good luck!

Finish time   Even-split pace (min/km)   Target pace (min/km)
3:00:00       4:16                       4:07
3:30:00       4:59                       4:45
4:00:00       5:41                       5:23
4:30:00       6:24                       6:01
5:00:00       7:07                       6:39
6:00:00       8:32                       7:55

The code

This analysis was possible thanks to the uploader for making the chip time data available. Also, a shoutout to Nicola Rennie for sharing how to style social media handles in {ggplot2} graphics. This part of my code requires my {qBrand} library and should be skipped if you are running the code yourself (remove the caption = cap argument in the ggplot calls).

library(ggplot2)   # plotting
library(ggtext)    # rich-text (markdown/HTML) elements in ggplot2 text
# load the Roboto font from Google Fonts and render plot text with showtext
sysfonts::font_add_google("Roboto", "roboto")
showtext::showtext_auto()
## data wrangling ----
# load csv file from url
url