Saturday, January 1, 2011

Designing a Boston Marathon Age Distribution That Better Reflects Marathoner Demographics

I finally have the Boston Marathon 2010 individual result data, as well as some time to take a closer look, and I have some surprising findings. My goal was to take the overall finisher count (22,540) and determine what a "perfect" distribution of runners might look like, based on marathoner demographics. The data suggest that the younger age divisions are actually under-represented and that bloating occurs in the older divisions--most pronounced in the 45-49 age group.

Of course, this is my approach, based on some assumptions, on the data available to me, and on the fact that I am not in favor of dramatically increasing the number of slots available to Boston runners. I am in favor of a qualifying standard that is sufficiently "elite" (term used very loosely; most of us running Boston are certainly not elite runners) that the event doesn't sell out in hours (a point of contention for many, I know). So, my "perfect" Boston finisher field will assume the same number of finishers as we had in 2010. I do recognize that quite a few more runners qualified but didn't start or didn't finish for various reasons.

The first thing I noticed is that the Boston Qualifying times don't match the actual age divisions in Boston for one group of runners. The qualifying times are split 0-34 years old and 35-39 years old, yet the actual age division is 0 to 39. I believe the age divisions should match the qualifying times, although I understand the reasoning is probably "open" vs. "masters." However, if that is the logic applied, it should be consistent. For the purposes of this exercise, I am assuming that there are separate age divisions for 0-34 and 35-39, due to the qualifying times (even though in practice, that is not true; I've created it).

In the 2010 Boston Marathon, there were 22,540 finishers, and it's these results on which I will be basing my analysis. I don't have visibility into who actually entered, but for the conclusions I will draw, actual qualifying times for entrants don't matter (I'm not making actual recommendations on the times; just the target distribution by age division). Also, I couldn't factor in how many bibs are given away or earned through charity partnerships; however, I'll assume that these runners follow the overall demographics of other runners.

My purpose here today is to take that 22,540 finisher count and see how each age division fared against a hypothetical "fair" distribution. Since it's my blog, I'm defining "fair" in what I believe is an objective way: a Boston Marathon field (using the finisher count) that largely mirrors the running population demographics of today and has the same number of finishers as the 2010 Boston Marathon. While some may take exception to this goal, my purpose is to point out what that Boston field "should" look like vs. what it actually looked like in 2010, by age and gender division.

While I've seen varying statistics on male vs. female marathon runners (up to an even 50/50 split of male / female), I'm taking a hard line and using the 2009 Marathon and State of the Sport Reports (based on 2008 data) to find the goal male / female split for Boston. This puts our target Boston field at 41 percent female and 59 percent male (the widest recent margin I could find). The math is pretty easy, then. For our 22,540 2010 Boston Marathon finishers, there should have been:
  • 9,241 females (9,474 actually finished)
  • 13,299 males (13,066 actually finished)
We're not too far off here: 42 percent of actual finishers were ladies, and 58 percent were men. But let's be fair and a little more specific. About one percent of all finishers, which is 233 ladies (roughly 2.5% of the ladies who finished), would have to go for my perfect Boston field to match these demographics.
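For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The finisher counts and the 41/59 split are the figures quoted above; the function and variable names are just mine for illustration.

```python
# Figures quoted in this post.
TOTAL_FINISHERS = 22_540
TARGET_SHARE = {"female": 0.41, "male": 0.59}
ACTUAL = {"female": 9_474, "male": 13_066}


def target_counts(total, shares):
    """Scale each demographic share by the field size, rounded to whole runners."""
    return {group: round(total * share) for group, share in shares.items()}


targets = target_counts(TOTAL_FINISHERS, TARGET_SHARE)

# Positive surplus = more actual finishers than the target field allows.
surplus = {group: ACTUAL[group] - targets[group] for group in ACTUAL}

for group in ACTUAL:
    print(f"{group}: target {targets[group]:,}, actual {ACTUAL[group]:,}, "
          f"surplus {surplus[group]:+,}")
```

The surplus comes out to +233 for the ladies and -233 for the men, which matches the gap described above.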

Now, one thing that has been pointed out by many, lay and official alike, is that the one age division where women outnumber the men is 0 to 39, and this is certainly one area that should get some attention. I begrudgingly accepted these facts until I looked at the data. Here was my next big surprise. My target Boston field isn't the result of the same number of participants per age division; that isn't fair (example: there aren't the same number of 65-69 year olds running as 35-39 year olds). Instead, let's look at the distribution of marathon participants, by age division, today:

I did have to split a few of the age divisions as follows. Not perfect, but I don't think it's too far off, especially considering the size of the 0-34 age group.

* Split 35-44 in half for each age group, 50/50
** Split 45-54 in half for each age group, 50/50
*** Split 55-64 in half for each age group, 50/50
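The 50/50 split can be sketched roughly like this in Python. Please note that the shares in the example are made-up placeholders chosen only so the numbers add up to 100 percent; they are NOT the figures from the report, and the function name is my own invention.

```python
def split_bracket(shares, combined, halves):
    """Return a copy of `shares` with one combined bracket split evenly in two."""
    result = {}
    for bracket, share in shares.items():
        if bracket == combined:
            first, second = halves
            result[first] = share / 2   # assumption: runners are spread evenly
            result[second] = share / 2  # across the two five-year divisions
        else:
            result[bracket] = share
    return result


# PLACEHOLDER shares for illustration only, not real demographic data.
placeholder = {"0-34": 0.40, "35-44": 0.30, "45-54": 0.20, "55-64": 0.08, "65+": 0.02}

split = placeholder
for combined, halves in [
    ("35-44", ("35-39", "40-44")),
    ("45-54", ("45-49", "50-54")),
    ("55-64", ("55-59", "60-64")),
]:
    split = split_bracket(split, combined, halves)

for bracket, share in split.items():
    print(f"{bracket}: {share:.1%}")
```

Splitting evenly is the simplest choice. It probably overstates the older half of each bracket a little if participation drops off with age, which is one reason I call the split "not perfect."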

Note that of actual marathon runners, nearly half of all ladies are 0-34 years old and a third of all men are in the same age group. I actually found that pretty interesting. Without examining the more detailed data, it had seemed the older age groups were getting short-changed at Boston, but the data actually says quite the opposite. Looking at these target distributions, here's what "my" target Boston field should look like, with the "Boston Distro Goal F / M" field showing the "target count" for each age division to create my goal finishing field:

Now, here are the actual Boston Marathon 2010 results, broken out in COUNT by AGE DIV showing MINIMUM, MEAN, MEDIAN and MAXIMUM finishing time, followed up by the Boston Qualifying time for that age group:

** Omitted because qualifying times are split in the 0-34, 35-39 divisions

From this data it isn't hard to calculate my target Boston field and the difference between actual and goal for each division:

* Data is combined into a 65+ division
** Omitted because qualifying times are split in the 0-34, 35-39 divisions
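The differential itself is just a subtraction. Here is a small sketch, taking the difference as goal minus actual, so that a positive number means a division has room for more runners and a negative number means it is bloated. Since the per-division counts aren't reproduced in the text, the example uses the overall female / male totals quoted earlier; the same calculation would apply to each age division. The function names are illustrative only.

```python
def differential(goal, actual):
    """Goal minus actual: positive = under-represented, negative = bloated."""
    return goal - actual


def label(diff):
    """Describe a differential in the terms used in this post."""
    if diff > 0:
        return "under-represented"
    if diff < 0:
        return "bloated"
    return "on target"


# Overall totals quoted earlier in the post (goal vs. actual finishers).
goal = {"F": 9_241, "M": 13_299}
actual = {"F": 9_474, "M": 13_066}

diffs = {div: differential(goal[div], actual[div]) for div in goal}
for div, diff in diffs.items():
    print(f"{div}: {diff:+d} ({label(diff)})")
```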

Shown on a graph, you can see bloating (too many runners in a division) and under-representation (too few runners in a division) pretty clearly. It was startling to me to conclude that both men AND women are significantly under-represented in the 0-34 age division, and that the older age groups, especially the men, do seem to be bloated. This data suggests that the qualifying time of 3:10:59 for a 0-34 year old male is probably too harsh, and that standards need to be toughened up a bit in other areas.

For the ladies, 40-44 is slightly bloated and 45-49 more so. After that we're talking about very few ladies at all. So, it looks like, in fact, my qualifying time IS TOO EASY (look, guys, you were right all along!). The male 45-49 age group was the most bloated of all, slightly edging out the bloating of the ladies of the same age.

Graph of Female Differentials: Above the zero line shows how many more ladies should be in my "target" Boston field. Below the zero line shows the bloating, where too many ladies were in the division:

Graph of Male differentials, as above. Bloating is shown below the zero line. Again, counts above the zero line are additional runners who should be in the division:

Today, I'll stop short of saying what I think the qualifying times should be; I'm sure this post will be controversial enough without it. Yes, I believe we should keep the available bibs approximately the same. I no longer believe the standards should be adjusted across the board, but I do believe that the standards for certain age groups should be loosened, while others should be tightened. I also believe this should be examined every few years, to keep up with the contemporary demographics of marathon runners: maybe we age, maybe more females start running. Whatever it is, Boston will remain to me one of the most exciting challenges and goals I've ever met, and I hope it continues to be the much sought-after goal of determined marathon runners everywhere.


  1. You put a lot of work into that, Alex. It will be very interesting to hear what you think the qualifying times should be--and also of course how they compare with what the BAA decides (hopefully later this month--I want to know what I'm really up against). Too bad I didn't have it together enough to try this five years ago--oh well!

  2. Terzah, Thanks for the visit! Yes, my family was starting to slide food under the door because once I got the initial spreadsheet of individual records, I couldn't leave my computer :) Nearly pulled an all-nighter, but it was worth it. I can't wait to see what the BAA does. I really wanted to do an analysis of the qualifying times, but I felt that the one piece of data that would be most helpful would be the qualifying times of those who were actually entering, and doing a cumulative distribution analysis. My second idea is to take one significant marathon (e.g., Chicago or New York) and do the analysis from there. Depending on what the BAA comes out with, I might or might not go to the trouble. I am eagerly awaiting! Happy New Year! --Alex