Ah boxing, the gift that keeps on giving

Aside from the fantastic feats of athleticism and inspiring displays of will, not to overlook the lunatic level of courage required to enter a ring, we have the sideshow to appreciate. The sideshow frequently makes an in-ring cameo (“Fan Man”; Bowe/Golota riot; Mayweather/Judah brawl) but for the most part takes place on either end of the official bout. Any longtime fan of the sport can speak to the unease felt when a tight fight comes to a close and the decision is up in the air. Will their fighter come out victorious? Did their favorite do enough to convince the judges?

Close fight or not, what turns the stomachs of fans more than a tight fight are the scores that are routinely turned in and bear no resemblance to reality. In instances such as these, depending on your experience, cynicism, and general mood, you cannot help but feel that you have witnessed a fraud. You always know this, but the feeling of outrage is calibrated by the aforementioned influences. Not a year goes by without a high profile bout showcasing this issue. Contested decisions in close fights, where more than a handful of 50/50 rounds could be pointed to in support of a reasonably wide range of opinions, are one thing. However, far too often the numbers add up to a fantasy that no other spectator was privy to except for one, or more, of the three people whose opinions actually matter: the judges sitting ringside, vested with the responsibility of determining the official result.

Boxing has a long, interesting, and infuriating history of faulty, crooked, and incompetent decisions. I can only assume, both with a shake of the head and a chuckle, that this will continue for the foreseeable future. It is a laugh thrown out to keep from getting upset. It is also a laugh that is authentic, bursting out in the face of absurdity. There are fights, and more specifically rounds, that are difficult to score. That goes for any experienced viewer, and creating a consensus grows more difficult when introducing additional critics who assuredly have different criteria for determining the effectiveness of fighters and the styles and tactics implemented. Disagreements are inevitable when we leave decisions open to subjective criteria and observations. I am not proposing the removal of this subjective aspect, though I fully endorse available measures that might provide a clearer reading of reality. Instant replay should at least be available in between rounds to determine whether or not a fighter’s knee or glove touched the canvas, something that would normally be considered a knockdown, or if fighters did in fact foul each other or have their legs/feet tangled, unfairly leading to a knockdown. These are clear examples where we can more closely approach objective standards of what actually happened. Naturally, the final decisions would still be in the eyes of the beholder, but even there we can introduce multiple pairs of eyes and a majority vote, much like judges’ scorecards themselves, to determine the most likely thing to have transpired.

Leaving aside the exceptional instances that are tough to catch in the moment, fight scoring is a more regular activity that could use deeper interrogation. At the very least, the judges’ performances and history can be more clearly evaluated.

The fight game is brutal, corrupt, and predatory. As in so many multi-million dollar operations, the people at the bottom get taken advantage of. This appears to be the way of the world but is no less relevant to consider, especially in a profession that requires participants to take physical abuse and brain damage for the sake of others’ entertainment. Defending boxing becomes a bit more difficult each year. This is not quite the cigarette industry of the 1950s, since boxing’s deleterious effects have always been clear, both in the ring and years later, but modern instrumentation is measuring ever more finely the physical toll, mental degradation, and decline in life expectancy as a result of being exposed to repeated head trauma. This not complete ignorance of the consequences has in small part blunted the backlash to date. Certainly being less popular, not offering a centralized league to target, and perhaps being more difficult to identify, gather and group former fighters presents logistical hurdles that the NFL inconveniently has to deal with. This lack of centralization has at least made the charade of denying the risks less necessary.

High cost, low pay

A bad judgment adds one more, unnecessary, impediment to a professional fighter’s chance of making a decent living. The financial stakes grow exponentially the closer boxers get to network and premium television fights. At those levels winning almost guarantees future bouts, perhaps additional screen time, and eventually career high checks in the seven figures, not the tens of thousands of dollars. The opportunity cost of a setback can be devastating.

Beyond the fighter, their family suffers. Most people get into fighting for the same reasons we each get into our line of work. It is the best way of making a buck given our skills and opportunities. Thankfully, most of us do not have to rely on our ability to slip, take, and deliver a punch in order to cover the rent. For a few this approach offers a way to make supplemental income and possibly a living. There may one day be greater appreciation for the kind of money given to all but the most famous boxers. Perhaps boxers will band together or a promotional company will see it as a competitive advantage to do some sort of revenue sharing amongst their stable of fighters. Your wishful guess is as good as mine. But if there is going to be something to divvy up then the fight game will need as many eyeballs on screens and butts in the seats as possible to ensure a stream of revenue to split. Bad judgments have the corrupting force of turning away potential fans and even, over time, making lifelong fans fall out of love. This may not be as physically tragic a penalty as the one the fighters pay, but the health of the sport depends on the people on both sides of the ropes.

No more snooze button

The direct motivation for a closer look at judges may be attributed to the first GGG/Canelo fight, but it is more appropriate to mark this out as the straw that broke the complacent camel’s back. Just a few months earlier one judge’s card in the Pacquiao/Vargas fight, also among the three considered here, had nearly woken me from my slumber but at the very least planted the seed for the current approach. After a lifetime’s worth of incompetent, corrupt, questionable, and, best case, controversial decisions, it took several bad cards following one another in close succession for me to be riled up.

The methodology presented here relied on grabbing the fight scoring history of each of the three judges and comparing their cards with those of their respective colleagues on a fight-by-fight basis over their careers. This was done in order to look at the degree of divergence, consistency, and how often they were the lone voice in the wild.

Objective subjectivity

Judges are only human and judging is a subjective act, but we have to start somewhere and using the scorecards of professionals is as good a place as any. If nothing else it will help raise issues, concerns, and objections to help spur more innovative ways of looking at the problem moving forward. Some of these avenues, in embryo, have been touched on before: punch stats and the wisdom of crowds (or is it the tyranny of the mob?). Even here we are dealing with elements of subjectivity. The input of thousands of Twitter users is easy enough to point at, but can we all agree on what a thrown punch looks like versus a feint and where the latter crosses over into being counted as the former? Even landed punches get tricky when you have skilled defenders like Floyd Mayweather and James Toney slipping and rolling punches (something worth mentioning since it is a skill Canelo has clearly been working on and can be seen in fine display during some of the exchanges, especially on instant replay¹).

Punches thrown and connected also do not tell you outright about their impact. In the absence of an opponent’s reaction, whether a knee buckle or fall to the canvas (or both), we have to make our best estimate as to the damage dished out. This comes into play even with experienced judges, journalists, and viewers when assessing body shots. A fighter basically has to carry the reputation of being a good body puncher into the fight in order to get credit for the work from the opening bell onward and not rely on the spectators catching on midway through the fight.

So we start with the judges’ cards under the optimistic assumption that overall these people know what they are doing. As in any profession some will be better than others and we look to approximate this difference in ability by investigating their respective bodies of work. One fight may provide too much noise to base an opinion on, the CJ Ross Mayweather/Canelo scorecard notwithstanding. Each of these judges had sat in on hundreds of professional fights as an official. Hopefully they put in hundreds to thousands of hours extra over the years through training. Some of these fights were four, six, and eight rounders. For this exercise all fights that had gone a minimum of 10 rounds and whose scores were available were looked at. It is important to underscore this point so that we are on the same page. Due to Dave Moretti’s extended experience we had in the collection 10, 12, and 15 round fights. In each instance where a fight went at least 10 rounds, regardless of final outcome (e.g. KO in the 11th or RTD in the 13th), the scores were used as long as they were available. This was helpful in looking at as many cards and data points as possible, regardless of whether they were relied on officially, granting us a peek into the judges’ performances and preferences over time.

The middle way

Given enough scores, fights, and cards we may expect some occasional bad judgments but for all the noise the GGG/Canelo cards made we assume there are hundreds, thousands, of more cards handed in every year where the scores are basically in line with one another, and hopefully in line with the deserving winner. Basically, we are willing to accept that for most of the judges, on most fights, they get it mostly right (most of the time). It is doubtful to have full unanimity on scores across the three judges on every fight but we can expect them to be pretty close to one another based on professional experience.

Granting some wiggle room for professional differences and fight style preferences, the middle card, or median score, was used as a target measure. The assumption is that the median score dampens some of the volatility we see among wide scorecards and splits the difference of preferences. Mind you that when two judges agree they each possess the median score.
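To make the middle card concrete, here is a minimal sketch in Python. It assumes each card is a simple pair of totals (fighter A, fighter B) and takes the median of each fighter's total separately across the three judges, as described in note 2. The function name and the sample cards are hypothetical and are not taken from the BoxRec data.

```python
from statistics import median

def median_card(cards):
    """Return the middle card for one fight.

    cards: list of (score_a, score_b) tuples, one per judge.
    The median is taken separately for fighter A and fighter B.
    """
    a_scores = [a for a, _ in cards]
    b_scores = [b for _, b in cards]
    return median(a_scores), median(b_scores)

# Hypothetical cards for a 12 round fight (not real data).
cards = [(116, 112), (115, 113), (113, 115)]
print(median_card(cards))
```

With three judges the median is simply the middle value, so when two judges turn in the same totals their card is the median card.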

The discrepancy of a judge’s scorecard against the respective median score of a fight was used as a proxy for divergence and consistency. Professional judges may favor different aspects of a fighter’s approach and effectiveness and as a result show a slight difference of a round or two from one fight to the next, but where class shows, the preferences of judges should be tuned down in favor of the greater performance. With enough cards we could look to see how each of the judges fared against the median score card over their career.
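One way to turn that discrepancy into a number of rounds is sketched below. The conversion is my assumption for illustration: under 10-9 scoring, flipping one round moves the margin between the fighters by two points, so the gap between a judge's margin and the median margin is halved. A one point gap then shows up as half a round. The function name and sample cards are hypothetical.

```python
from statistics import median

def deviation_in_rounds(card, all_cards):
    """Distance, in rounds, between one judge's card and the median card.

    card: (score_a, score_b) for the judge being measured.
    all_cards: the (score_a, score_b) cards of all three judges.
    Assumption: one flipped 10-9 round changes the margin by two points.
    """
    med_a = median([a for a, _ in all_cards])
    med_b = median([b for _, b in all_cards])
    judge_margin = card[0] - card[1]
    median_margin = med_a - med_b
    return abs(judge_margin - median_margin) / 2

# Hypothetical cards (not real data).
cards = [(116, 112), (115, 113), (113, 115)]
print([deviation_in_rounds(c, cards) for c in cards])
```

Taking the absolute value treats a card that is too wide for either fighter the same way; a signed version would be needed to look at which direction a judge leans.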

I was pleasantly surprised by the small variability across the judges’ careers from the median score. For two of the three judges 60% of their cards were the median score. When you throw in half round differences² you get to just under 70% and just one round difference accounts for 90+% of their cards. The third judge came across as less predictable.
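The percentages above are cumulative shares: the fraction of a judge's cards that sit within a given distance of the median card. A minimal sketch, assuming the deviations have already been expressed in rounds; the function name is hypothetical and the sample list is made up to echo the pattern, not drawn from the actual cards.

```python
def share_within(deviations, limit):
    """Fraction of cards whose deviation from the median card is <= limit rounds."""
    return sum(d <= limit for d in deviations) / len(deviations)

# Made-up deviations in rounds for ten cards (not real data).
deviations = [0, 0, 0, 0, 0, 0, 0.5, 1, 1, 2]
for limit in (0, 0.5, 1):
    print(limit, share_within(deviations, limit))
```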

Though open to debate, the findings fall in line with a few reputations and confirm some expectations, making the measure both attractive to fall for as a proxy and a reasonable candidate for consideration moving forward.

Never miss an opportunity to miss an opportunity

With the fights and median scores identified it was simple to measure the respective judges’ deviations from the middle card. This allowed for capturing deviations, IQR, and identifying outlier boundaries. A standard histogram chart helps to visualize what the percentages say.

[Figure: histogram of each judge’s scorecard deviations from the median card]
Data Source: BoxRec
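For anyone wanting to reproduce the IQR and outlier boundaries, a minimal sketch using Python's standard library follows. Quartile conventions differ between tools, so the "inclusive" method used here is an assumption and may not match whatever produced the charts. The function name and sample deviations are hypothetical.

```python
from statistics import quantiles

def tukey_fences(deviations):
    """Median, IQR, and the usual outlier boundaries for a list of deviations."""
    q1, q2, q3 = quantiles(deviations, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "median": q2,
        "iqr": iqr,
        "lower": q1 - 1.5 * iqr,  # below this counts as an outlier
        "upper": q3 + 1.5 * iqr,  # above this counts as an outlier
    }

# Made-up deviations in rounds (not real data).
print(tukey_fences([0, 0, 0, 0, 0, 1, 1, 1, 2]))
```

Because deviations from the median card cannot be negative when measured as absolute distances, only the upper boundary matters in practice.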

In two of three instances a (large) majority of cards fell between one round difference or less, the aforementioned 90+%. Adalaide Byrd had a more diverse and spread out distribution. Just 40% of the time did she have the middle card, as opposed to the previously mentioned 60% for Moretti and Don Trella, and the spread remained more even from there out. Byrd was one round “off” or more on nearly two thirds of the cards she handed in. She also had a greater percentage of cards that were off by two, three, and four rounds (GGG/Canelo being the most recent and perhaps notable example).

[Figure: Moretti’s scorecard deviations from the median card over time]
Data Source: BoxRec

To highlight these performances yet another way scorecard deviations were plotted over time, from the earliest 10+ round scores to the most recent, ending in 2017. In each chart we show the career median, IQR, and outlier boundaries. The active line is a five fight average meant to remove some of the noise while retaining the general variability for each judge. Lastly, outlier cards were plotted as dots.

[Figure: Trella’s scorecard deviations from the median card over time]
Data Source: BoxRec
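The five fight average used for the trend line can be sketched as a simple rolling mean. This version averages each fight with the four before it; whether the charts used a trailing or a centered window is not something I can state for certain, so treat the choice as an assumption. The function name and sample values are hypothetical.

```python
def rolling_mean(values, window=5):
    """Trailing average over the last `window` fights, in chronological order."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Made-up deviations in rounds, oldest fight first (not real data).
print(rolling_mean([0, 1, 0, 0, 4, 1, 0]))
```

The first four fights have no full window behind them, so the smoothed line starts at the fifth fight.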

A few things become noticeable and some bear repeating. Byrd is the only judge whose median does not sit on the x-axis, at zero. Her IQR is double that of the other judges, 2 versus 1. The doubling persists when it comes to what would traditionally be considered an outlier (greater than Q3 + 1.5 × IQR): 5 rounds’ difference from the median card versus 2.5. As a result of this wider distribution of scores, which in our five fight average never touch the x-axis (also a unique feature of Byrd’s scoring pattern, mind you), she is the only judge with no scores that would be considered outliers, despite, or because of, her having more cards than the other judges that were two, three, and four rounds off.

[Figure: Byrd’s scorecard deviations from the median card over time]
Data Source: BoxRec

These latter non-outlier, but still “off” scores, were plotted for context.

The final visual representation of divergent scoring can be seen in the context of split decisions that did not result in a draw. In the absence of looking at each fight individually and checking news articles, blogs, and discussion boards to determine a consensus, we cannot say if a decision was “wrong”, but when two judges lean one way and the third goes in the opposite direction we can suggest that they are on the “wrong side” of a decision. Sometimes even with the widest scorecard to boot!

[Figure: bar chart of wide and “wrong side” split decision cards by judge]
Data Source: BoxRec
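Spotting the judge on the “wrong side” of a split decision can be sketched as follows. It assumes each card is a pair of totals (fighter A, fighter B) and flags a judge only when the other two picked the same fighter and that judge picked the opponent. The function names and sample cards are hypothetical.

```python
def winner(card):
    """Which fighter a single card favors: 'A', 'B', or 'draw'."""
    a, b = card
    if a > b:
        return "A"
    if b > a:
        return "B"
    return "draw"

def lone_dissenter(cards):
    """Index of the judge who picked the opposite fighter to the other two, else None."""
    picks = [winner(c) for c in cards]
    for i, pick in enumerate(picks):
        others = picks[:i] + picks[i + 1:]
        agree = others[0] == others[1] and others[0] != "draw"
        if agree and pick not in (others[0], "draw"):
            return i
    return None

# Made-up split decision: judges 0 and 1 pick A, judge 2 picks B (not real data).
print(lone_dissenter([(116, 112), (115, 113), (113, 115)]))
```

A judge who scores the fight a draw while the other two agree is not flagged here, since that produces a majority decision and not a split.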

What we see is more of the same, in the sense that Byrd has a large percentage of wide scorecards and “wrong side” of split decision cards. She is predictably unpredictable, you have to give her that.

Ironically, it didn’t matter

Going into the fight there was concern about Moretti, and Byrd made the headlines, but Trella was the one who flubbed by giving round seven to Canelo. Had this round been flipped on his card we still would have had a terrible scorecard from Byrd, but at least we would have had a definitive outcome, one that most appear to believe would have been correct. The judging would not have stolen so much of the enthusiasm and attention from a hard fought fight, and the sport would have had a chance to showcase its quality and not the sideshow.

The great irony is that the terrible card was ultimately inconsequential to the result. However, it did allow for an investigation that might prove worthwhile in rating the consistency and class of a judge. These being just three examples, it is too soon to say.

Still here (for now)

As I age I lose my innocent attraction to sport, the docile acceptance of its importance or ability to say anything meaningful. I still see the signs, the storylines, the metaphors claiming greater appreciation but they no longer have the same draw. Whether through negligence or active disdain at a distraction seen as more appropriate for a child I have shed my sporting interests. Perhaps if I were a gambling man I would have cared longer, having an excuse and some reimbursement for my viewing vigilance. Instead, I let my enthusiasms fall away like used tickets. This has happened across the board but for two sports.

I remain interested in men’s tennis because of witnessing possibly the greatest generation and player to have ever done it. Boxing has also endured, despite itself, because when other sports use metaphor, it is boxing imagery and language they use, for boxing in itself is more primal, serious, and natural than all other sports and is able to carry its own appeal with just a little bit of help from its participants. Alongside the individual tragedies of boxing, we have the added casualty of it no longer being one of the top sports, able to highlight the magnificent stories of overcoming. However, when its protagonists capture our imagination they are reimbursed more handsomely than any other athletes, betraying our primal attraction to it and its narratives.


Notes

1 Though that is not what I had in mind when I suggested technology be used to reevaluate the things that did happen. I would think it a step too far at this point to readjust scorecards in between rounds based on clarifications of whether or not punches were slipped.
2 There will be occasional disagreement on scoring when there have been numerous knockdowns, either of one fighter or of both (see Pacquiao/Marquez I); there are also discrepancies as to how to score rounds without knockdowns where a fighter completely outclasses his opponent but fails to put him down; as a result the median “card” was determined as the median score for fighter A and fighter B across the three judges.