Rec’ing on…Olympic Gymnastics Scoring

Among the various controversies that surround any Olympic gymnastics competition, none is more consistent or more frustrating than the mess that is artistic gymnastics scoring. Can it be fixed? Ultimately, yes, but in the interim there are improvements that FIG (Fédération Internationale de Gymnastique), the governing body for gymnastics, can certainly make.

The Current Situation

Each apparatus is judged by eight judges, in addition to one or two timers as the apparatus warrants. Two of the judges compute the “A” score, or difficulty of the routine as performed (among other factors). The other six judges provide the “B” score, or the value of deductions subtracted from a theoretical perfect 10. The purpose of this split was to make the difference between the quality of a performance and its difficulty more transparent.

Now here’s the rub: the judges selected for an apparatus are chosen from a pool that excludes the countries of the participants on that apparatus in a given rotation. This was meant to prevent the all-too-common situation where national allegiances would excessively skew scores. Unfortunately, it also means that during finals the countries with the best gymnastics federations (i.e. those with the top gymnasts and judges) tend to have no judging representation. While this helps to ward off political bias, it also increases the likelihood of judging errors.

Additionally, the Olympics have adopted a procedure to avoid having to award duplicate medals. This tiebreaking procedure is not well understood, and it can come across as arbitrary, with no basis in reality.

Ordering Chaos

The ideal solution is to have computers analyze a motion capture of a routine and with cold efficiency evaluate each move and connection against an idealized standard that it adjusts to the skeletal proportions of a gymnast. This mechanism would also have the ability to discern new elements not in the database and apply artificial intelligence to compare related elements to achieve an ad hoc score while also noting this new move for further refinement outside of the competition. This ability does not yet exist, however, and so we must rely on the imperfections and biases of human judges.

I think the first step is to establish an international pool of well-accredited and trained judges who, as much as possible, have a history of accuracy and lack of national bias. For competitions such as the World Championships and the Olympic Games, the judges should be drawn from this pool and not provided by national federations.

As is currently being adopted, scoring should utilize modern technology to provide for accuracy of judging.  This is especially important for an apparatus such as the vault where minutiae often slip past even the most conscientious judge. I propose that for vault, ALL routines are evaluated in slow motion from at least two different angles to provide adequate front and side coverage from the spring board through the landing. For other apparatus, I suggest that replay be available in the event of a scoring conflict.

As for scoring…the current method of taking the six scores and tossing out the highest and the lowest is sufficient to mitigate some bias. As now, a mean should be calculated by totaling the remaining four scores and dividing by 4. This is the baseline score. A scoring anomaly occurs if the standard deviation of those four scores exceeds some pre-determined limit. I propose a limit of 0.1: roughly speaking, if the scores aren’t within 0.2 of each other, then clearly someone missed something or is biased. When there is a scoring anomaly, the score will be posted as provisional and the routine will be reviewed via replay (slow motion for the vault, actual speed for the other apparatuses); ideally this review would take place outside of the competition venue, but that’s not generally realistic. The review might also use a different panel of judges so as not to slow down the competition unnecessarily as athletes wait for final scores.
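
To make the arithmetic concrete, here is a minimal Python sketch of the baseline score and the anomaly check described above. It is only an illustration under a few assumptions: the function names and the ANOMALY_LIMIT constant are made up for this example, and it uses the population standard deviation, since the proposal doesn't say which kind of standard deviation is meant.

```python
from statistics import mean, pstdev

# Proposed limit on the spread of the four counted scores.
ANOMALY_LIMIT = 0.1


def counted_scores(scores):
    """Drop the single highest and single lowest of the six "B" scores."""
    if len(scores) != 6:
        raise ValueError("expected exactly six scores")
    return sorted(scores)[1:-1]


def baseline_score(scores):
    """Mean of the four counted scores."""
    return mean(counted_scores(scores))


def is_scoring_anomaly(scores, limit=ANOMALY_LIMIT):
    """True if the counted scores are spread out more than the limit.

    Assumption: population standard deviation (pstdev), not the
    sample standard deviation.
    """
    return pstdev(counted_scores(scores)) > limit
```

For example, a panel of 9.0, 9.1, 9.1, 9.2, 9.2 and 9.5 counts 9.1, 9.1, 9.2 and 9.2, giving a baseline of 9.15 and a standard deviation of 0.05, so the score would stand. A panel of 8.5, 8.8, 9.0, 9.3, 9.4 and 9.6 counts scores that span 0.6, which puts the standard deviation well over the limit and would send the routine to review. With the population standard deviation, four scores that all sit within 0.2 of each other cannot exceed the 0.1 limit, which is where the rule of thumb comes from.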

Tiebreaker: the taller gymnast wins because it’s easier when you’re smaller. Kidding. Honestly, I think ties are fine.  This is still a subjective event, after all.  Trying to force a tiebreaker when there isn’t anything meaningful to measure is just dumb.
