BTN: System Accuracy

Unlike in my last few relationships, I’m going to start this rant by being painfully honest: this rant contains math. There, I said it. Now that we know, let’s try to work through this together.

I’ve tried to provide examples and keep the math simple. If you have any questions please feel free to ask and I’ll attempt to explain further. Even for those that hate math I’d suggest trying to read through it a time or two.

With the disclaimer out of the way, let’s move on. The point of this rant is to examine how best to measure and represent system accuracy. I’m going to go over three methods that have been suggested and show how the current system model on the site measures up. There will also be a brief(ish) discussion of what I perceive the pros and cons of each approach to be. Feedback here is much appreciated, as I’d like this metric to be something people can quickly use to weigh one system against another.

Method 1: Heads I win, Tails you lose

The first approach is the most basic. It looks at every fight and measures how many fights the higher rated fighter won. It then divides that number by the total number of fights.

Example

Assume there were only 10 fights and the higher rated fighter won 4 of them. The result would be 4 / 10 = 40%, which is to say that the system was correct 40% of the time.
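
For anyone who prefers code to prose, here is a minimal Python sketch of Method 1. The function name and the tuple layout are made up for illustration and are not how the site stores fights.

```python
def favorite_win_rate(fights):
    """Share of fights won by the higher rated fighter.

    fights: list of (favorite_rating, underdog_rating, favorite_won) tuples.
    This layout is a hypothetical one chosen for the example.
    """
    if not fights:
        raise ValueError("need at least one fight")
    # Count the fights where the higher rated fighter won.
    wins = sum(1 for _, _, favorite_won in fights if favorite_won)
    return wins / len(fights)


# The example above: 10 fights, the higher rated fighter wins 4 of them.
fights = [(1650, 1600, True)] * 4 + [(1650, 1600, False)] * 6
print(f"{favorite_win_rate(fights):.0%}")  # 40%
```

Notice that the ratings are never used beyond deciding who the favorite is, which is exactly the weakness discussed below.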

The good, the bad, and the ugly

PROS: One of the upsides of this approach is that the number is very easy to understand. It’s very easy for someone to look at past picks and understand they got 2 out of 5 right. Another benefit is that this number is easily compared to other systems.

CONS: The biggest downside to this metric is that it deviates from the entire basis of the site. When talking about sports almost nothing is ever 100%. One of the major benefits of the site is that it recognizes that one fighter being rated higher doesn’t guarantee victory or even proclaim vast superiority (such proclamations should be left to the fans and the fighter’s mom).

Another potential drawback is that every single fight is counted, including fights where a fighter has too short a record for their rating to mean much.

Thoughts: Although I like the simplicity of the number and how portable it is, I don’t like how it fails to utilize the site’s expected win percentage model. Additionally, I think if an approach like this were to be used it might make sense to specify additional parameters (ex. minimum number of fights).

Site result: 65%

Method 2: Fighter history

This approach goes fighter by fighter, looking at every fight they had after their sixth fight where their opponent also had at least six previous fights. Across those fights I total the fighter’s expected number of wins and their actual number of wins.

The absolute value of the difference between expected wins and actual wins is then accumulated across all fighters and this number is ultimately divided by the total number of fights.

Example

Let’s look at a two fighter example to get a better idea of how this works:

Fighter A
Total fights = 2
Expected wins = 1
Actual wins = 1
Absolute difference = 0

Fighter B
Total fights = 4
Expected wins = 3
Actual wins = 2
Absolute difference = 1

In this case our total absolute difference is 1 and our total number of fights is 6. From here we can divide the total absolute difference by the total number of fights to get a percentage of incorrect outcomes across all fights.

1 / 6 ~ 0.167

We can then subtract that number from 1 to get a percentage correct for the system:

1 – 0.167 = 0.833 ~ 83%
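
The same arithmetic as a small Python sketch. The function name and dictionary keys are hypothetical, and the expected win totals are simply taken as given here. In practice they would probably be fractional, since they come from expected win percentages.

```python
def fighter_history_accuracy(fighters):
    """Method 2: 1 - (sum of |expected wins - actual wins|) / total fights.

    fighters: list of dicts with 'fights', 'expected_wins' and 'actual_wins'.
    These key names are assumptions made for this example.
    """
    total_fights = sum(f["fights"] for f in fighters)
    if total_fights == 0:
        raise ValueError("need at least one fight")
    # Accumulate the absolute difference fighter by fighter.
    total_abs_diff = sum(
        abs(f["expected_wins"] - f["actual_wins"]) for f in fighters
    )
    return 1 - total_abs_diff / total_fights


fighters = [
    {"name": "Fighter A", "fights": 2, "expected_wins": 1, "actual_wins": 1},
    {"name": "Fighter B", "fights": 4, "expected_wins": 3, "actual_wins": 2},
]
accuracy = fighter_history_accuracy(fighters)
print(f"{accuracy:.1%}")  # 83.3%
```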

The Good, The Bad, and The Ugly: The Quickening

PROS: Unlike the previous approach, this method factors in the expected win percentage. That is good because there is a lot of value in the expected win percentage, and not just for looking at “pot odds” when placing a bet on a given fight.

CONS: The approach used in this method is definitely more complicated than the first. There is also the possibility that this number doesn’t represent a useful metric for people.

Thoughts: Overall I like this approach better than the first method. It factors in the expected win percentage and is an overall deeper number. There’s no doubt that it would be misinterpreted at a glance by some, but I’m willing to put in the time so that anyone truly interested would be able to understand it.

Basically this number helps give us an idea of the average fighter’s actual performance compared to their expected performance. I’m not sure how useful a measure that is though, so feedback is definitely welcomed on this method.

Site result: 86%

Method 3: Slice of Life

The final suggested method is an extension of the rant I did a little while ago. It works by slicing all fights into smaller pools based upon the difference in rating between the two fighters. For each pool it then compares the actual percentage of fights won by the favorite with the expected percentage of fights that would be won.

In order to get an overall picture of the system on a per fight basis, we multiply the number of fights in a given slice by that slice’s absolute difference, accumulate that value across all slices, and then divide by the total number of fights from all slices.

I know that might sound complicated, but if I wrote it as a formula with sigmas and stuff you’d hate me even more. Even if the math sounds like gibberish please try to picture what it represents in real world terms. What this method boils down to is showing how close to the expected win percentage various rating slices actually come.
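
No sigmas, but here is a rough Python sketch of the slicing step for those who find code easier to picture. It is only an illustration under a few assumptions: the 300 point slice width, the tuple layout, and taking a slice’s expected win percentage as the average of its fights’ expected win probabilities are all my choices for the example.

```python
from collections import defaultdict


def build_slices(fights, width=300):
    """Group fights into rating-difference slices (1-300, 301-600, ...).

    fights: list of (rating_diff, expected_win_prob, favorite_won) tuples,
    where rating_diff >= 1 and expected_win_prob is the favorite's
    expected chance of winning (0 to 1). Hypothetical layout.
    Returns {label: (fight_count, expected_pct, actual_pct)}.
    """
    buckets = defaultdict(list)
    for rating_diff, expected, favorite_won in fights:
        # A difference of 1-300 lands in bucket 0, 301-600 in bucket 1, etc.
        buckets[(rating_diff - 1) // width].append((expected, favorite_won))

    slices = {}
    for index, rows in sorted(buckets.items()):
        count = len(rows)
        expected_pct = 100 * sum(e for e, _ in rows) / count
        actual_pct = 100 * sum(1 for _, won in rows if won) / count
        label = f"{index * width + 1}-{(index + 1) * width}"
        slices[label] = (count, expected_pct, actual_pct)
    return slices


sample = [(50, 0.6, True), (250, 0.7, False), (450, 0.8, True)]
for label, (count, expected_pct, actual_pct) in build_slices(sample).items():
    print(label, count, round(expected_pct, 1), round(actual_pct, 1))
```

Each slice ends up with the three numbers the method needs: how many fights it holds, what the system expected, and what actually happened.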

Example

Consider two very broad slices:

1-300 Rating difference:
Total fights = 10
Expected win percentage = 65%
Actual win percentage = 70%
Absolute difference = 5%

301-600 Rating difference:
Total fights = 5
Expected win percentage = 85%
Actual win percentage = 70%
Absolute difference = 15%

We then take ((5 * 10) + (15 * 5)) / (5 + 10) ~ 8.33%

That means that on average the above system would be off by about 8.33%, or put another way, it’s about 91.67% accurate in terms of expected win percentage based upon rating versus the actual win percentage.
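
Here is the same weighted average as a short Python sketch. The function name and the tuple layout are invented for the example.

```python
def slice_accuracy(slices):
    """Method 3: fight-weighted average of |expected % - actual %|.

    slices: list of (fight_count, expected_pct, actual_pct) tuples.
    Hypothetical layout chosen for this example.
    Returns (average_error_pct, accuracy_pct).
    """
    total_fights = sum(count for count, _, _ in slices)
    if total_fights == 0:
        raise ValueError("need at least one fight")
    # Weight each slice's absolute difference by its number of fights.
    weighted_error = sum(
        count * abs(expected - actual) for count, expected, actual in slices
    )
    average_error = weighted_error / total_fights
    return average_error, 100 - average_error


# The two slices from the example above.
slices = [(10, 65, 70), (5, 85, 70)]
average_error, accuracy = slice_accuracy(slices)
print(f"{average_error:.2f}% off, {accuracy:.2f}% accurate")
# 8.33% off, 91.67% accurate
```

Weighting by fight count keeps a tiny slice with a wild result from dragging the overall number around as much as a large slice would.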

The Good, The Bad, The Ugly, and The Crystal Skull

PROS: The stat focuses in on the expected win percentage approach. It allows one to quickly see how real life results are measuring up against predicted outcomes. This number helps give an idea of the overall accuracy of the system when it says Fighter A is expected to win 65% of the time versus Fighter B.

CONS + Thoughts: At first I was going to say that it’s more of a system stat than a user stat, but that’s pretty much what we are looking for here. What we want is a way for a user to gauge how accurate the other numbers they are seeing are. It’s great to claim that Fighter A will win 45% of the time, but if in actuality the expected win percentages are off by 40% then the value of the original expected win percentage number is greatly deflated.

Site result: 97%

Just some random words

Whenever you deal with stats it’s important to know exactly what the number you are looking at represents and whether that number is remotely relevant to what you are doing.

For a lot of people, all they will care about is method 1. Sadly, that wastes a lot of the system’s value. There is a world of difference between a 1601 rated fighter taking on a 1600 rated fighter and a 1950 rated fighter taking on a 1700 rated fighter. To simply say A > B (even if by the slimmest of margins) portends guaranteed victory is a mistake.
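
To put rough numbers on that difference, here is a small Python sketch. This rant doesn’t spell out how the site turns ratings into an expected win percentage, so the sketch assumes the textbook Elo curve with a 400 point scale. Treat the output as illustrative and not as the site’s actual numbers.

```python
def expected_win_probability(rating_a, rating_b, scale=400):
    """Chance that fighter A beats fighter B under the standard Elo curve.

    ASSUMPTION: textbook Elo formula with a 400 point scale. The site's
    real expected win percentage model may differ.
    """
    return 1 / (1 + 10 ** ((rating_b - rating_a) / scale))


close = expected_win_probability(1601, 1600)
wide = expected_win_probability(1950, 1700)
print(f"1601 vs 1600: {close:.1%}")
print(f"1950 vs 1700: {wide:.1%}")
```

Under that assumption the first matchup is essentially a coin flip while the second has a clear favorite, yet method 1 scores both as the same kind of pick.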

Being the numbers geek that I am, method 3 is the most interesting to me. It’s a number that helps clarify the validity of other numbers. It’s interesting to see how close (on average) to expected results the system is actually coming. This also provides a degree of cushion when weighing expected win percentage between two fighters of various ratings.

If there are any methods I missed or any additional parameters you would like to see applied to any of the methods, please let me know. I’d really like to reach a bit of a consensus on this in the near future, as it will prove useful for a couple of future features/rants.


5 Responses to “BTN: System Accuracy”

  1. random says:

    one could imagine the possibility that accuracy is padded even with the 6 fight cutoff. what about limiting it in different ways to test the fidelity of the data, like fights only in the UFC, only UFC fighters (also allowing their fights in orgs other than UFC/Pride), etc

  2. evil pooh says:

    The six fight cutoff was actually a specific request from someone. Originally the site was designed around a higher cutoff. However, I took the time to run the numbers with assorted cutoffs, and even when no cutoff is specified the results remain in line with what is expected.

    Please keep in mind that the scope and intent of the site is to measure all MMA fights. However, I have checked numbers previously (not sure if they are in a rant somewhere or on the forums or not) across a multitude of subsets:

    - UFC only fights
    - UFC + Pride
    - Last 3 years
    - etc.

    Sample size will always be a major issue when dealing with MMA, and although the results with the smaller subsets are similar to what’s expected, they also have a lot more variance.

    I think it’s important to keep in mind that all the fighters shown on the site are connected. When you look at the site you are looking at all UFC fighters and all the fighters they have fought and all of the fighters those fighters have fought. The MMA landscape is a huge interconnected web, and I think some people lose sight of that at times.

  3. Mma says:

    Simple and nice ;) Sorry I’m late adding this comment. I searched for “mma” and I found your post “BTN: System Accuracy”. just added you to my feed reader.
