Matchmaking Ruins Everything

Is it even possible to have stable MMR with SBMM? We look at the available skill rating systems and their effects on SBMM.

Published by Charlie Olson, Mar 1, 2023

Imagine a 1v1 game (like chess), four players of equal skill, and initial matchmaking rating (MMR) values of 3.

Everyone in this example has the same underlying skill; therefore, everyone should have similar MMR values. The ideal MMR histogram would essentially be a single column at 3 (with some oscillations):

ideal MMR histogram of 4 players of equal skill

This is indeed what happens in an Elo-like MMR system as long as matchmaking is random. However, once skill-based matchmaking (SBMM) is added to the mix, it all goes to shit.

MMR histogram of 4 players of equal skill with perfect SBMM

ASCII Example

With perfect SBMM, the MMR distribution will become uniform over time — exactly the opposite of what it should be. The “correct” MMR distribution in this case should be perfectly tall and narrow (constant), but SBMM leads to perfectly short and wide (uniform).

This scenario is simple enough that we can step through it by hand:

All four players start with MMR = 3
Two matches are made at random
Winners gain 1 MMR; Losers lose 1 MMR
Repeat — only selecting perfect MMR matches
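After the 3rd iteration, every player has a different MMR (1, 2, 4, and 5), so the distribution is flat and no more perfect matches can be made

The process as an ASCII illustration (number of players at each MMR value):

MMR          1  2  3  4  5
Start        .  .  4  .  .
Iteration 1  .  2  .  2  .
Iteration 2  1  .  2  .  1
Iteration 3  1  1  .  1  1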
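The hand-stepped process can also be reproduced in a few lines of Python. This is only a sketch of the steps listed above, not the simulator used for the charts in this post; the names `play_round` and `run_walkthrough` are made up for illustration. Since everyone starts at the same MMR, the first "random" round is automatically a perfect match as well.

```python
import random
from collections import Counter


def play_round(mmrs, rng):
    """Pair up players with identical MMR; winner +1, loser -1."""
    by_mmr = {}
    for player, mmr in enumerate(mmrs):
        by_mmr.setdefault(mmr, []).append(player)
    new_mmrs = list(mmrs)
    matches = 0
    for players in by_mmr.values():
        rng.shuffle(players)
        # Players without a partner at the same MMR sit this round out.
        for a, b in zip(players[0::2], players[1::2]):
            winner, loser = (a, b) if rng.random() < 0.5 else (b, a)
            new_mmrs[winner] += 1
            new_mmrs[loser] -= 1
            matches += 1
    return new_mmrs, matches


def run_walkthrough(n_players=4, start_mmr=3, seed=0):
    """Play rounds until no perfect match exists; return the MMRs after each round."""
    rng = random.Random(seed)
    mmrs = [start_mmr] * n_players
    history = [mmrs]
    while True:
        mmrs, matches = play_round(mmrs, rng)
        if matches == 0:
            return history
        history.append(mmrs)


# Print the number of players at each MMR value, one row per iteration.
for i, mmrs in enumerate(run_walkthrough()):
    counts = Counter(mmrs)
    row = "  ".join(str(counts.get(m, 0)) for m in range(1, 6))
    print(f"iteration {i}:  {row}")
```

Who wins each match is random, but the histogram is the same on every run: perfect matching always splits each pair one step up and one step down.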

The More Realistic Elo Example

Perfect SBMM breaks Elo in theory — Elo’s tuning parameters never even come into play if there is no difference in MMR. But SBMM in practice is rarely perfect. So what happens to an Elo system under more realistic circumstances? Will SBMM still have a partial flattening effect?
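To make the "tuning parameters never come into play" point concrete, here is a minimal sketch of the standard Elo update. The function name `elo_update` and the parameter names `k` and `scale` are my own labels. When both ratings are equal, the expected score is exactly 0.5 no matter what the scale is, so the only thing left is a fixed step of k/2. With k = 2, that is the +1/-1 rule from the toy example.

```python
def elo_update(r_a, r_b, score_a, k=32.0, scale=400.0):
    """Return player A's new rating after one game (score_a: 1 = win, 0 = loss)."""
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))
    return r_a + k * (score_a - expected_a)


# Equal ratings: the scale drops out entirely and the step is always k / 2.
for scale in (1.0, 100.0, 400.0):
    print(scale, elo_update(3.0, 3.0, 1.0, k=2.0, scale=scale))
```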

I wrote a simulation to find out:

Elo MMR Distribution — with and without SBMM

As expected, SBMM dramatically flattens the MMR distribution, just not all the way to uniform. Which is still bad. In fact, Elo under SBMM will continue expanding indefinitely in the absence of hacks to constrain it, but the expansion rate depends on the “tightness” of the SBMM.
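For anyone who wants to poke at this themselves, below is a stripped-down sketch of this kind of experiment. It is not the simulator behind the chart above, and it makes some loud assumptions: every player has identical skill (so each match is a coin flip), the "SBMM" is simply sort-by-rating and pair neighbors, and the population size, round count, and K-factor are arbitrary picks.

```python
import random
import statistics


def expected_score(r_a, r_b, scale=400.0):
    """Standard Elo expected score for A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def simulate_elo(n_players=200, n_rounds=300, k=32.0, sbmm=False, seed=0):
    """Return final ratings for equal-skill players under random or sorted pairing."""
    rng = random.Random(seed)
    ratings = [1500.0] * n_players
    for _ in range(n_rounds):
        order = list(range(n_players))
        rng.shuffle(order)  # random matchmaking (also breaks ties for SBMM)
        if sbmm:
            order.sort(key=lambda i: ratings[i])  # pair nearest neighbors
        for a, b in zip(order[0::2], order[1::2]):
            score_a = 1.0 if rng.random() < 0.5 else 0.0  # equal skill: coin flip
            delta = k * (score_a - expected_score(ratings[a], ratings[b]))
            ratings[a] += delta
            ratings[b] -= delta  # Elo is zero-sum
    return ratings


for label, sbmm in (("random", False), ("SBMM", True)):
    spread = statistics.pstdev(simulate_elo(sbmm=sbmm))
    print(f"{label:>6}: MMR standard deviation = {spread:.0f}")
```

Under random pairing, an overrated player tends to meet lower-rated opponents, and the expected-score term pulls them back toward the pack. Under tight SBMM the rating gap in each match is close to zero, so that restoring pull mostly disappears and each rating behaves like a random walk.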

This isn’t just a problem in simulation; it’s probably one of the main issues with the International Chess Federation’s (FIDE) Elo system:

In the past decade, certain innovations have caused rating deflation, a concern that has been raised by professional players and mathematicians and did not go unnoticed by FIDE. Players’ ratings are spread out too widely, and the situation is deteriorating with each passing year. — FIDE

*Elo has other inherent problems, though. For example, the fundamental conservation of MMR is actually a flaw, since the distribution of skill is unlikely to be symmetrical. On top of that, clamping the low end while reducing the k-factor at high skill levels makes the distribution shift left as new players improve over time (TrueSkill has a bigger problem with this, though).

The Value of Simulation

A simulator allows us to compare different systems using the exact same virtual players under different conditions. The simulator is a simplified model of reality, so if an MMR algorithm doesn’t work here, it would be unreasonable to expect it to magically work in the more complicated real world.

True story: most game developers only run simulations on historical match data. This fails to capture the feedback effect of SBMM.

A/B tests on live players are also necessary for validation, but are too slow and difficult to be useful for iteration during algorithm development.

In a nutshell, a robust simulator is necessary.

The TrueSkill Example

In video games, Microsoft TrueSkill is the most widely recognized MMR algorithm. Is TrueSkill more invariant under SBMM than Elo? Let’s see:

TrueSkill MMR Distribution — with and without SBMM

TrueSkill also suffers from MMR expansion, but not quite as badly as Elo.

Side note: TrueSkill’s improvement in stability comes at a cost. The distribution expands less because TrueSkill decreases the step size of MMR updates over time. This, however, makes it ripe for smurfing and problematic for player-facing MMR.
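The shrinking step size is easy to see in the published 1v1 TrueSkill update equations. The sketch below is hand-written from those equations for the no-draw case; it is not Microsoft's implementation or any library's API. The function name `trueskill_1v1` is mine, and the constants are the commonly cited defaults (mu = 25, sigma = 25/3, beta = 25/6, tau = 25/300), which I am assuming here.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
MU0 = 25.0
SIGMA0 = MU0 / 3   # initial uncertainty
BETA = MU0 / 6     # per-game performance noise
TAU = MU0 / 300    # small uncertainty added back before each game


def trueskill_1v1(winner, loser, beta=BETA, tau=TAU):
    """One no-draw 1v1 update; each player is a (mu, sigma) tuple."""
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    var_w = sigma_w ** 2 + tau ** 2
    var_l = sigma_l ** 2 + tau ** 2
    c = sqrt(2 * beta ** 2 + var_w + var_l)
    t = (mu_w - mu_l) / c
    v = STD_NORMAL.pdf(t) / STD_NORMAL.cdf(t)  # mean correction factor
    w = v * (v + t)                            # variance correction factor
    new_winner = (mu_w + var_w / c * v, sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_loser = (mu_l - var_l / c * v, sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_winner, new_loser


# How much mu does a win in a perfectly even match buy at different sigmas?
for sigma in (SIGMA0, 4.0, 2.0, 1.0):
    (mu_after, _), _ = trueskill_1v1((MU0, sigma), (MU0, sigma))
    print(f"sigma = {sigma:5.2f}   mu gain = {mu_after - MU0:.3f}")
```

The mu update is scaled by the player's own variance, and every match shrinks that variance. So the same win is worth less and less as a player's match count grows, which is exactly what makes early matches count for so much.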

Smurfing side note: TrueSkill is heavily biased by your initial matches. If you deliberately play badly for your first few dozen matches, you can guarantee yourself easy matches for a long time (the flip side of this is the reason why skilled players often have sub-50% win rates in TrueSkill). Similarly, in a Glicko system, you can exploit the dynamic variance to farm MMR.

IVK Skill in Casual

This has been a lot of doom and gloom so far. Is it even possible to have stable MMR with SBMM? The short answer is yes.

(This is sort of a sales pitch)

At Invokation Games, we have a class of algorithms we call IVK Skill (IVK is an esoteric acronym: Ideal, Variance, K-factor), whose distributions are invariant across the entire range of matchmaking conditions. Here’s one example:

IVK Skill in Casual MMR distribution — with and without SBMM

Ok, this isn’t perfect perfect, but it’s pretty darn close.

IVK Skill in Ranked

IVK Skill MMR distributions don’t have to be symmetrical. Here’s an example of an asymmetrical, positively-unbounded Ranked distribution, simulated with SBMM only, since that’s how Ranked modes work:

IVK Skill in Ranked MMR distribution — with SBMM

As far as I know, IVK Skill is the only generalized MMR algorithm with a configurable, deterministic, long-term MMR distribution.

Implications

The main issue with Elo, TrueSkill, and conventional MMR systems is that they require constant maintenance and tuning. Players aren’t fond of the workaround solutions (e.g. “hidden MMR” or massive Elo remappings), and data scientists aren’t cheap.

The root problem is that these MMR systems have unpredictable distributions. They’re not invariant under different SBMM conditions.

MMR distributions might be consistent from one season to the next — but they’re not analytically predictable or controllable. With the launch of a new game, a new matchmaker, or a major change in the meta, everything becomes uncertain again.

IVK Skill solves the root problem. It delivers the spirit of Elo: intuitive player-facing MMR updates and simple configuration options — without the fatal flaw of uncontrolled, unpredictable expansion and/or sliding.

IVK Skill also works for any number of players or teams, with or without placement matches, and for any combination of personal or team performance.

Note: Team-balancing

While perfect SBMM is unlikely, perfectly balanced teams can be relatively common in multiplayer games. If MMR is updated based on the team outcome, perfect team-balancing has the same effect as perfect SBMM: both teams have the same average MMR, so the expected result is always 50/50. In other words, perfect team-balancing will rapidly expand an Elo system, and no tuning of the variance or k-factor can fix that.
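Here is a rough sketch of the team-balancing effect, again under loud assumptions: equal-skill players, fully random lobbies (no SBMM at all), 3v3 matches, a team-average Elo update applied to every team member, and a brute-force balancer. None of these details come from a real game, and the function names are made up for illustration.

```python
import itertools
import random
import statistics


def expected_score(r_a, r_b, scale=400.0):
    """Standard Elo expected score for A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def split_teams(lobby, ratings, balance, rng):
    """Split a lobby into two equal teams, either at random or as evenly as possible."""
    half = len(lobby) // 2
    if not balance:
        shuffled = rng.sample(lobby, len(lobby))
        return shuffled[:half], shuffled[half:]
    total = sum(ratings[p] for p in lobby)
    # Brute force: choose the team whose rating sum is closest to half the total.
    team_a = min(itertools.combinations(lobby, half),
                 key=lambda team: abs(2 * sum(ratings[p] for p in team) - total))
    return list(team_a), [p for p in lobby if p not in team_a]


def simulate_team_elo(balance, n_players=120, team_size=3, n_rounds=300, k=32.0, seed=0):
    """Return final ratings; n_players must be a multiple of 2 * team_size."""
    rng = random.Random(seed)
    ratings = [1500.0] * n_players
    players = list(range(n_players))
    lobby_size = 2 * team_size
    for _ in range(n_rounds):
        rng.shuffle(players)  # lobbies are random in both scenarios
        for start in range(0, n_players, lobby_size):
            lobby = players[start:start + lobby_size]
            team_a, team_b = split_teams(lobby, ratings, balance, rng)
            avg_a = statistics.fmean(ratings[p] for p in team_a)
            avg_b = statistics.fmean(ratings[p] for p in team_b)
            score_a = 1.0 if rng.random() < 0.5 else 0.0  # equal skill: coin flip
            delta = k * (score_a - expected_score(avg_a, avg_b))
            for p in team_a:
                ratings[p] += delta
            for p in team_b:
                ratings[p] -= delta
    return ratings


for label, balance in (("random teams", False), ("balanced teams", True)):
    spread = statistics.pstdev(simulate_team_elo(balance))
    print(f"{label:>14}: MMR standard deviation = {spread:.0f}")
```

Note that the lobbies themselves are random in both runs. The only difference is how each lobby is split into teams, and that alone is enough to widen the distribution.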

For systems that use the team outcome (win/loss) to drive MMR updates, it makes sense then to either deliberately unbalance the teams, or switch to IVK Skill (obviously).

To make the switch, get in touch.