UEFA CL draws with Monte Carlo integration

statistics

UEFA Champions League group stages were finalized this week. I made predictions for which teams are more likely to play against each other using Monte Carlo integration.

Published

November 6, 2022

UEFA Champions League group stages were finalized this week. Tomorrow, the draw for the round of 16 will take place. This provides an opportunity to predict which teams are more likely to play against each other.

Sneak peek: chances are, Bayern and Liverpool will end up playing against each other.

	Liverpool	Club Brugge	Inter	Frankfurt	Milan	Leipzig	Dortmund	PSG
Napoli		17.7%		21.8%		21.1%	22.0%	17.4%
Porto	20.7%		13.5%	13.9%	13.9%	13.1%	14.3%	10.6%
Bayern	37.7%	18.8%			23.8%			19.7%
Tottenham		13.6%	19.5%		17.2%	18.2%	18.0%	13.7%
Chelsea		13.7%	19.1%	18.3%		17.5%	17.7%	13.7%
Real Madrid	21.2%	11.0%	14.7%	14.2%	13.9%		13.7%	11.2%
Man City		14.2%	19.3%	18.2%	17.2%	17.3%		13.7%
Benfica	20.5%	11.0%	13.8%	13.5%	14.1%	12.8%	14.3%

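Table 1: Estimated probability of each round-of-16 pairing (rows: group winners, columns: runners-up). Blank cells are pairings ruled out by the draw restrictions.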

How does the draw work?

1. Group winners and runners-up from the group stage are separated into two pots.
2. A team is randomly drawn from the first pot.
3. The second pot is then restricted to avoid pairings between teams from the same country or the same group.
4. The second team is drawn from the restricted pot.
5. Steps 2 - 4 are repeated until every team is drawn.

The pots for tomorrow are as follows.

Group	Pot 1	Pot 2
A	Napoli	Liverpool
B	Porto	Club Brugge
C	Bayern	Inter
D	Tottenham	Frankfurt
E	Chelsea	Milan
F	Real Madrid	Leipzig
G	Man City	Dortmund
H	Benfica	Paris Saint-Germain

More details on the UEFA website.

Caveats

While the process is relatively simple, the probability calculations are less so, for a few reasons.

Asymmetric encounter probabilities. For Bayern, Club Brugge is 1 of 4 potential opponents, whereas for Club Brugge, Bayern is 1 of 7 potential opponents.
Dependence on previous draws. Each draw changes the probabilities of subsequent events. If Club Brugge is drawn against a different team, Bayern has only 3 potential opponents left.
Dead ends. A sequence of draws can leave the final teams in a pairing that violates the rules. Consider the scenario below, where the first 7 draws are:

Draw	Team 1	Team 2
1	Benfica	Club Brugge
2	Napoli	Frankfurt
3	Real Madrid	Paris Saint-Germain
4	Man City	Leipzig
5	Tottenham	Milan
6	Porto	Liverpool
7	Chelsea	Inter

The issue here is that the remaining teams are Bayern and Dortmund, two German clubs that cannot play against each other. In this case, the draw has to be restarted.
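The asymmetry and the dead end described above can be checked directly. The following is a minimal sketch in base R, not the code behind the post's results: the data frame `st` and the matrix `allowed` are hypothetical names, and the teams, countries and groups are transcribed from the tables in this post.

```r
# Teams, countries and groups transcribed from the post's tables.
# `st` and `allowed` are illustrative names, not part of the original code.
st <- data.frame(
    rnk = rep(c(1, 2), each = 8),
    team = c(
        "Napoli", "Porto", "Bayern", "Tottenham",
        "Chelsea", "Real Madrid", "Man City", "Benfica",
        "Liverpool", "Club Brugge", "Inter", "Frankfurt",
        "Milan", "Leipzig", "Dortmund", "Paris Saint-Germain"
    ),
    country = c(
        "ITA", "POR", "GER", "ENG", "ENG", "ESP", "ENG", "POR",
        "ENG", "BEL", "ITA", "GER", "ITA", "GER", "GER", "FRA"
    ),
    group = rep(LETTERS[1:8], times = 2),
    stringsAsFactors = FALSE
)

pot1 <- st[st$rnk == 1, ]
pot2 <- st[st$rnk == 2, ]

# allowed[i, j] is TRUE when pot-1 team i may face pot-2 team j,
# i.e. the two teams differ in both country and group.
allowed <- outer(pot1$country, pot2$country, "!=") &
    outer(pot1$group, pot2$group, "!=")
dimnames(allowed) <- list(pot1$team, pot2$team)

n_opp_bayern <- rowSums(allowed)[["Bayern"]]       # 4 potential opponents
n_opp_brugge <- colSums(allowed)[["Club Brugge"]]  # 7 potential opponents
last_pair_ok <- allowed["Bayern", "Dortmund"]      # FALSE: the dead end above
```

Row sums give the number of admissible opponents for each group winner and column sums the same for each runner-up, which makes the imbalance between the two pots easy to inspect.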

Monte Carlo

A straightforward way to handle these complexities is to estimate the encounter probabilities using Monte Carlo integration. Simply put, we simulate the round-of-16 draw many times and count how often 1) violations and 2) each pairing occur.

Show me some code

Let’s start with mimicking a single draw.

draw_ko_phase <- function(standing = tbl_standing) {

    # make pots
    pot1 <- standing[standing$rnk == 1, ][["team"]]
    pot2 <- standing[standing$rnk == 2, ][["team"]]

    # initialise the round-of-16 schedule data frame
    schedule <- data.frame(t1 = character(), t2 = character())

    # repeat draw 8 times
    for (drawing in 1:8) {
        draw1 <- sample(pot1, size = 1)
        draw1_gr <- standing[standing$team == draw1, ][["group"]]
        draw1_cn <- standing[standing$team == draw1, ][["country"]]

        # rearrange to subset of pot 2
        # 1) team in (updated) pot; 2) not of same country / group
        pot2_subset <- standing[
            (
                standing$team %in% pot2 &
                    standing$country != draw1_cn &
                    standing$group != draw1_gr
            ),
        ][["team"]]

        # draw 2nd team
        # if pot2_subset is empty, sample() throws an error (a dead end)
        draw2 <- sample(pot2_subset, size = 1)

        pot1 <- pot1[pot1 != draw1]
        pot2 <- pot2[pot2 != draw2]
        schedule <- rbind(schedule, data.frame(t1 = draw1, t2 = draw2))
    }
    schedule
}

Here, tbl_standing looks like this:

rnk	team	country	group
1	Bayern	GER	C
1	Benfica	POR	H
1	Chelsea	ENG	E
1	Man City	ENG	G
1	Napoli	ITA	A
1	Porto	POR	B
1	Real Madrid	ESP	F
1	Tottenham	ENG	D
2	Club Brugge	BEL	B
2	Dortmund	GER	G
2	Frankfurt	GER	D
2	Inter	ITA	C
2	Leipzig	GER	F
2	Liverpool	ENG	A
2	Milan	ITA	E
2	Paris Saint-Germain	FRA	H
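To run the function, the table above has to exist as a data frame. The following is a minimal sketch of how it could be built by hand in base R; how the original post actually constructed `tbl_standing` is not shown, so treat this as an assumption.

```r
# Standings transcribed row by row from the table above.
# Assumption: plain character columns, no factors.
tbl_standing <- data.frame(
    rnk = rep(c(1, 2), each = 8),
    team = c(
        "Bayern", "Benfica", "Chelsea", "Man City",
        "Napoli", "Porto", "Real Madrid", "Tottenham",
        "Club Brugge", "Dortmund", "Frankfurt", "Inter",
        "Leipzig", "Liverpool", "Milan", "Paris Saint-Germain"
    ),
    country = c(
        "GER", "POR", "ENG", "ENG", "ITA", "POR", "ESP", "ENG",
        "BEL", "GER", "GER", "ITA", "GER", "ENG", "ITA", "FRA"
    ),
    group = c(
        "C", "H", "E", "G", "A", "B", "F", "D",
        "B", "G", "D", "C", "F", "A", "E", "H"
    ),
    stringsAsFactors = FALSE
)
```

Keeping `team`, `country` and `group` as character vectors matters here: the draw function compares them with `!=` and passes team names to `sample()`, both of which behave as intended on character data.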

The result of a single call to this function looks like this:

t1	t2
Bayern	Paris Saint-Germain
Tottenham	Club Brugge
Chelsea	Dortmund
Napoli	Leipzig
Man City	Milan
Real Madrid	Frankfurt
Benfica	Inter
Porto	Liverpool

In the end, Monte Carlo is about repeating the draw many times and calculating relative frequencies, while accounting for the failed draws described above.

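library(dplyr)  # bind_rows(), count(), mutate()

NERRORS <- 0    # number of draws that hit a dead end
NSIM <- 10000   # number of simulated draws
iter <- 1:NSIM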

results <- lapply(iter, function(i) {
    tryCatch(
        expr = draw_ko_phase(),
        error = function(err) {
            NERRORS <<- NERRORS + 1
            return(NULL)
        }
    )
})

results |> 
    bind_rows() |> 
    count(t1, t2) |>
    mutate(prob = n / (NSIM - NERRORS))

First, the probability of a restart is not to be underestimated! In 25.7% of the simulated draws, the final teams left in the pots were from the same group or country.

For the successful draws, the estimates are presented in Table 1. The values indicate the probability of each pairing. The most likely round-of-16 match is Bayern against Liverpool (37.7%). In the context of Monte Carlo, this can be interpreted as:

Of all the successful draws across the many repeats, 37.7% had an encounter between Bayern and Liverpool.
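Because the estimate is a relative frequency, its precision can be gauged with a binomial standard error. The sketch below is a rough check rather than part of the original analysis: it plugs in the rounded figures reported in this post, and the number of successful draws is an assumption derived from those figures, not a value taken from the actual run.

```r
n_sim <- 10000      # NSIM used in the simulation
p_restart <- 0.257  # reported share of draws that had to be restarted
p_hat <- 0.377      # reported Bayern vs Liverpool estimate

# Assumption: successful draws reconstructed from the rounded figures above.
n_success <- round(n_sim * (1 - p_restart))

# Binomial standard error and a normal-approximation 95% interval.
se <- sqrt(p_hat * (1 - p_hat) / n_success)
ci <- p_hat + c(-1, 1) * qnorm(0.975) * se

round(c(se = se, lower = ci[1], upper = ci[2]), 4)
```

Under these assumptions the standard error is on the order of half a percentage point, so the first decimal of the percentages in Table 1 should not be over-interpreted.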

Recap

UEFA’s restrictive rules can lead to imbalanced encounter probabilities.
Some corrections are described in the literature; see, for example, Roberts & Rosenthal (2022).
In many complex situations, Monte Carlo simulation is a very flexible method for obtaining probabilities.