UEFA CL draws with Monte Carlo integration

R
statistics
UEFA Champions League group stages were finalized this week. I made predictions for which teams are more likely to play against each other using Monte Carlo integration.
Published

November 6, 2022

The UEFA Champions League group stage was finalized this week. Tomorrow, the draw for the round of 16 will take place. This provides an opportunity to predict which teams are more likely to play against each other.

Sneak peek: chances are, Bayern and Liverpool will end up playing against each other.

Pot 1 / Pot 2  Liverpool  Club Brugge  Inter  Frankfurt  Milan  Leipzig  Dortmund  PSG
Napoli         -          17.7%        -      21.8%      -      21.1%    22.0%     17.4%
Porto          20.7%      -            13.5%  13.9%      13.9%  13.1%    14.3%     10.6%
Bayern         37.7%      18.8%        -      -          23.8%  -        -         19.7%
Tottenham      -          13.6%        19.5%  -          17.2%  18.2%    18.0%     13.7%
Chelsea        -          13.7%        19.1%  18.3%      -      17.5%    17.7%     13.7%
Real Madrid    21.2%      11.0%        14.7%  14.2%      13.9%  -        13.7%     11.2%
Man City       -          14.2%        19.3%  18.2%      17.2%  17.3%    -         13.7%
Benfica        20.5%      11.0%        13.8%  13.5%      14.1%  12.8%    14.3%     -
Table 1: Predictions (rows: pot 1, columns: pot 2; - marks pairings the rules exclude)

How does the draw work?

  1. Group winners and runners-up from the group stage are separated into two pots.
  2. A team is randomly drawn from the first pot.
  3. The second pot is then restricted to the teams that may face the drawn team, which excludes teams from the same country or the same group.
  4. The second team is drawn from this restricted pot.
  5. Steps 2 to 4 are repeated until every team is drawn.

The pots for tomorrow are as follows.

group Pot 1 Pot 2
A Napoli Liverpool
B Porto Club Brugge
C Bayern Inter
D Tottenham Frankfurt
E Chelsea Milan
F Real Madrid Leipzig
G Man City Dortmund
H Benfica Paris Saint-Germain

More details on the UEFA website.

Caveats

While the process is relatively simple, the probability calculations are less so, for the reasons below.

  • Asymmetric encounter probabilities. For example, for Bayern, Club Brugge is 1 of the 4 potential opponents - whereas for Club Brugge, Bayern is 1 of the 7 potential opponents.

  • Dependence on previous draws. Each draw changes the probabilities of the draws that follow. If Club Brugge is drawn against a different team, Bayern has only 3 potential opponents left.

  • It is possible that the sequence of draws results in the final draws violating the rules. Consider the scenario below, where the first 7 draws are:

    Draw Team 1 Team 2
    1 Benfica Club Brugge
    2 Napoli Frankfurt
    3 Real Madrid Paris Saint-Germain
    4 Man City Leipzig
    5 Tottenham Milan
    6 Porto Liverpool
    7 Chelsea Inter

    The issue here is that the remaining teams are Bayern and Dortmund, which are both German teams and cannot play against each other. In other words, in this case the draw has to be restarted.
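The caveats above can be checked with a small sketch. It builds a matrix of allowed pairings from the pots listed earlier, with country codes taken from the tbl_standing table shown later in the post. The names pot1, pot2, and allowed are illustrative assumptions and are not part of the simulation code below.

```r
# Pots as listed in the post; groups A to H in order.
pot1 <- data.frame(
    team    = c("Napoli", "Porto", "Bayern", "Tottenham",
                "Chelsea", "Real Madrid", "Man City", "Benfica"),
    country = c("ITA", "POR", "GER", "ENG", "ENG", "ESP", "ENG", "POR"),
    group   = LETTERS[1:8]
)
pot2 <- data.frame(
    team    = c("Liverpool", "Club Brugge", "Inter", "Frankfurt",
                "Milan", "Leipzig", "Dortmund", "Paris Saint-Germain"),
    country = c("ENG", "BEL", "ITA", "GER", "ITA", "GER", "GER", "FRA"),
    group   = LETTERS[1:8]
)

# allowed[i, j] is TRUE when pot 1 team i may face pot 2 team j:
# different country and different group.
allowed <- outer(pot1$country, pot2$country, "!=") &
    outer(pot1$group, pot2$group, "!=")
dimnames(allowed) <- list(pot1$team, pot2$team)

sum(allowed["Bayern", ])       # 4 potential opponents for Bayern
sum(allowed[, "Club Brugge"])  # 7 potential opponents for Club Brugge
allowed["Bayern", "Dortmund"]  # FALSE: the dead end from the scenario above
```

The row and column sums of this matrix show the asymmetry directly, and the last line confirms why the example sequence of draws cannot be completed.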

Monte Carlo

A straightforward solution to the aforementioned complexities is to estimate the encounter probabilities using Monte Carlo integration. Simply put, we simulate the round-of-16 draw many times and count how often 1) violations and 2) team encounters occur.

Let’s start with mimicking a single draw.

draw_ko_phase <- function(standing = tbl_standing) {

    # make pots
    pot1 <- standing[standing$rnk == 1, ][["team"]]
    pot2 <- standing[standing$rnk == 2, ][["team"]]

    # init round-of-16 schedule df
    schedule <- data.frame(t1 = character(), t2 = character())

    # repeat draw 8 times
    for (drawing in 1:8) {
        draw1 <- sample(pot1, size = 1)
        draw1_gr <- standing[standing$team == draw1, ][["group"]]
        draw1_cn <- standing[standing$team == draw1, ][["country"]]

        # restrict pot 2 to valid opponents
        # 1) team in (updated) pot; 2) not of same country / group
        pot2_subset <- standing[
            (
                standing$team %in% pot2 &
                    standing$country != draw1_cn &
                    standing$group != draw1_gr
            ),
        ][["team"]]

        # draw 2nd team
        # an empty pot2_subset means the draw is stuck: raise an error
        # so that the caller can count it as a restart
        if (length(pot2_subset) == 0) {
            stop("no valid opponent left for ", draw1)
        }
        draw2 <- sample(pot2_subset, size = 1)

        # remove both teams from their pots and store the pairing
        pot1 <- pot1[pot1 != draw1]
        pot2 <- pot2[pot2 != draw2]
        schedule <- rbind(schedule, data.frame(t1 = draw1, t2 = draw2))
    }
    schedule
}

Here, tbl_standing looks like this:

rnk team country group
1 Bayern GER C
1 Benfica POR H
1 Chelsea ENG E
1 Man City ENG G
1 Napoli ITA A
1 Porto POR B
1 Real Madrid ESP F
1 Tottenham ENG D
2 Club Brugge BEL B
2 Dortmund GER G
2 Frankfurt GER D
2 Inter ITA C
2 Leipzig GER F
2 Liverpool ENG A
2 Milan ITA E
2 Paris Saint-Germain FRA H
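The post does not show how tbl_standing was created. One way to recreate it by hand, assuming a plain data.frame with the four columns shown above, is sketched here:

```r
# Hand-built copy of the standings table above (an assumption about its
# structure: a plain data.frame with columns rnk, team, country, group).
tbl_standing <- data.frame(
    rnk = rep(1:2, each = 8),
    team = c(
        "Bayern", "Benfica", "Chelsea", "Man City",
        "Napoli", "Porto", "Real Madrid", "Tottenham",
        "Club Brugge", "Dortmund", "Frankfurt", "Inter",
        "Leipzig", "Liverpool", "Milan", "Paris Saint-Germain"
    ),
    country = c(
        "GER", "POR", "ENG", "ENG", "ITA", "POR", "ESP", "ENG",
        "BEL", "GER", "GER", "ITA", "GER", "ENG", "ITA", "FRA"
    ),
    group = c(
        "C", "H", "E", "G", "A", "B", "F", "D",
        "B", "G", "D", "C", "F", "A", "E", "H"
    )
)

# Sanity check: every group has exactly one winner and one runner-up.
table(tbl_standing$group, tbl_standing$rnk)
```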

The result of a single call to this function is then:

t1 t2
Bayern Paris Saint-Germain
Tottenham Club Brugge
Chelsea Dortmund
Napoli Leipzig
Man City Milan
Real Madrid Frankfurt
Benfica Inter
Porto Liverpool

In the end, Monte Carlo is about repeating the draw many times and calculating relative frequencies - accounting for potential errors due to the described problems.

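library(dplyr)  # bind_rows(), count(), mutate()

NERRORS <- 0
NSIM <- 10000
iter <- 1:NSIM

# run the draw NSIM times; a failed draw returns NULL and is counted
results <- lapply(iter, function(i) {
    tryCatch(
        expr = draw_ko_phase(),
        error = function(err) {
            NERRORS <<- NERRORS + 1
            return(NULL)
        }
    )
})

# relative frequency of each pairing among the successful draws
results |>
    bind_rows() |>
    count(t1, t2) |>
    mutate(prob = n / (NSIM - NERRORS))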

First, the probability of having to restart the draw should not be underestimated! In 25.7% of the simulated draws, the procedure ends in a dead end where the teams left in the pots are from the same group or country.

The estimates for the successful draws are presented in Table 1. Each value is the probability that the two teams are drawn against each other. The most likely pairing is a round-of-16 match between Bayern and Liverpool (37.7%). In the context of Monte Carlo, this can be interpreted as:

Of all the successful draws across the many repeats, 37.7% had an encounter between Bayern and Liverpool.
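Because each estimate is a relative frequency, its precision can be gauged with the usual binomial standard error. The sketch below is an addition rather than part of the original analysis: the helper mc_se is hypothetical, and the number of successful draws is derived from the reported 25.7% restart share, not reported directly.

```r
# Binomial standard error of a proportion estimated from n independent draws.
mc_se <- function(p, n) sqrt(p * (1 - p) / n)

n_sim <- 10000        # NSIM used in the simulation
p_restart <- 0.257    # share of restarted draws reported above
# Assumption: successful draws derived from the two numbers above.
n_success <- round(n_sim * (1 - p_restart))

p_hat <- 0.377        # estimated Bayern vs Liverpool probability
se <- mc_se(p_hat, n_success)

# Approximate 95% interval for the Monte Carlo estimate
round(c(lower = p_hat - 1.96 * se, upper = p_hat + 1.96 * se), 3)
```

Under these assumptions the simulation error on the 37.7% figure is roughly one percentage point either way, so increasing NSIM would mainly tighten the smaller, closely spaced probabilities in Table 1.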

Recap

  • UEFA’s restrictive rules can lead to imbalanced encounter probabilities.

  • The literature describes some corrections. See, for example, Roberts & Rosenthal (2022).

  • In many complex situations Monte Carlo simulation is a very flexible method to obtain probabilities.