Births & conceptions in Belgium

---
title: "Births & conceptions in Belgium"
output: 
  flexdashboard::flex_dashboard:
    theme: lumen
    source: embed
    storyboard: true
editor_options: 
  chunk_output_type: console
---

```{r setup, include=FALSE}
library(flexdashboard)
library(dplyr)
library(ggplot2)
library(ggiraph)
library(plotly)

births_conceptions_be = readRDS('births_conceptions_be.rds')
theme_set(theme_minimal())

# helpers ----------------------------------------------------------

scale_date_custom = function() scale_x_date(
  date_breaks = '2 years', date_labels = "'%y", 
  limits = as.Date(c('1992-01-01', '2020-01-01'))
)

# knitr::opts_chunk$set(eval = FALSE)
```


### Births in Belgium between 1992 - 2020; or how you should **NOT** visualize the data.

```{r}
gg = ggplot(births_conceptions_be, 
            aes(x=date, y = nbirths))+
  geom_line()+
  geom_smooth()+
  scale_date_custom()+
  scale_y_continuous(limits = c(0, 500))+ 
  labs(
    x = 'Year', 
    y = 'Number of births', 
    title = 'Daily births in Belgium
^{Between 1992 and 2020}', 
  )

plotly::ggplotly(gg) %>% 
  plotly::style(traces = 2:3, hoverinfo = 'none')
```

*** 

Recently, one of my colleagues shared an open source 
dataset on the daily number of births in Belgium.
Exploring the data, it quickly became clear 
birthdays are less random than you might think. 

In the next few blocks, I hope to have shared some 
insights.
---

Let's start with the time series plot on the left. 
We can notice that there is
  1. quite some **variation** in the number of births on any given moment
  2. an indication for a **change in time**

The plot isn't ideal though... you'll see why :-)

---

Details on the data

The dataset is made available Statbel, the Belgian
statistical office. You can access it 
[here](https://statbel.fgov.be/en/open-data/number-births-day).

The number of births per day between 1 Jan 1992 and 31 Dec 2020 are reported.

Based on the dates, some extra features are included
such as the weekday, day of year and Belgian historical holidays 
that were scraped [here](https://www.kalender-365.be/feestdagen/1991.html)


### Our first improvement

```{r}
gg = ggplot(births_conceptions_be, 
            aes(x=date, y = nbirths))+
  geom_point(aes(weekday=weekday), size=.1)+
  geom_smooth(se = FALSE)+
  scale_date_custom()+
  scale_y_continuous(limits = c(0, 500))+ 
  scale_color_discrete()+
  labs(
    x = 'Year',
    y = 'Number of births', 
    color = 'Weekday', 
    title = 'Daily births in Belgium'
  )
plotly::ggplotly(gg) %>% 
  style(hoverinfo = 'none', traces = 2)
```

*** 

Clearly there is something going on...

If you hover over the scatter cloud you quickly see an 
indication for the reason of such a difference.
Or... just plain reasoning about impacting factors 
should work perfectly as well
in this simple case :)

### The effect of weekdays on births

```{r}
gg = ggplot(births_conceptions_be, 
            aes(x=date, y = nbirths, color = weekday))+
  geom_point(size=.1)+
  geom_smooth(se = FALSE)+
  scale_date_custom()+
  scale_y_continuous(limits = c(0, 500))+ 
  scale_color_discrete()+
  labs(
    x = 'Year',
    y = 'Number of births', 
    color = 'Weekday', 
    title = 'Daily births in Belgium
^{Presence of a weekend effect}'
  )


plotly::ggplotly(gg) %>% 
  style(hoverinfo = 'none', traces =  8:14)
```

*** 

We could expect that births were lower in the weekend. 
However, that there is such a large difference might be surprising nonetheless.

---

```{r, eval=FALSE, echo=FALSE}
births_conceptions_be %>% 
  filter(!is.na(nbirths)) %>% 
  group_by(weekend = weekday %in% c('Sat', 'Sun')) %>% 
  summarise(avg_daily_births = mean(nbirths))
```

Over the years, the average number of births during a weekend day
is 215. For weekdays, it is 373. On average, 
one is 1.73 times more likely to be born during a weekday.*

---

*_Simplified example which does not take year, holidays etc. into account._


### An alternative representation: birthday heatmap

```{r}
hovertemplate  = "{yy}
{weekday} {dd} {month.abb[mm]}

No. births: {nbirths}
Birthday popularity: {round(bd_pop, 2)}{tooltip_holiday}"

heatmapdata = births_conceptions_be %>% 
  filter(!is.na(nbirths)) %>% 
  group_by(yy) %>% 
  mutate(
    bd_pop = nbirths / mean(nbirths), 
    tooltip_holiday = ifelse(is.na(holiday), '', paste0('\nHoliday: ', holiday, '')),
    tooltip = glue::glue(hovertemplate),
    Year = yy
  )

gg = ggplot(heatmapdata %>% filter(yy == 1992), 
       aes(fill=bd_pop, x = dd, y = mm, frame = Year, text=tooltip))+
  geom_tile()+
  scale_fill_gradient2(midpoint = 1, breaks = seq(.4, 1.4, .2))+
  scale_y_discrete(limits = rev)+
  labs(
    title = "Birthday popularity in 1992",
    x = 'Day', y = "Month", 
    fill = 'Birthday\npopularity')+
  theme(
    panel.background = element_blank(), 
    panel.grid = element_blank(),
    axis.ticks = element_blank()
  )


plotly::ggplotly(gg, tooltip = 'text') %>% 
  layout(
    hoverlabel=list(bgcolor="white")
  )
```

***


The tiles on the left represent a heatmap of birthday popularity 
by year. Purple tiles indicate more births on those days compared 
to the year average - the opposite holds for the red tiles.

---

Again, we can identify the weekend effect on this plot. 
The red tiles are mostly concentrated with two less common birthdays
next to each other. 




### The heatmap, in motion 


```{r}


gg = ggplot(heatmapdata, 
       aes(fill=bd_pop, x = dd, y = mm, frame = Year, text=tooltip))+
  geom_tile()+
  scale_fill_gradient2(midpoint = 1, breaks = seq(.4, 1.4, .2))+
  scale_x_discrete(position='top')+
  scale_y_discrete(limits = rev)+
  labs(x = 'Day', y = "Month", 
       fill = 'Birthday\npopularity')+
  theme(
    panel.background = element_blank(), 
    panel.grid = element_blank(),
    axis.ticks = element_blank()
  ) 

plotly::ggplotly(gg, tooltip = 'text') %>% 
  plotly::animation_slider(
    pad = list(t = 10)
  ) %>%
  layout(
    hoverlabel=list(bgcolor="white"),
    xaxis=list(side='top')
  ) %>%
  animation_button(
    visible = TRUE,
    pad = list(r = 20)
  )

```

***

Similar to the previous visual, we now have a sequence of heatmaps for consecutive years.
Playing the animation will also make clear that
the red blocks just shift horizontally as weekdays shift each year.

---

Hovering over the tiles will give you some details. If you're born
in 1992 or later, you can check how common your birthday is during that year.

---

In one of the next blocks, this visualization will help us to avoid
making false conclusions!

---

GIF available [here](file.gif)


### Patterns in births = patterns in conceptions. _When do people conceive babies?_  {data-commentary-width=500}

```{r}
# make data
spiral_data =
  births_conceptions_be %>% 
  filter(!is.na(nconceptions)) %>% 
  filter(!(mm == 2 & dd == 29)) %>%
  group_by(yy, mm) %>% 

  # daily avg per yymm
  mutate(
    yymm_mean = round(mean(nconceptions))
  ) %>% 
  group_by(yy) %>% 
  # comparison yymm daily average to yy daily average and create rank
  mutate(
    month_conc_popularity = 13 - dense_rank(yymm_mean),
    tooltip = glue::glue("{month.name[mm]} {yy}\n\nAvg daily conceptions: {yymm_mean}\nConception rank within year: {month_conc_popularity}"),
  ) %>% 
  ungroup() %>% 
  
  # helper variable for polar spiral
  mutate(
    day_of_year_cumsum = cumsum(day_of_year)
  )

# tickpos years; angular axis
tickvals_years = spiral_data %>% 
  filter(dd == 1, mm == 1, yy %in% c(1995, 2020)) %>% 
  pull(day_of_year_cumsum)


# tickpos months
tickvals_months = as.integer(seq(as.Date('1970-01-15'), as.Date('1970-12-15'), by = 'month'))

plot_ly(
    spiral_data, 
    colors = colorRamp(c('darkblue', 'white', 'darkred')),
    type = 'scatterpolar',
    mode = 'markers',
    theta = ~day_of_year, 
    r = ~day_of_year_cumsum, 
    text = ~tooltip, 
    color = ~factor(month_conc_popularity),
    hoverinfo = 'text',
    marker = list(size = 4)
  ) %>%
  layout(
    legend = list(title = list(text = 'Daily average\nof conceptions\n^{relative to month within year\n\n1: most fertile\n12: least fertile}\n')),
    polar = list(
      angularaxis  = list(
        showline = FALSE,
        showgrid = FALSE,
        rotation = 90,
        direction = "clockwise",
        tickmode="array",
        tickvals = tickvals_months,
        ticktext = month.abb,
        ticks = ''
      ),
      radialaxis = list(
        # angle = 90,
        # tickangle = 90, 
        showline = FALSE,
        showgrid = FALSE,
        ticks = '', 
        tickmode = 'array', 
        tickvals = tickvals_years, 
        ticktext = c("1995", "2020"), 
        tickfont = list(color = "black")
      )
    ),
    xaxis = list(
    'showgrid' = FALSE, # thin lines in the background
    'zeroline' = FALSE, # thick line at x=0
    'visible' = FALSE   # numbers below
    ),
    yaxis = list(
    'showgrid' = FALSE, # thin lines in the background
    'zeroline' = FALSE, # thick line at x=0
    'visible' = FALSE   # numbers below
    )
  )

```

***




---

*_The choice of a 'spiral plot' for the conceptions was non-intentional.
This visualization was somewhat inspired by this visualization
on [COVID](https://www.nytimes.com/2022/01/06/opinion/omicron-covid-us.html)_


---
Births in Belgium between 1992 - 2020; or how you should NOT visualize the data.

Our first improvement

The effect of weekdays on births

An alternative representation: birthday heatmap

The heatmap, in motion

Patterns in births = patterns in conceptions. When do people conceive babies?