library(tidyverse)
library(showtext)
library(ggbeeswarm)
library(ggbump)
library(ggtext)
library(gghighlight)
library(ggparliament)
The Tour de France is the world’s most prestigious and gruelling road cycling race. Held annually in the summer, it spans 21 stages over 23 or 24 days. I have been a fan of the Tour and of road cycling since watching the inspiring Netflix documentary series Tour de France: Unchained. This blog post presents a personal data visualisation project on the 2025 edition of the Tour using the R programming language.
Prerequisites
All data in this blog post were personally collated from the official race website. You may access the data files via my GitHub.
Libraries
Data processing and visualisation are primarily performed using the dplyr and ggplot2 packages, respectively, both of which are included in the tidyverse package collection. Some visualisations require additional extension packages. The library() calls at the top of this post list every package used.
Theme customisation
The code below defines and sets the custom theme applied to all the visualisations.
Code
# Fonts
font_add_google("Roboto Condensed")
font_add(family = "fb", regular = "Font Awesome 5 Brands-Regular-400.otf")
showtext_auto()
# Define personalised caption
plot_caption <- "Data Visualisation: <span style='font-family:fb;'> </span>Nien Xiang Tou | Nienxiangtou.com | <span style='font-family:fb;'> </span>Nxrunning "
# Set theme
theme_set(theme_minimal(base_family = "Roboto Condensed"))
theme_update(plot.background = element_rect(fill = "#FFFFFF", color = "#FFFFFF"),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank(),
plot.title = element_textbox_simple(
color = "#F8C000", face="bold",
family = "Roboto Condensed", size = 50),
plot.subtitle = element_textbox_simple(
color = "#000000", size = 20,
family = "Roboto Condensed",
lineheight = 0.5,
margin = margin(t = 2, b = 2,
unit = "mm")),
plot.caption = element_textbox_simple(
color = "#000000", size = 20,
lineheight = 0.5, halign = 0.5,
margin = margin(t = 2, b = 0.5,
unit = "mm")),
axis.title = element_text(size = 22))
Route
A grand tour typically consists of a mix of flat sprint stages, hilly terrain, and high mountain climbs, along with individual time trials. In this year’s race, participants cycled for over 76 hours across 21 stages, covering a total distance of more than 3,300 kilometres. I visualised the distance of each stage using a lollipop chart, with the colour aesthetic differentiating the stage types.
Code
# Load data
route <- read.csv("TDF25_Route.csv")
# Visualisation
ggplot(data = route, aes(x = Stage, y = Distance))+
geom_segment(aes(x=Stage, xend=Stage, y=0, yend=Distance))+
geom_point(aes(colour = Type), size = 3)+
annotate('curve', x = 18.5, xend = 18,
y = 210, yend = 190,
arrow = arrow(length = unit(0.03, "npc")),
curvature = 0.2)+
annotate('text', x = 20.2, y = 215,
label = "Queen stage", size = 7)+
scale_color_manual(values = c("#F8C000", "#E89800", "#986838", "#686868"))+
labs(title = "How far was Tour de France 2025?",
subtitle = "The Tour de France 2025 route covers over **3300 kilometres** across 21 stages, including <br><span style='color:#F8C000'>**7 flat stages**</span>, <span style='color:#E89800'>**6 hilly stages**</span>, <span style='color:#986838'>**6 mountain stages**</span>, and <span style='color:#686868'>**2 time trials**</span>. The hardest stage, also<br>known as the **Queen stage**, was stage 18 with 171.5 kilometres with 5450 metres of elevation.",
caption = plot_caption,
y = "Distance (Kilometres)")+
scale_x_continuous(breaks=c(1,5,10,15,20))+
theme(legend.position = "none",
axis.text = element_text(size = 15))
These athletes not only covered long distances but also tackled significant elevation gain. In this year’s Tour, riders climbed a total of 52,500 metres, which is nearly equivalent to scaling Mount Everest six times. I visualised the elevation gain of each stage using a simple bar chart and used the gghighlight package to draw readers’ attention to the Queen stage, Stage 18, where riders tackled the most demanding climb.
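The headline figures above can be checked with a few lines of arithmetic. This is a minimal sketch in base R; the total elevation and stage count come from the post, while the Everest summit height of roughly 8,849 metres is an assumption added here for illustration, and the comparison treats the climb as starting from sea level.

```r
# Figures quoted in the post
total_elevation_m <- 52500
n_stages <- 21

# Assumption (not from the post): Everest's summit is about 8,849 m above sea level
everest_height_m <- 8849

# Average elevation gain per stage
mean_elevation_m <- total_elevation_m / n_stages

# How many sea-level-to-summit "Everests" the total climb represents
everest_equivalents <- total_elevation_m / everest_height_m

mean_elevation_m               # 2500
round(everest_equivalents, 2)  # 5.93
```

The 2,500-metre average is the value used for the dotted reference line in the bar chart.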
Code
# Create a dataframe for plot labels
p2_label_df <- tibble(
  label = c(
    "<span style='font-size:60pt; color:#F8C000'>**UPHILL BATTLES**</span>",
    "Across the entire Tour de France, riders tackled a total of **52,500 metres** of elevation gain.<br>This is roughly equivalent to climbing Mount Everest six times. Averaging 2,500 metres of<br>ascent per stage, the route demanded not just speed and strategy, but relentless climbing strength.",
    "<span style='font-size:14pt'>Average elevation: 2,500 metres</span>",
    "<span style='font-size:12pt'>The most brutal climb <br>was 5,450 metres in<br>Stage 18 Queen stage.</span>"
  ),
  x = c(0,0,0,18.5),
  y = c(5000, 4500, 2500, 5000),
  hjust = c(0,0,0,0),
  vjust = c(0,0,0,0)
)
# Visualisation
ggplot(data = route, aes(x = Stage, y = Elevation))+
geom_hline(yintercept = 2500, linetype = "dotted")+
geom_bar(stat="identity", fill = "#F8C000")+
gghighlight(Stage == 18)+
geom_richtext(data = p2_label_df,
aes(x,y,label = label, hjust = hjust,
vjust = vjust),
fill = NA, label.color = NA,
lineheight = 0.5)+
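labs(y = "Elevation (Metres)",
     caption = plot_caption)+
# xlim() and scale_x_continuous() both define the x scale, and the later call
# replaces the earlier one, so the limits are set in a single scale here
scale_x_continuous(breaks = c(1,5,10,15,20),
                   limits = c(0, 23))+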
theme(panel.grid.major.y = element_blank(),
axis.text = element_text(size = 15))
Stage 13 Time Trial
This year’s Tour introduced a unique twist with a mountain time trial on Stage 13. Although the distance was relatively short at 10.9 km, the route featured 650 metres of elevation gain, with most sections exceeding a 9% gradient, posing a serious challenge for the sprinters. As expected, the stage was won by the yellow jersey holder, Tadej Pogačar. His performance highlighted a stark gap between himself and the rest of the field: he finished over eight minutes ahead of the last rider, and three-quarters of the peloton recorded times more than 20% slower than his. The stage posed a real threat of elimination outside the time limit, prompting the race jury to extend the limit from 33% to 40% at the last minute. Without this adjustment, seven riders, including Biniam Girmay and Tim Merlier, would have been eliminated from the Tour. As many riders recorded near-identical times and their points would overlap, I visualised the stage results using a beeswarm plot with the ggbeeswarm package.
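The time-cut thresholds drawn on the chart follow directly from the winning time. The sketch below, in base R, uses only the three finishing times quoted in the plot annotations; the helper name mmss_to_seconds is hypothetical and is not part of the original code, which relies on lubridate for the conversion.

```r
# Hypothetical helper: convert "mm:ss" strings to seconds using base R only
mmss_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
}

# Finishing times quoted in the plot annotations
times <- c(Pogacar = "23:00", Vingegaard = "23:36", Mezgec = "31:28")
secs <- mmss_to_seconds(times)

# Cut-off times in minutes for each percentage over the winner's time
winner <- min(secs)
cutoffs_min <- winner * (1 + c(`20%` = 0.20, `33%` = 0.33, `40%` = 0.40)) / 60
cutoffs_min  # 27.60, 30.59 and 32.20 minutes

# Percentage behind the winner, and who falls outside the original 33% limit
pct_behind <- (secs - winner) / winner * 100
outside_33 <- pct_behind > 33
outside_40 <- pct_behind > 40
```

With these three riders, only the last finisher exceeds the 33% limit, and he stays inside the extended 40% limit. The 27.6 and 30.59 values match the positions of the two grey reference segments in the plot.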
Code
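# Load data
stage13 <- read.csv("TDF_Stage13.csv")
# Convert the finishing times to seconds; ms() and as.period() come from
# lubridate, which library(tidyverse) attaches from tidyverse 2.0.0 onwards
stage13 <- stage13 |>
  mutate(time_lub = as.numeric(as.period(ms(Times))),
         grp_ref = "group")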
# Create a dataframe for plot labels
p3_label_df <- tibble(
  label = c(
"<span style='font-size:50pt; color:#F8C000'>**Surviving the Mountain Time Trial**</span>",
"The commanding victory of Tadej Pogačar in the Stage 13 mountain time trial<br>highlighted the gulf between him and the rest of the field. He finished nearly 8.5<br>minutes ahead of the last rider, and the initial 33% time cut would have<br>eliminated 7 cyclists. Over 75% of the peloton trailed by more than 20% of his<br>finishing time."),
x = c(1.5, 1.4),
y = c(22,22),
hjust = c(0,0),
vjust = c(0,0.5)
)
# Data visualisation
ggplot(data = stage13, aes(y = time_lub/60,
x = grp_ref))+
geom_quasirandom(cex = 2, colour = "#F8C000")+
coord_flip()+
geom_richtext(data = p3_label_df,
aes(x = x, y = y, label = label,
hjust = hjust, vjust = vjust),
fill = NA, label.color = NA,
lineheight = 0.5)+
annotate('curve', y = 23.3, yend = 23.1,
x = 1.15, xend = 1.05,
arrow = arrow(length = unit(0.03, "npc")),
curvature = 0.2)+
annotate('text', y = 23.35, x = 1.2,
label = "Tadej Pogačar 23:00",
size = 5)+
annotate('curve', y = 23.4, yend = 23.6,
x = 0.85, xend = 0.95,
arrow = arrow(length = unit(0.03, "npc")),
curvature = 0.2)+
annotate('text', y = 23.45, x = 0.8,
label = "Jonas Vingegaard 23:36",
size = 5)+
annotate('curve', y = 31.55, yend = 31.45,
x = 1.15, xend = 1.05,
arrow = arrow(length = unit(0.03, "npc")),
curvature = 0.2)+
annotate('text', y = 31.5, x = 1.2,
label = "Luka Mezgec 31:28",
hjust = 0, size = 5)+
annotate('segment', x = 1.4, xend = 0.6,
y = 30.59, yend = 30.59, colour = "#808080")+
annotate('text', y = 30.59, x = 1.45,
label = "33%", hjust = 0.5, size = 5)+
annotate('segment', x = 1.4, xend = 0.6,
y = 27.6, yend = 27.6, colour = "#808080")+
annotate('text', y = 27.6, x = 1.45,
label = "20%", hjust = 0.5, size = 5)+
labs(x = "",
y = "Time (minutes)",
caption = plot_caption)+
scale_y_continuous(breaks=c(23,25,27,29,31),
limits = c(22,33))+
theme(panel.grid.major.y = element_blank(),
axis.text.y = element_blank(),
axis.text.x = element_text(size = 15))
Withdrawals
In this punishing endurance sport, it is common for riders to abandon the Tour for many reasons. Unfortunately, this year’s Tour had several big-name withdrawals. Green jersey contender Jasper Philipsen suffered a bad crash on Stage 3. Olympic champion Remco Evenepoel abandoned the Tour on Stage 14 after consecutive bad days. World champion Mathieu van der Poel did not start Stage 16 after a pneumonia diagnosis. Here’s a fun use of the ggparliament package to visualise the share of riders who completed the Tour, with the colour aesthetic differentiating those who left in week 1, 2 or 3.
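As a quick sanity check on the counts behind the chart, the group sizes can be summed and expressed as shares of the field. This is a small sketch in base R using the seat counts from the post; the starting field of 23 teams with 8 riders each is an assumption added here, not something stated in the post.

```r
# Counts used for the parliament chart in the post
riders <- c(Complete = 160, Early_withdrawal = 7,
            Mid_withdrawal = 10, Late_withdrawal = 7)

# Assumption (not stated in the post): 23 teams of 8 riders started the race
starters <- 23 * 8

# The four groups should account for every starter
total <- sum(riders)
total == starters

# Share of the starting field in each group, in percent
share_pct <- round(100 * riders / total, 1)
share_pct
```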
Code
# Data
withdrawals <- data.frame(
  party_name = factor(c("Complete", "Early_withdrawal",
"Mid_withdrawal",
"Late_withdrawal"),
levels = c("Complete",
"Early_withdrawal",
"Mid_withdrawal",
"Late_withdrawal")),
seats = c(160, 7, 10, 7)
)
# Data processing
withdrawals <- parliament_data(
  election_data = withdrawals,
parl_rows = 5,
type = 'semicircle',
party_seats = withdrawals$seats)
# Annotation labels
p4_title <- "The Surviving Peloton"
p4_subtitle <- "Number of riders who <span style='color:#F8C000'>**completed the whole tour**</span>, and withdrawals in <span style='color:#E89800'>**stages 1-7**</span>, <span style='color:#986838'>**stages 8-14**</span>, and <span style='color:#686868'>**stages 15-21**</span>."
# Data visualisation
ggplot(withdrawals, aes(x, y, colour = party_name)) +
geom_parliament_seats(size = 3)+
annotate('curve', x = 2.1, xend = 2.0,
y = 0.85, yend = 0.65,
arrow = arrow(length = unit(0.03, "npc")),
curvature = -0.2)+
annotate('text', x = 2.1, y = 0.9,
label = "Jasper Philipsen",
size = 5, hjust = 0)+
annotate('curve', x = 2.3, xend = 2.1,
y = 0.3, yend = 0.4,
arrow = arrow(length = unit(0.03, "npc")),
curvature = 0.2)+
annotate('text', x = 2.3, y = 0.2,
label = "Remco Evenepoel",
size = 5, hjust = 0)+
annotate('text', x = 0, y = 0.2, label = "160 Riders",
size = 15, hjust = 0.5, fontface = "bold",
colour = "#000000")+
annotate('text', x = 0, y = 0.05,
label = "completed the tour",
size = 8, hjust = 0.5, fontface = "bold",
colour = "#F8C000")+
scale_color_manual(
values = c("#F8C000","#E89800","#986838","#686868"))+
xlim(-2, 2.8)+
labs(title = p4_title,
subtitle = p4_subtitle,
caption = plot_caption)+
theme(panel.grid.major.y = element_blank(),
axis.ticks = element_blank(),
axis.title = element_blank(),
axis.text = element_blank(),
legend.position = "none")
Team Rankings
Lastly, I created a bump chart using the ggbump package to illustrate the change in team rankings across the 21 stages of the Tour. In the Tour de France, the team classification is determined by adding up the times of each team’s three best riders on each stage, and the team with the lowest cumulative time wins. Although the team classification may not be the most meaningful ranking, it shows how well teams performed collectively in the general classification. I also used the gghighlight package again to highlight teams that had a good Tour this time (according to grades on Cyclist).
Code
rankings <- read.csv("TDF2025_Rankings.csv")
# Data Processing
# Reshape from one column per stage to one row per team and stage
rankings <- rankings |>
  pivot_longer(cols = starts_with("Stage"), values_to = "Ranking", names_to = "Stage_name") |>
  separate(Stage_name, into = c("Stage_prefix", "Stages")) |>
  select(-Stage_prefix)
rankings$Stages <- as.integer(rankings$Stages)
rankings$Ranking <- as.integer(rankings$Ranking)
# Flag the stages that get a point marker on the chart
rankings$Highlight <- (rankings$Stages==1) | (rankings$Stages==10) | (rankings$Stages==21)
highlighted_teams <- c("LTK", "TPP", "UAD", "UXM")
# Annotation labels
p5_title <- "Team Rankings in Motion"
p5_subtitle <- "General rankings of all teams across 21 stages and standout performing teams at Tour de France 2025."
# Data visualisation
ggplot(rankings, aes(x = Stages, y = Ranking,
group = Team_Code))+
geom_bump(linewidth = 0.8, colour = "#B9B9B9",
smooth = 6)+
geom_bump(data = ~.|>filter(Team_Code %in% highlighted_teams),
linewidth = 0.8, smooth = 6,
colour = "#F8C000")+
geom_point(data = subset(rankings, Highlight),
aes(x = Stages, y = Ranking),
color = "#505050", size = 2)+
geom_point(data = subset(rankings, Highlight),
aes(x = Stages, y = Ranking),
color = "#B9B9B9", size = 1)+
geom_point(data = subset(rankings, Highlight)|>filter(Team_Code %in% highlighted_teams),
aes(x = Stages, y = Ranking),
size = 1, colour = "#F8C000")+
geom_text(data = rankings|>filter(Stages == 21)|>filter(Team_Code %in% highlighted_teams == FALSE),
aes(x = 22, label = Team),
hjust = 0, color = "#B9B9B9",
size = 6)+
geom_text(data = rankings|>filter(Stages == 21)|>filter(Team_Code %in% highlighted_teams == TRUE),
aes(x = 22, label = Team),
hjust = 0, color = "#000000",
size = 6, fontface = "bold")+
labs(title = p5_title,
subtitle = p5_subtitle,
caption = plot_caption)+
scale_y_reverse(breaks = c(1,5,10,15,20))+
scale_x_continuous(limits = c(1, 27),
breaks = c(1,5,10,15,21))+
theme(panel.grid.major.y = element_blank(),
axis.text = element_text(size = 15)
)