By Freya Systems
What is Quarto?
Quarto is a new open-source scientific and technical publishing system created by Posit (formerly RStudio). It is marketed as a next-generation version of R Markdown that adds new features and supports collaboration across multiple programming languages.
With Quarto, users of R, Python, Julia, and JavaScript can combine work done in each of these languages into a single document. Quarto also works with many popular IDEs and text editors, such as RStudio, Jupyter, and VS Code. Finally, Quarto can publish reports, presentations, websites, blogs, books, and journal articles in a variety of formats (HTML, PDF, ePub, MS Word, etc.).
This blog post, created in Quarto, shows how to add tabsets to a Quarto document using R. It also closely follows our prior blog post on how to create tabsets in R Markdown.
The Goal: Combine Data Reports Without Reducing Data Insights
Quarto is a great tool for R users seeking to combine data visualization and analysis in a single reproducible deliverable. However, in some reports the sheer number of required figures clogs up the document. At best this is a nuisance, and at worst it undermines the report's impact.
Tabsets solve this problem by grouping related figures and tables into a single interactive element. This reduces clutter in the report without removing any of the data-driven insights it contains.
Using the Lahman R package to compare statistics across the Major League Baseball (MLB) careers of Mark McGwire and Sammy Sosa, this post will illustrate the problem and then show how to use tabsets to solve it.
The Problem: Too Many Related Figures Clutter the Report
Suppose you have a group of figures that you would like to include in your report, and even though they are related, they cannot be combined. Such a collection of figures can take up a lot of real estate in your report, which is often not ideal.
To demonstrate the problem at hand, the three figures below compare the number of home runs, runs batted in, and hits by Mark McGwire and Sammy Sosa across their MLB careers. Notice how much space these three figures consume.
The Solution: Use a Tabset for Visual Data Cleanup
Tabsets offer a much more compact way of including these figures in a report. Notice below that, by clicking through the tabs along the top, you can view all three figures while they take up the space of only one. Thus, tabsets can declutter the data visualizations throughout a report without reducing the number of figures included.
[Tabset: Comparing the MLB Careers of Mark McGwire and Sammy Sosa with a data visualization dashboard. Tabs: Home Runs, Runs Batted In, Hits]
The Code: Process Data From R Packages
We begin by doing some light data processing on data from the Lahman R package.
library(Lahman)
library(data.table)
library(ggplot2)
# create batting data.table with player names
df <- data.table(merge(People[, c('playerID', 'nameFirst', 'nameLast')], Batting, by = 'playerID'))
df[, Name := paste(nameFirst, nameLast)]
# subset on select players
players <- c('Sammy Sosa', 'Mark McGwire')
# aggregate data by year
df <- df[Name %in% players, .(HR = sum(HR), RBI = sum(RBI), H = sum(H)), by = .(Year = yearID, Name)]
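The sum(...) by Year step matters because Lahman's Batting table stores one row per player, season, and stint. A stint is a spell with one team, so a player traded mid-season appears more than once for that year. McGwire, who was traded from Oakland to St. Louis in 1997, is one such case. The minimal sketch below uses made-up toy rows to show the same data.table aggregation collapsing two stints into one season total. The players and numbers are hypothetical and are not Lahman data.

```r
library(data.table)

# Hypothetical toy rows shaped like Lahman's Batting table: one row per
# player, season, and stint. The values are invented for illustration only.
batting <- data.table(
  Name   = c("Player A", "Player A", "Player A", "Player B"),
  yearID = c(2000L, 2000L, 2001L, 2000L),
  stint  = c(1L, 2L, 1L, 1L),
  HR     = c(10L, 5L, 20L, 30L)
)

# Summing within each Year/Name group merges Player A's two 2000 stints
# into a single season total, mirroring the aggregation step above.
season <- batting[, .(HR = sum(HR)), by = .(Year = yearID, Name)]
print(season)
```

Player A's 2000 row in season holds the combined total of both stints (15). Without the aggregation, the line plots would draw two points for that season.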
To initialize a tabset, we add ::: panel-tabset directly above the markdown heading that denotes the first figure of the tabset. In this case, our first heading is Home Runs. It is a level-four heading, meaning it is preceded by four number signs (####). Thus, we initialize the tabset by placing ::: panel-tabset on its own line, immediately above the #### Home Runs heading.
Then we use level-four headings to create the remaining tabs. Each heading's text becomes a tab label, and the code beneath it, together with that code's output, becomes the content of that tab. Finally, to end the tabset, we place ::: on its own line directly after the code of the final tab.
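The small R sketch below makes the tabset layout concrete by assembling a tabset's source lines as text. The helper tabset_markup() is hypothetical and is not part of Quarto or any package. The tab bodies are placeholders for the R chunks that would draw each figure. The point is the layout:

- an opening ::: panel-tabset line
- one level-four heading per tab
- a closing ::: line

```r
# Hypothetical helper (not part of Quarto): returns the lines of Quarto
# markup for a tabset whose tabs are headings of the given level.
tabset_markup <- function(titles, bodies, level = 4) {
  stopifnot(length(titles) == length(bodies))
  heading <- strrep("#", level)            # "####" for level four
  out <- "::: panel-tabset"                # opening fence starts the tabset
  for (i in seq_along(titles)) {
    out <- c(out,
             "",
             paste(heading, titles[i]),    # each heading becomes a tab label
             "",
             bodies[i])                    # placeholder for that tab's content
  }
  c(out, "", ":::")                        # closing fence ends the tabset
}

md <- tabset_markup(
  titles = c("Home Runs", "Runs Batted In", "Hits"),
  bodies = c("(Home Runs chunk)", "(RBI chunk)", "(Hits chunk)")
)
cat(md, sep = "\n")
```

The cat() call prints the markup as it would appear in the .qmd file. With the knitr engine, the same cat() call inside a chunk marked #| results: asis should emit the text as raw Markdown. That is one way to generate tabsets programmatically, but treat it as something to verify in your own setup.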
The tabset below includes the code for each figure for reproducibility.
[Tabset: Comparing the MLB Careers of Mark McGwire and Sammy Sosa with code. Tabs: Home Runs, Runs Batted In, Hits; the code behind each tab is reproduced below.]
# Home Runs tab
ggplot(df, aes(x = Year, y = HR, color = Name)) +
  geom_point() +
  geom_line() +
  labs(x = '\nYear',
       y = 'Home Runs\n',
       color = NULL) +
  theme_bw() +
  theme(legend.position = 'bottom')
# Runs Batted In tab
ggplot(df, aes(x = Year, y = RBI, color = Name)) +
  geom_point() +
  geom_line() +
  labs(x = '\nYear',
       y = 'Runs Batted In (RBI)\n',
       color = NULL) +
  theme_bw() +
  theme(legend.position = 'bottom')
# Hits tab
ggplot(df, aes(x = Year, y = H, color = Name)) +
  geom_point() +
  geom_line() +
  labs(x = '\nYear',
       y = 'Hits\n',
       color = NULL) +
  theme_bw() +
  theme(legend.position = 'bottom')
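The three plotting chunks above differ only in the column plotted and the y-axis label. One way to avoid the repetition is a small helper, sketched here as an assumption rather than the post's own code. The name plot_career_stat() is hypothetical. The expression .data[[stat]] is ggplot2's pronoun for selecting a column by its name given as a string.

```r
library(ggplot2)

# Hypothetical helper (not from the original post): draws one career-stat
# plot with the same styling as the three tab chunks.
plot_career_stat <- function(data, stat, label) {
  ggplot(data, aes(x = Year, y = .data[[stat]], color = Name)) +
    geom_point() +
    geom_line() +
    labs(x = '\nYear',
         y = paste0(label, '\n'),   # trailing newline adds space, as above
         color = NULL) +
    theme_bw() +
    theme(legend.position = 'bottom')
}
```

With this helper, the three tabs reduce to the following single calls:

- Home Runs: plot_career_stat(df, 'HR', 'Home Runs')
- Runs Batted In: plot_career_stat(df, 'RBI', 'Runs Batted In (RBI)')
- Hits: plot_career_stat(df, 'H', 'Hits')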
Finally, to bring it all together, the screenshot below shows the code for the entire tabset.