Nate Latshaw

The Goal

R Markdown is a fantastic tool for R users seeking to combine data visualization and analysis in a single reproducible deliverable. However, when a report requires a large number of figures, they can clog it up, which is a nuisance at best and undermines the impact of the report at worst.

Tabsets solve this problem by grouping related figures and tables into a single interactive element, which reduces the clutter in the report without removing any of the insights it contains.

Using the Lahman R package to compare statistics across the Major League Baseball (MLB) careers of Mark McGwire and Sammy Sosa, this post will illustrate the problem and then show how to use tabsets to solve it.

The Problem

Suppose you have a group of figures that you would like to include in your report, and even though they are related, they cannot be combined. Such a collection of figures can take up a lot of real estate in your report, which is often not ideal.

To demonstrate the problem at hand, the three figures below compare the number of home runs, runs batted in, and hits by Mark McGwire and Sammy Sosa across their MLB careers. Notice how much space these three figures consume.

The Solution

Tabsets offer a much more compact way of including these figures in a report. Notice below how clicking through the tabs along the top shows all three figures while taking up only the space of a single figure. Thus, tabsets can clean up the data visualization throughout a report without reducing the number of figures included.

Comparing the MLB Careers of Mark McGwire and Sammy Sosa

The Code

We begin by doing some light data processing on data from the Lahman R package.

library(Lahman)
library(data.table)
library(ggplot2)

# create batting data.table with player names
df <- data.table(merge(People[, c('playerID', 'nameFirst', 'nameLast')], Batting, by = 'playerID'))
df[, Name := paste(nameFirst, nameLast)]

# players to compare
players <- c('Sammy Sosa', 'Mark McGwire')

# subset to the selected players and aggregate their stats by year
df <- df[Name %in% players, .(HR = sum(HR), RBI = sum(RBI), H = sum(H)), by = .(Year = yearID, Name)]
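
The sum() calls in the aggregation matter because the Batting table in Lahman holds one row per player, season, and stint, so a player who changed teams during a season has more than one row for that year. The sketch below illustrates the same grouping on a toy table. The players and numbers are made up for illustration and are not taken from Lahman.

```r
library(data.table)

# Toy data with made-up numbers: one row per player, season, and stint
toy <- data.table(
  Name   = c('Player A', 'Player A', 'Player A', 'Player B'),
  yearID = c(2001, 2001, 2002, 2001),
  stint  = c(1, 2, 1, 1),
  HR     = c(10, 5, 20, 7)
)

# Summing within (Year, Name) collapses the two 2001 stints of Player A
# into a single row, which gives one point per player per year in the plots
agg <- toy[, .(HR = sum(HR)), by = .(Year = yearID, Name)]
print(agg)
```

Without the grouping, Player A would contribute two points for 2001 and the line for that player would zigzag within a single year.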

To initialize a tabset, we add {.tabset} to the end of the parent R Markdown heading. In this case, our parent heading is Comparing the MLB Careers of Mark McGwire and Sammy Sosa, and it is a level three heading (meaning it has three preceding number signs, or ###). Thus, we create that heading in R Markdown as follows:

### Comparing the MLB Careers of Mark McGwire and Sammy Sosa {.tabset}

Then, we simply use nested headings to create the tabs of the tabset. In this instance, each level four heading (####) under the parent becomes a tab, and everything beneath that heading, including the output of its code chunks, becomes the content of that tab.

The tabset below includes the code for each figure for reproducibility.

Comparing the MLB Careers of Mark McGwire and Sammy Sosa

ggplot(df, aes(x = Year, y = HR, color = Name)) + 
  geom_point() + 
  geom_line() + 
  labs(x = '\nYear', 
       y = 'Home Runs\n', 
       color = NULL) + 
  theme_bw() + 
  theme(legend.position = 'bottom')
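
The other two tabs follow the same pattern with a different y variable. One way to avoid repeating the plotting code is a small helper, sketched below. Note that plot_stat is a hypothetical name that does not appear in the original code, and the toy data frame with made-up numbers stands in for df so that the example runs on its own.

```r
library(ggplot2)

# Hypothetical helper (an assumption, not part of the original post):
# draws the same chart for whichever stat column is named in `stat`
plot_stat <- function(data, stat, label) {
  ggplot(data, aes(x = Year, y = .data[[stat]], color = Name)) +
    geom_point() +
    geom_line() +
    labs(x = '\nYear',
         y = paste0(label, '\n'),
         color = NULL) +
    theme_bw() +
    theme(legend.position = 'bottom')
}

# Toy stand-in for df with made-up numbers
toy <- data.frame(
  Year = c(2001, 2002, 2001, 2002),
  Name = c('Player A', 'Player A', 'Player B', 'Player B'),
  RBI  = c(80, 95, 70, 60),
  H    = c(150, 160, 140, 120)
)

# One plot object per remaining tab
p_rbi <- plot_stat(toy, 'RBI', 'Runs Batted In')
p_h   <- plot_stat(toy, 'H', 'Hits')
```

With the real data, calling plot_stat(df, 'RBI', 'Runs Batted In') inside the chunk for a tab would render that tab's figure.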

Finally, to bring it all together, the screenshot below shows both the code and the R Markdown headings used to create the tabset.
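
In text form, that structure looks roughly like the sketch below. The tab titles are assumptions, the plots are abbreviated (labels and theme omitted), and the chunks assume that the libraries and df from the data processing step were loaded in an earlier chunk.

````markdown
### Comparing the MLB Careers of Mark McGwire and Sammy Sosa {.tabset}

#### Home Runs

```{r}
ggplot(df, aes(x = Year, y = HR, color = Name)) +
  geom_point() +
  geom_line()
```

#### Runs Batted In

```{r}
ggplot(df, aes(x = Year, y = RBI, color = Name)) +
  geom_point() +
  geom_line()
```

#### Hits

```{r}
ggplot(df, aes(x = Year, y = H, color = Name)) +
  geom_point() +
  geom_line()
```
````

Each level four heading becomes one tab, labeled with the heading text. The next heading at level three or above ends the tabset, so any content after it appears outside the tabs.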