By Freya Systems

R Markdown is a fantastic tool for R users seeking to combine data visualization and analysis in a single reproducible deliverable. However, when a report requires many figures, their sheer volume can clutter it, which is a nuisance at best and diminishes the report's impact at worst.

Tabsets solve this problem by grouping similar figures and tables into a single interactive element, which reduces clutter in the report without sacrificing any of the insights it contains.

Using the Lahman R package to compare statistics across the Major League Baseball (MLB) careers of Mark McGwire and Sammy Sosa, this post will illustrate the problem and then show how to use tabsets to solve it.

The Problem: Lack of Data Consolidation

Suppose you have a group of figures that you would like to include in your report, and even though they are related, they cannot be combined. Such a collection of figures can take up a lot of real estate in your report, which is often not ideal.

To demonstrate the problem at hand, the three line graphs below compare the number of home runs, runs batted in, and hits by Mark McGwire and Sammy Sosa across their MLB careers. Notice how much space multiple comparison graphs consume.

The Solution: Using Tabsets to Consolidate Figures

Tabsets offer a much more compact way of including these figures in a report, consolidating several related figures into the space of one. Notice below how, by clicking through the tabs along the top, all three figures can be viewed while taking up only the space of a single figure. Thus, tabsets can tidy up a report without reducing the number of figures it includes.

[Interactive tabset: Comparing the MLB Careers of Mark McGwire and Sammy Sosa. Tabs: Home Runs, Runs Batted In, Hits]

The Code

We begin by doing some light data processing on data from the Lahman R package.

library(Lahman)
library(data.table)
library(ggplot2)

# create batting data.table with player names
df <- data.table(merge(People[, c('playerID', 'nameFirst', 'nameLast')], Batting, by = 'playerID'))
df[, Name := paste(nameFirst, nameLast)]

# subset on select players
players <- c('Sammy Sosa', 'Mark McGwire')

# aggregate data by year
df <- df[Name %in% players, .(HR = sum(HR), RBI = sum(RBI), H = sum(H)), by = .(Year = yearID, Name)]
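The sum() calls in the aggregation step matter because Lahman's Batting table stores one row per player, season, and stint, so a season split across two teams produces two rows for the same year. The sketch below repeats the processing under different names (bat, multi_stint, and agg are illustrative and not part of the original code) to list such seasons and to confirm that the aggregated table has exactly one row per player and year.

```r
library(Lahman)
library(data.table)

# Rebuild the name-joined batting table (same steps as above, new object name)
bat <- as.data.table(merge(People[, c('playerID', 'nameFirst', 'nameLast')],
                           Batting, by = 'playerID'))
bat[, Name := paste(nameFirst, nameLast)]
bat <- bat[Name %in% c('Sammy Sosa', 'Mark McGwire')]

# Seasons with more than one row (stint) before aggregation
multi_stint <- bat[, .(stints = .N), by = .(Name, Year = yearID)][stints > 1]
print(multi_stint)

# After summing by year, each player-season should appear exactly once
agg <- bat[, .(HR = sum(HR), RBI = sum(RBI), H = sum(H)), by = .(Year = yearID, Name)]
stopifnot(anyDuplicated(agg, by = c('Year', 'Name')) == 0)
```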

To initialize a tabset, we add {.tabset} to the end of the parent R Markdown heading. In this case, our parent heading is Comparing the MLB Careers of Mark McGwire and Sammy Sosa, and it is a level three heading (meaning it has three preceding number signs, or ###). Thus, we create that heading in R Markdown as follows:

### Comparing the MLB Careers of Mark McGwire and Sammy Sosa {.tabset}

Then, we simply use nested headings to create the tabs of the tabset. In this instance, each level four heading (####) beneath the parent heading becomes a tab, and the output of the code under that heading becomes the content of that tab.

The tabset below includes the code for each figure for reproducibility.

Comparing the MLB Careers of Mark McGwire and Sammy Sosa

Home Runs

ggplot(df, aes(x = Year, y = HR, color = Name)) + 
  geom_point() + 
  geom_line() + 
  labs(x = '\nYear', 
       y = 'Home Runs\n', 
       color = NULL) + 
  theme_bw() + 
  theme(legend.position = 'bottom')

Runs Batted In

ggplot(df, aes(x = Year, y = RBI, color = Name)) + 
  geom_point() + 
  geom_line() + 
  labs(x = '\nYear', 
       y = 'Runs Batted In (RBI)\n', 
       color = NULL) + 
  theme_bw() + 
  theme(legend.position = 'bottom')

Hits

ggplot(df, aes(x = Year, y = H, color = Name)) + 
  geom_point() + 
  geom_line() + 
  labs(x = '\nYear', 
       y = 'Hits\n', 
       color = NULL) + 
  theme_bw() + 
  theme(legend.position = 'bottom')
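The three plots differ only in the column mapped to the y-axis and in its label, so one optional refactor is to wrap the shared ggplot code in a small function. The sketch below is not part of the original post: plot_stat and stats are hypothetical names, and the function assumes a table shaped like df above, with Year and Name columns plus one column per statistic.

```r
library(ggplot2)

# Hypothetical helper (not in the original post): draws one career line chart.
# `data` needs columns Year and Name, plus the column named by `stat`.
plot_stat <- function(data, stat, label) {
  ggplot(data, aes(x = Year, y = .data[[stat]], color = Name)) +
    geom_point() +
    geom_line() +
    labs(x = '\nYear',
         y = paste0(label, '\n'),
         color = NULL) +
    theme_bw() +
    theme(legend.position = 'bottom')
}

# Column names in the aggregated table, mapped to their axis labels
stats <- c(HR = 'Home Runs', RBI = 'Runs Batted In (RBI)', H = 'Hits')

# With `df` from the data-processing step in scope, the chunk under each tab
# heading reduces to a single call, for example:
#   plot_stat(df, 'HR', stats[['HR']])
```

Writing each plot out in full, as the post does, keeps every tab's code readable on its own. The helper mainly pays off when the number of tabs grows.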

Finally, to bring it all together, the screenshot below shows both the code and the R Markdown headings used to create the tabset.
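As a plain-text sketch of that layout, the R Markdown source might be arranged as follows. The YAML header (including its title), the setup chunk, and the closing "Next Section" heading are assumptions added for illustration. The headings and tab names come from the post, and comments stand in for the plot code shown earlier.

````markdown
---
title: "Tabset example"
output: html_document
---

```{r setup, include=FALSE}
library(Lahman)
library(data.table)
library(ggplot2)
# build `df` here, as in the data-processing step above
```

### Comparing the MLB Careers of Mark McGwire and Sammy Sosa {.tabset}

#### Home Runs

```{r}
# ggplot code for the home runs figure
```

#### Runs Batted In

```{r}
# ggplot code for the runs batted in figure
```

#### Hits

```{r}
# ggplot code for the hits figure
```

### Next Section
````

Tabsets are a feature of HTML output, which is why the sketch uses html_document. The tabset ends at the next heading of the same or a higher level than the parent, here the hypothetical "Next Section" heading.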