How To Create a Pitching Spray Chart with RStudio

I will be honest: this example is a little “backward” because I am graphing every pitch faced by a batter over the course of a segmented season. However, the general coding remains the same: you would simply replace the batter’s name with the pitcher’s name you want to graph when you get your data.

With that in mind, this post is going to provide a step-by-step overview of how to create your own pitch charts based on individual players.

1. Gathering Data from Baseball Savant

I know, I know – this step could be performed by using the fantastic baseballr package. However, I tend to move faster in this step by using Baseball Savant, downloading the data, and then importing it into RStudio.

It might be a bit old school, I guess, but it works for me.

So, in this specific example, I am taking a look at the Pittsburgh Pirates’ Josh Bell.

I am a big Pirates fan, unfortunately. And Bell is likely the next rising superstar that will be traded away for a no-name minor leaguer and a rotten grapefruit.

If you can’t tell, I’m one of those Pirates fans that harbors quite a bit of anger towards the Pirates’ ownership.

Anyways: in looking at Josh Bell’s information, I wanted to look at just the first 60 games of last season to mimic what this upcoming 2020 season is going to look like.

A bit of a detour in thought, but the 2020 baseball season is going to be wildly unpredictable. With well under half the normal games, there is little time – or, perhaps, none at all – for the expected regression to take place. Which is why I think it is interesting to do some research on last season just through the first sixty games.

So, using Baseball Savant, I grab every single pitch that Josh Bell faced between March 28 and April 6 of last season (the 60-game mark for the Pirates). After downloading the spreadsheet, I import it into RStudio as my dataset.
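If you would rather do the import in code than through the RStudio menus, here is a minimal sketch of how it might look. The helper name 'read_savant' and the file name 'savant_data.csv' are my own placeholders (not anything Baseball Savant provides), and the sketch assumes your download contains the four columns used later in this post.

```r
# Hypothetical helper: reads a Baseball Savant CSV download and keeps
# only the rows that have a recorded pitch location.
read_savant <- function(path) {
  pitches <- read.csv(path, stringsAsFactors = FALSE)

  # Columns the rest of this post relies on
  needed <- c("pitch_type", "release_speed", "plate_x", "plate_z")
  missing_cols <- setdiff(needed, names(pitches))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }

  # Drop pitches without a location, since they cannot be plotted
  pitches[!is.na(pitches$plate_x) & !is.na(pitches$plate_z), ]
}

# Placeholder file name: swap in the name of your own download
savant_file <- "savant_data.csv"
if (file.exists(savant_file)) {
  joshbell_hitting <- read_savant(savant_file)
}
```

Keeping 'stringsAsFactors = FALSE' means the pitch types come in as plain text, which makes the renaming in the next step painless.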

2. Initial Coding to Create the Strike Zone and Name Pitches

##Drawing The Strike Zone
x <- c(-.95,.95,.95,-.95,-.95)
z <- c(1.6,1.6,3.5,3.5,1.6)

#store in dataframe
sz <- data.frame(x,z)

##Copying the pitch types into a new variable
pitch_desc <- joshbell_hitting$pitch_type

##Changing Pitch Names
pitch_desc[which(pitch_desc=='CH')] <- "Changeup"
pitch_desc[which(pitch_desc=='CU')] <- "Curveball"
pitch_desc[which(pitch_desc=='FC')] <- "Cutter"
pitch_desc[which(pitch_desc=='FF')] <- "Four-Seam"
pitch_desc[which(pitch_desc=='FS')] <- "Split-Finger"
pitch_desc[which(pitch_desc=='FT')] <- "Two-Seam"
pitch_desc[which(pitch_desc=='KC')] <- "Knuckle-Curve"
pitch_desc[which(pitch_desc=='SI')] <- "Sinker"
pitch_desc[which(pitch_desc=='SL')] <- "Slider"

##Storing the long names alongside the rest of the data
joshbell_hitting$pitch_desc <- pitch_desc

Let’s quickly talk about what is happening here.

First, you are creating an ‘x’ variable with those specific restrictions, as well as doing so for the variable ‘z’. Afterwards, you are simply combining both into one data frame. It may not make sense now, but you will understand once the plot is created.
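To make the ‘x’ and ‘z’ values a little less mysterious, here is a small sketch that checks what those five points describe. The variable names 'closes_box', 'zone_width', and 'zone_height' are just mine for illustration.

```r
# Same strike zone corners as in the code above
x <- c(-.95, .95, .95, -.95, -.95)
z <- c(1.6, 1.6, 3.5, 3.5, 1.6)
sz <- data.frame(x, z)

# The last row repeats the first row. A path is drawn in row order,
# so returning to the starting corner is what closes the rectangle.
closes_box <- all(sz[1, ] == sz[nrow(sz), ])

# Size of the box, in feet
zone_width  <- diff(range(sz$x))
zone_height <- diff(range(sz$z))
```

In other words, the data frame is nothing more than the four corners of a box, plus the first corner again so the outline connects back to where it started.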

Next, we copy the ‘pitch_type’ column that was included in the Baseball Savant data into a new variable called ‘pitch_desc.’

After that, as you can see in the above code, you are changing the shorthand description of the pitch as provided by Baseball Savant into the long-hand version. Doing so makes the graph look a bit more professional.
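If the one-line-per-pitch approach feels repetitive, a named lookup vector is one possible alternative. This is only a sketch: the 'pitch_names' and 'codes' objects are my own, and the sample codes (including the made-up 'XX') are there purely to show what happens with a code that is not in the lookup.

```r
# Lookup from Baseball Savant shorthand to long-hand names
pitch_names <- c(
  CH = "Changeup", CU = "Curveball", FC = "Cutter",
  FF = "Four-Seam", FS = "Split-Finger", FT = "Two-Seam",
  KC = "Knuckle-Curve", SI = "Sinker", SL = "Slider"
)

# Sample codes for illustration; with real data this would be
# joshbell_hitting$pitch_type
codes <- c("FF", "SL", "CH", "XX")

# Translate known codes and leave anything unrecognized untouched
pitch_desc <- ifelse(codes %in% names(pitch_names),
                     unname(pitch_names[codes]),
                     codes)
```

The upside is that adding a new pitch type only means adding one entry to 'pitch_names', and unknown codes pass through as-is rather than turning into missing values.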

3. Plotting the Data Using ggplot2

##Loading the packages used for the plot
library(ggplot2)
library(viridis)

ggplot() +
##First plotting the strike zone that we created
  geom_path(data = sz, aes(x=x, y=z)) +
  coord_equal() +
##Now plotting the actual pitches
  geom_point(data = joshbell_hitting, aes(x = plate_x, y = plate_z, size = release_speed, color = pitch_desc)) +
  scale_size(range = c(-1.0,2.5))+
##Using the color package 'Viridis' here
  scale_color_viridis(discrete = TRUE, option = "C") +
  labs(size = "Speed",
       color = "Pitch Type",
       title = "Josh Bell - Pitch Chart",
       subtitle = "March 28 - April 6, 2019") +
  ylab("Feet Above Home Plate") +
  xlab("Feet From Home Plate") +
  theme(plot.title=element_text(face="bold",hjust=-.015,vjust=0,colour="#3C3C3C",size=20),
        plot.subtitle=element_text(face="plain", hjust= -.015, vjust= .09, colour="#3C3C3C", size = 12)) +
  theme(axis.text.x=element_text(vjust = .5, size=11,colour="#535353",face="bold")) +
  theme(axis.text.y=element_text(size=11,colour="#535353",face="bold")) +
  theme(axis.title.y=element_text(size=11,colour="#535353",face="bold",vjust=1.5)) +
  theme(axis.title.x=element_text(size=11,colour="#535353",face="bold",vjust=0)) +
  theme(panel.grid.major.y = element_line(color = "#bad2d4", size = .5)) +
  theme(panel.grid.major.x = element_line(color = "#bdd2d4", size = .5)) +
  theme(panel.background = element_rect(fill = "white")) 

From a coding standpoint, this is pretty straightforward stuff.

Once you ‘clean’ the data just a bit for presentation purposes, everything you need is already there. No need for complicated data wrangling or anything of that sort.

As you can see in the above ggplot coding, we are simply taking the ‘plate_x’ and ‘plate_z’ data provided by Baseball Savant and then mapping the point size to ‘release_speed’ and the color to ‘pitch_desc’.

The end result should look like this:

[Image: Josh Bell pitch graph]

Final Thoughts

As you can see, we now have a graph that depicts every single pitch that Josh Bell faced in the first 60 games of last season. To make it even better, we could probably place the pitch speed into ranges (75-80, 81-85, etc.) simply to add a little more depth to that graph.
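For the speed ranges idea, base R's cut() is one way to go about it. This is a rough sketch with made-up sample speeds, and the 5 mph bins are just my choice; each bin includes its lower bound and excludes its upper bound, so a 80.0 mph pitch lands in "80-85".

```r
# Made-up sample speeds; with real data this would be
# joshbell_hitting$release_speed
speeds <- c(78.2, 84.9, 91.3, 96.0)

# Lower edge of each 5 mph bin, from 70 up to 100
lower <- seq(70, 100, by = 5)

# Bin the speeds into labeled ranges such as "75-80"
speed_range <- cut(speeds,
                   breaks = c(lower, 105),
                   labels = paste0(lower, "-", lower + 5),
                   right = FALSE)
```

The resulting factor could then stand in for ‘release_speed’ in the size or color mapping, which gives the legend a handful of clean categories rather than a continuous scale.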

That said: the next step in this graph would be to change it to a spray chart from the batter’s perspective to see which of these pitches he hit and where exactly they went.

And, as I mentioned, if you wanted to do this from the pitcher’s perspective, simply download a pitcher’s data from Baseball Savant and use the exact same code as above. For example, here is Anthony DeSclafani’s pitching chart from the same period as above:

Obviously you can make more astute observations from this simply because it is a pitcher, as opposed to the above Josh Bell one.

For example, DeSclafani clearly had significant control issues with his four-seam fastball in the early parts of last season. If one wanted, a month-by-month plot could be created to see if that issue was ever corrected (just off the top of my head).
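As a rough idea of how that month-by-month version might start, here is a sketch. It assumes the download includes a ‘game_date’ column in year-month-day format (I have not used that column anywhere above, so double-check your own file), and the four sample rows are made up for illustration.

```r
# Made-up sample rows standing in for a pitcher's Baseball Savant data
pitches <- data.frame(
  game_date = c("2019-03-28", "2019-04-06", "2019-04-20", "2019-05-03"),
  plate_x   = c(-0.4, 0.2, 1.1, 0.0),
  plate_z   = c(2.1, 3.0, 1.4, 2.6),
  stringsAsFactors = FALSE
)

# Pull a year-month label out of each date
pitches$month <- format(as.Date(pitches$game_date), "%Y-%m")

# One panel per month, only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  monthly_plot <- ggplot(pitches, aes(x = plate_x, y = plate_z)) +
    geom_point() +
    coord_equal() +
    facet_wrap(~ month)
}
```

From there, the strike zone and styling from the main plot could be layered on so each month gets its own copy of the chart.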

Brad Congelio

An Assistant Professor in the College of Business at Kutztown University of Pennsylvania, Brad Congelio uses data science and analytics to investigate the sport industry.
