pitchRx: Tools for Collecting and Analyzing MLB PITCHf/x Data

Carson Sievert 8/6/2013

Follow along: http://cpsievert.github.io/slides/pitchRx/jsm

Outline

  1. What is PITCHf/x?
  2. Collecting PITCHf/x with pitchRx
  3. Visualizing PITCHf/x with pitchRx

Scraping PITCHf/x

  1. All PITCHf/x data is freely accessible here: http://gd2.mlb.com/components/game/mlb/
  2. Common methods for collecting PITCHf/x are laborious
  3. WE CAN DO BETTER!!!

Scraping with pitchRx

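library(pitchRx)
# Scrape every game between the two dates; returns a list of data frames
dat <- scrapeFX(start="2008-01-01", 
                end="2013-01-01")
# One row per at-bat and one row per pitch, respectively
atbats <- dat$atbat
pitches <- dat$pitch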

Advanced Scraping

dat <- scrapeFX(start="2008-01-01", 
                end="2013-01-01",
                tables = list(atbat = NULL, 
                              pitch = NULL,
                              coach = NULL, 
                              runner = NULL, 
                              umpire = NULL, 
                              player = NULL, 
                              game = NULL))

Strike-zone plots

  1. Strike-zone plots put height (relative to the batter) on the vertical axis; each data point is the location of a baseball as it crosses home plate.
  2. pitchRx can easily produce two types of strike-zone plots:
  3. Useful for answering questions such as: Do umpires favor home (as opposed to away) pitchers?

Some terminology

Probability of a Called Strike

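library(mgcv)  # provides gam()
# Attach at-bat information (e.g. batter stance) to each pitch
pitchFX <- plyr::join(dat$pitch, dat$atbat, 
                by=c("num", "url"))
# Keep only pitches on which the umpire made a call
decisions <- subset(pitchFX, des %in% 
                    c("Called Strike", "Ball"))
# Binary response: 1 = called strike, 0 = ball
decisions$strike <- as.numeric(decisions$des == 
                                 "Called Strike")
strikeFX(decisions, model=gam(strike~s(px)+s(pz), 
          family = binomial(link='logit')), 
          layer=facet_grid(.~stand))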
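
The data preparation above can be tried without scraping anything. This is a minimal sketch on toy data: the two small data frames are hypothetical stand-ins for dat$pitch and dat$atbat, and base R's merge() (an inner join by default) is swapped in for plyr::join so the example has no dependencies.

```r
# Toy stand-ins for dat$pitch and dat$atbat (hypothetical values, not scraped data)
pitch <- data.frame(
  num = c(1, 1, 2, 2),
  url = "game_1",
  des = c("Called Strike", "Ball", "Foul", "Called Strike"),
  stringsAsFactors = FALSE
)
atbat <- data.frame(
  num = c(1, 2),
  url = "game_1",
  stand = c("R", "L"),
  stringsAsFactors = FALSE
)

# Attach at-bat information to every pitch, matching on at-bat number and game URL
pitchFX <- merge(pitch, atbat, by = c("num", "url"))

# Keep only pitches on which the umpire made a call
decisions <- subset(pitchFX, des %in% c("Called Strike", "Ball"))

# Binary response: 1 = called strike, 0 = ball
decisions$strike <- as.numeric(decisions$des == "Called Strike")
table(decisions$stand, decisions$strike)
```

The foul ball is dropped because the umpire made no ball/strike call on it; only the remaining rows feed the model.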

Difference in probability of Called Strike

strikeFX(decisions, model=gam(strike~s(px)+s(pz), 
          family = binomial(link='logit')), 
          density1=list(top_inning="Y"), 
          density2=list(top_inning="N"), 
          layer=facet_grid(.~stand))
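
To make the idea of a "difference in probability" concrete, here is a rough sketch of one way to compute it by hand: fit the same GAM separately to each half-inning and subtract the predicted strike probabilities over a grid of locations. The data are simulated (not real PITCHf/x), the strike probability formula is an assumption chosen for illustration, and pitchRx may compute the difference differently internally.

```r
library(mgcv)  # provides gam()

set.seed(1)
n <- 2000
# Simulated (not real) umpire decisions; px/pz are pitch locations in feet
sim <- data.frame(
  px = runif(n, -2, 2),
  pz = runif(n, 0.5, 4.5),
  top_inning = sample(c("Y", "N"), n, replace = TRUE)
)
# Assumed strike probability that falls off away from the middle of the zone
p <- plogis(2 - 1.5 * sim$px^2 - 1.5 * (sim$pz - 2.5)^2)
sim$strike <- rbinom(n, 1, p)

# Same model as on the slide, fit to one subset at a time
fit_one <- function(d) {
  gam(strike ~ s(px) + s(pz), family = binomial(link = "logit"), data = d)
}
fit_top <- fit_one(subset(sim, top_inning == "Y"))
fit_bot <- fit_one(subset(sim, top_inning == "N"))

# Predicted probability for each subset on a common grid, then the difference
grid <- expand.grid(px = seq(-2, 2, by = 0.1), pz = seq(0.5, 4.5, by = 0.1))
grid$diff <- as.numeric(predict(fit_top, grid, type = "response") -
                        predict(fit_bot, grid, type = "response"))
summary(grid$diff)
```

Because top_inning is assigned at random in this simulation, the differences should hover around zero; with real data, systematic departures from zero are what would hint at a home or away effect.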

Home Field Advantage

Strike-zones vs Trajectories

Yu Vishnu Darvish - a case study

http://i.minus.com/i3SXAH4AAxtWS.gif

*Created by Drew Sheppard @DShep25

Get the data

dat <- scrapeFX(start="2013-04-24", 
                end="2013-04-24")
atbats <- subset(dat$atbat, 
                 pitcher_name == "Yu Darvish")
Darvish <- plyr::join(atbats, dat$pitch, 
                by=c("num", "url"), type="inner")

PITCHf/x animation

animateFX(Darvish, layer=list(theme_bw(),
                    coord_equal(),
                    facet_grid(.~stand)))

Real time animation

Whoa, nelly!!!

Normalized PITCHf/x

animateFX(Darvish, avg.by="pitch_types", 
          layer=list(coord_equal(),
                     theme_bw(),
                     facet_grid(.~stand)))

Normalized animation

WebGL Graphics

RH <- subset(Darvish, stand=="R")
interactiveFX(RH, avg.by="pitch_types")

Want more??

  1. Visit the pitchRx demo page (now included with CRAN package as R Markdown vignette).
  2. R Journal article coming soon!
  3. My web app.
  4. Contribute to development or post an issue on GitHub.
  5. I occasionally blog and tweet @cpsievert about pitchRx.

Special Thanks to:

This project wouldn’t be possible without the help of these people/organizations. Thank you for your help and/or great work!!!

Thanks for listening!