pitchRx: Tools for Collecting and Analyzing Major League Baseball’s PITCHf/x Data

Carson Sievert

3/29/2013

Biography

Outline

  1. What is PITCHf/x?
  2. PITCHf/x data format & structure

Outline continued

  1. Visualizing PITCHf/x

Problem Statement

  1. Existing methods for collecting PITCHf/x data require running Perl scripts and working with other web-stack technologies. These hurdles keep many people from obtaining the data, and the scripts are very hard to customize or extend.

  2. There is no automated process for creating popular PITCHf/x visuals. Most existing visuals are also restricted to static 2D plots, even though the data can be used to reconstruct three-dimensional flight paths as a function of time.

The Solution: pitchRx

  1. Easily collect PITCHf/x data and related information from the web source.
  2. Automate the production of strike-zone plots (bivariate scatterplots and densities), 2D animations of pitch locations over time, and interactive 3D graphics.

PITCHf/x parameters

  1. x(t) = x0 + vx0 * t + (1/2) * ax * t^2
  2. y(t) = y0 + vy0 * t + (1/2) * ay * t^2
  3. z(t) = z0 + vz0 * t + (1/2) * az * t^2
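
As a rough illustration of how these nine parameters trace out a flight path, the sketch below evaluates a pitch's position at a few time points in base R. The helper name `pitch_position` and all parameter values are made up for the example (they are neither real PITCHf/x measurements nor part of pitchRx), and it assumes the usual constant-acceleration form with the 1/2 factor.

```r
# Hypothetical helper (not part of pitchRx): position of a pitch at time t,
# assuming constant acceleration in each direction.
pitch_position <- function(t, x0, y0, z0, vx0, vy0, vz0, ax, ay, az) {
  data.frame(
    t = t,
    x = x0 + vx0 * t + 0.5 * ax * t^2,
    y = y0 + vy0 * t + 0.5 * ay * t^2,
    z = z0 + vz0 * t + 0.5 * az * t^2
  )
}

# Made-up parameter values (feet, feet/second, feet/second^2), illustration only.
path <- pitch_position(
  t = seq(0, 0.4, by = 0.1),
  x0 = -2, y0 = 50, z0 = 6,
  vx0 = 5, vy0 = -130, vz0 = -5,
  ax = -10, ay = 30, az = -20
)
print(path)
```

Each row of `path` is one snapshot of the ball, which is the kind of time-indexed data an animation or 3D plot needs.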

PITCHf/x data format

XML Hierarchy

Data Issues & Solutions

  1. Information across different tags is inconsistent.
  2. Players are identified by ID.
  3. Source data doesn’t explicitly record things like the pitch count.
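
To make the last point concrete, here is a minimal sketch of how a pitch count could be derived from pitch-level records. It is not pitchRx's own implementation. It assumes each pitch carries a `type` value ("B" for ball, "S" for strike, "X" for in play) and a `des` description, and the helper name `count_before_pitch` and the toy at-bat are made up for illustration.

```r
# Hypothetical helper (not part of pitchRx): the count *before* each pitch
# of a single at-bat, given `type` and `des` in the order pitches were thrown.
count_before_pitch <- function(type, des) {
  balls <- 0L
  strikes <- 0L
  count <- character(length(type))
  for (i in seq_along(type)) {
    count[i] <- paste(balls, strikes, sep = "-")
    if (type[i] == "B") {
      balls <- balls + 1L
    } else if (type[i] == "S") {
      # An ordinary foul ball with two strikes does not add a third strike.
      is_foul <- des[i] %in% c("Foul", "Foul (Runner Going)")
      if (!(is_foul && strikes == 2L)) strikes <- strikes + 1L
    }
  }
  count
}

# Toy at-bat (made-up data): ball, called strike, two fouls, ball in play.
type <- c("B", "S", "S", "S", "X")
des <- c("Ball", "Called Strike", "Foul", "Foul", "In play, out(s)")
counts <- count_before_pitch(type, des)
print(counts)
```

In practice the function would be applied separately to each at-bat, for example by grouping pitches on the at-bat number and game URL.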

Scraping made easy

library(pitchRx)
# Scrape the at-bat and pitch tables for the 2011 calendar year
data <- scrapeFX(start = "2011-01-01",
                 end = "2011-12-31",
                 tables = list(atbat = NULL, pitch = NULL))

Mariano Rivera and Phil Hughes fastballs from 2011.

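library(plyr)  # provides join()
# At-bats thrown by the two pitchers of interest
atbats <- subset(data$atbat, pitcher_name %in%
                 c("Mariano Rivera", "Phil Hughes"))
# Attach pitch-level records to each at-bat, matched on at-bat number and game URL
pitchFX <- join(atbats, data$pitch,
                by = c("num", "url"), type = "inner")
# Keep four-seam fastballs (FF) and cutters (FC)
pitches <- subset(pitchFX, pitch_type %in%
                  c("FF", "FC"))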
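
For readers unfamiliar with inner joins, the following base R sketch shows what matching at-bats to pitches on a (url, num) pair does. The two small data frames are made-up stand-ins for the scraped at-bat and pitch tables, and `merge()` is used here in place of `join()` only to keep the example free of extra packages.

```r
# Toy stand-ins for the scraped at-bat and pitch tables (made-up values).
atbat <- data.frame(
  url = c("game1", "game1", "game2"),
  num = c(1, 2, 1),
  pitcher_name = c("Mariano Rivera", "Phil Hughes", "Mariano Rivera"),
  stringsAsFactors = FALSE
)
pitch <- data.frame(
  url = c("game1", "game1", "game1", "game2", "game2"),
  num = c(1, 1, 2, 1, 3),
  pitch_type = c("FC", "FC", "FF", "FC", "CU"),
  stringsAsFactors = FALSE
)

# Inner join: keep only pitches whose (url, num) pair matches an at-bat.
joined <- merge(atbat, pitch, by = c("url", "num"))
print(joined)
```

The pitch from at-bat 3 of "game2" has no matching at-bat row, so it is dropped, while every remaining pitch picks up its pitcher's name.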

Animation and batter stance

# Animate pitch locations, with one panel per batter stance
animateFX(pitches, layer = list(theme_bw(),
                                coord_equal(),
                                facet_grid(. ~ stand,
                                           labeller = label_both)))

pitches by stance (real time)

WebGL Graphics

Rivera <- subset(pitches, pitcher_name==
                   "Mariano Rivera")
interactiveFX(Rivera)

http://cpsievert.github.com/pitchRx/rgl1

Shiny Demo

library(shiny)
runGitHub('pitchRx', 'cpsievert', 
          subdir='inst/shiny')

http://glimmer.rstudio.com/cpsievert/pitchRx

Biased umpires?

Every called strike!

Home vs Away Called Strikes

Home vs Away Balls

My Contributions

  1. R package pitchRx and its demo page

  2. Web application (on top of pitchRx) that helps engage the code illiterate

  3. Paper currently under review for the R Journal

Special Thanks to:

This project wouldn’t be possible without the help of these people/organizations. Thank you for your help and/or great work!!!

Questions???