pitchRx
library(pitchRx)
dat <- scrapeFX(start="2008-01-01",
end="2013-01-01")
atbats <- dat$atbat
pitches <- dat$pitch
# Request additional tables alongside atbat and pitch
dat <- scrapeFX(start="2008-01-01",
                end="2013-01-01",
                tables = list(atbat = NULL,
                              pitch = NULL,
                              coach = NULL,
                              runner = NULL,
                              umpire = NULL,
                              player = NULL,
                              game = NULL))
urlsToDataFrame can be used to manipulate any collection of XML files into a list of data frames.
pitchRx can easily produce two types of strikezone plots:
Do umpires favor home (as opposed to away) pitchers?
Given that the umpire has to make a decision, do home pitchers have a higher chance of receiving a called strike?
A called strike is a case where the batter does not swing and the umpire declares the pitch a strike (which is a favorable outcome for the pitcher).
A ball is an instance where the batter doesn’t swing and the umpire declares the pitch a ball (which is a favorable outcome for the batter).
By restricting ourselves to these two outcomes, we condition upon a situation where the umpire has to make a binary decision about the pitch.
Use gam from the mgcv package to visualize the probability of a called strike (given the ump has to make a decision).
library(mgcv)  # provides gam()
# Attach at-bat info to every pitch
pitchFX <- plyr::join(dat$pitch, dat$atbat,
                      by=c("num", "url"))
# Keep only pitches where the umpire made the call
decisions <- subset(pitchFX, des %in%
                      c("Called Strike", "Ball"))
# 1 = called strike, 0 = ball
decisions$strike <- as.numeric(decisions$des ==
                                 "Called Strike")
strikeFX(decisions, model=gam(strike~s(px)+s(pz),
                              family = binomial(link='logit')),
         layer=facet_grid(.~stand))
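To make the model formula passed to strikeFX more concrete, here is a minimal sketch that fits the same kind of logistic GAM outside of pitchRx. The data are simulated, and the "true" strike probability used to generate them is a made-up assumption chosen only so the example runs without scraping anything.

```r
library(mgcv)

set.seed(1)
n <- 2000

# Simulated pitch locations (feet); these are NOT real PITCHf/x data.
sim <- data.frame(px = runif(n, -2, 2),
                  pz = runif(n, 0.5, 4.5))

# Hypothetical strike probability: highest near the middle of the zone.
p <- plogis(3 - 2 * sim$px^2 - 2 * (sim$pz - 2.5)^2)
sim$strike <- rbinom(n, 1, p)

# Same formula and family as in the strikeFX call.
fit <- gam(strike ~ s(px) + s(pz),
           family = binomial(link = "logit"), data = sim)

# Predicted called-strike probability at two locations:
# the middle of the zone and well off the plate.
grid <- data.frame(px = c(0, 1.8), pz = c(2.5, 2.5))
grid$prob <- as.numeric(predict(fit, newdata = grid, type = "response"))
```

The fitted probabilities in grid$prob are what a probabilistic strikezone plot would colour by, evaluated over a much finer grid.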
We can also visualize the difference in probabilistic events by adding arguments to density1 and density2.
Here we find the probability of a called strike during the top inning minus the probability of a called strike during the bottom inning (top inning == home pitcher).
strikeFX(decisions, model=gam(strike~s(px)+s(pz),
family = binomial(link='logit')),
density1=list(top_inning="Y"),
density2=list(top_inning="N"),
layer=facet_grid(.~stand))
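The quantity being differenced can be illustrated with raw proportions. This is only a sketch on hypothetical toy data: it ignores pitch location entirely, whereas the plot above takes the difference of two fitted surfaces over the strikezone.

```r
# Hypothetical toy data: one row per umpire decision.
toy <- data.frame(
  des = c("Called Strike", "Called Strike", "Called Strike", "Ball",
          "Called Strike", "Ball", "Ball", "Ball"),
  top_inning = c("Y", "Y", "Y", "Y", "N", "N", "N", "N"),
  stringsAsFactors = FALSE
)

# 1 = called strike, 0 = ball
toy$strike <- as.numeric(toy$des == "Called Strike")

# Called-strike rate within each half inning.
rates <- tapply(toy$strike, toy$top_inning, mean)

# Top inning (home pitcher) minus bottom inning (away pitcher).
diff_rate <- rates[["Y"]] - rates[["N"]]
```

A positive diff_rate would suggest home pitchers receive called strikes more often, before accounting for where the pitches were thrown.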
strikeFX is nice for visualizing a lot of data (we just visualized over 1.5 million pitches).
PITCHf/x can also be used to regenerate (approximate) pitch trajectories.
It isn’t straightforward to animate millions of pitch trajectories, so we usually restrict our focus to a few cases.
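As a rough illustration of how a trajectory can be regenerated, the sketch below assumes the usual constant-acceleration description of a pitch: an initial position (x0, y0, z0), initial velocity (vx0, vy0, vz0) and acceleration (ax, ay, az). The numbers are hypothetical, and position and plate_time are helper names made up for this example, not pitchRx functions.

```r
# Hypothetical parameters for one pitch (feet, feet/s, feet/s^2).
pitch <- list(x0 = -1.5, y0 = 50, z0 = 6,
              vx0 = 5, vy0 = -130, vz0 = -5,
              ax = -10, ay = 28, az = -20)

# Position at time(s) t under constant acceleration.
position <- function(p, t) {
  data.frame(t = t,
             x = p$x0 + p$vx0 * t + 0.5 * p$ax * t^2,
             y = p$y0 + p$vy0 * t + 0.5 * p$ay * t^2,
             z = p$z0 + p$vz0 * t + 0.5 * p$az * t^2)
}

# Time at which the pitch first reaches the front of home plate,
# assumed here to be y = 17/12 feet. Takes the smaller root of the
# quadratic in t, which assumes ay > 0 and vy0 < 0.
plate_time <- function(p, y_plate = 17 / 12) {
  a <- 0.5 * p$ay
  b <- p$vy0
  cc <- p$y0 - y_plate
  (-b - sqrt(b^2 - 4 * a * cc)) / (2 * a)
}

tf <- plate_time(pitch)
traj <- position(pitch, seq(0, tf, length.out = 20))
```

Each row of traj is one frame of the flight path; an animation draws these frames in sequence.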
VishnuDarvish - a case study
*Created by Drew Sheppard @DShep25
dat <- scrapeFX(start="2013-04-24",
end="2013-04-24")
atbats <- subset(dat$atbat,
pitcher_name == "Yu Darvish")
Darvish <- plyr::join(atbats, dat$pitch,
by=c("num", "url"), type="inner")
Darvish contains info on every pitch thrown by Yu Darvish on April 24th, 2013.
animateFX can be used in a similar fashion to strikeFX for producing a series of plots that track pitch locations over time.
As the animateFX animations progress, the pitches are being thrown directly towards you.
animateFX(Darvish, layer=list(theme_bw(),
coord_equal(),
facet_grid(.~stand)))
Real time animations are hard to digest!
Plotting that many pitches makes it even worse…
Normalized PITCHf/x (typical flight path)
animateFX(Darvish, avg.by="pitch_types",
layer=list(coord_equal(),
theme_bw(),
facet_grid(.~stand)))
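The idea behind avg.by, as I understand it, is to average the trajectory parameters within each group so that one "typical" pitch is drawn per pitch type. The sketch below shows that kind of averaging in base R on hypothetical values; it is not the internal pitchRx code, and only two of the parameters are included to keep it short.

```r
# Hypothetical parameters for four pitches of two types.
toy <- data.frame(
  pitch_type = c("FF", "FF", "SL", "SL"),
  x0  = c(-1.4, -1.6, -1.8, -2.0),
  vy0 = c(-135, -133, -122, -124)
)

# One row per pitch type, holding the mean of each parameter.
avg <- aggregate(cbind(x0, vy0) ~ pitch_type, data = toy, FUN = mean)
```

Each row of avg can then be treated as a single pitch when regenerating a flight path.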
Normalized animation
RH <- subset(Darvish, stand=="R")
interactiveFX(RH, avg.by="pitch_types")
strikeFX
and animateFX
pitchRx
.