Introducing SIERA

Postby The Crimson Cyclone » Fri Feb 12, 2010 12:19:37

The Nightman Cometh wrote:
The Crimson Cyclone wrote:
tangotiger wrote:
By the way, what's the general sabermetric view around here?


I can understand most of it, I just can't apply it or argue with it

so you don't understand it


I guess that's true :oops:

The Crimson Cyclone
Dropped Anchor
 
Posts: 9372
Joined: Tue Oct 06, 2009 07:48:14

Postby TheBrig » Fri Feb 12, 2010 12:23:26

joe table wrote:From what I understand now about SIERA, it seems like the biggest objection people are going to have is the "GB*BB" term, and this will continue to be controversial until a lot more games/additional PBP data can be added to the analysis to either vindicate the creators' thinking or suggest more strongly that the term is insignificant


The intuition makes sense to me. Walks lead to runs, but they don't lead to as many runs for groundball pitchers because they get more GIDPs. Presuming the effect in the data is significant, I don't see the controversy.
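To put rough numbers on that intuition, here is a minimal sketch that uses only the two walk-related coefficients from the SIERA formula posted in this thread. The function name and the two sample net groundball rates (+0.15 and -0.15) are made up for illustration, and this only shows what the fitted coefficients imply; it says nothing about whether the term is statistically significant.

```python
# Walk-related coefficients copied from the SIERA formula in this thread.
BB_COEF = 11.292      # multiplies BB/PA
BB_GB_COEF = -4.027   # multiplies (BB/PA) * ((GB-FB-PU)/PA)

def walk_slope(net_gb_rate):
    """Change in SIERA per unit of BB/PA at a given (GB-FB-PU)/PA."""
    return BB_COEF + BB_GB_COEF * net_gb_rate

# Hypothetical pitchers: one groundball-heavy, one flyball-heavy.
for label, net_gb in [("groundballer", 0.15), ("flyballer", -0.15)]:
    # Cost of one extra walk per 100 PA, i.e. BB/PA rising by 0.01.
    cost = walk_slope(net_gb) * 0.01
    print(f"{label}: +{cost:.3f} SIERA per extra walk per 100 PA")
```

Under these assumed rates the extra walk costs the groundballer a bit less than the flyballer (about 0.107 versus 0.119), which is the direction the GIDP argument would predict.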
5 rounds rapid!

TheBrig
There's Our Old Friend
 
Posts: 130
Joined: Sun Sep 27, 2009 19:33:36
Location: HQ

Postby phorever » Fri Feb 12, 2010 12:28:04

tangotiger wrote:
By the way, what's the general sabermetric view around here?


for the general view, see the answers from vox.

i'm one of those who can make my way through almost anything mathematical. (half a lifetime of physics will do that for ya)
however, i'm in the baseball part of the internet for fun, and math feels too much like work, so while i visit your blog pretty regularly and check out other decent sabr sites, i just skim over the math. and i don't and won't run any sabr analysis myself. would kill the fun. just the way i am.
arguing baseball is fun. math is fun. using math to argue baseball not fun.

markov won't work consistently well if the steps of individual parameters aren't independent, and there are too many possibly coupled event variables in baseball for monte-carlo / bayesian methods to be able to determine the degree of their independence. they might be independent. and you might have enough experience playing with the numbers to have correctly intuited their independence. but from my perspective, if no reasonable explanation of matt's quadratic terms jibes with the markov results, i want to see high-performance computing, "smart" bayesian analysis, fuzzy logic thrown at a more generalized version of the problem to figure out what's going on.

quick (and relevant) example of the potential interdependence. is there any way of knowing how much a 6th inning k instead of a 1st-pitch fielder's choice grounder to 3rd with one out changes the energy level of a pitcher and his related ability to throw pitches with good downward motion, as well as his ability to keep his manager from calling on a reliever with worse gb abilities, as well as changing the number of pitches seen by the hitter and thus his likelihood of guessing right in the 9th if the pitcher is going for a complete game?

potentially, that k changes not only the situation on the bases and thus the impact of each subsequent event, but also the odds of the occurrence of each one of those subsequent events and thus the probability density functions from which the markov events are supposed to be drawn. and that change is different than the change created by a 1st-pitch grounder that has the same effect on the base-out-score situation. it will take a lot to convince me that all of those odds changes can either be pulled out of the existing data or proven to be insignificant without high-powered computing.

ps: unmodeled couplings will do even worse things to linearized regressions than to the markov results, so i don't trust matt's results either, though the 2009 prediction test results are promising. if you're lucky, coefficients from underspecified regressions on nonlinear systems can be used for accurate predictions even when they incorrectly describe the underlying process.
phorever
There's Our Old Friend
 
Posts: 3785
Joined: Fri Jan 05, 2007 08:25:07
Location: the netherlands

Postby VoxOrion » Fri Feb 12, 2010 12:58:19

TheBrig wrote:
VoxOrion wrote:So, to answer your question, I'm interested in understanding how things are put together to the degree that I'm capable. If the essential logic, weights, and variable parts are explained to me well enough I am interested in seeing how things are put together. When equations like:

SIERA = 6.262 – 18.055*(SO/PA) + 11.292*(BB/PA) – 1.721*((GB-FB-PU)/PA) +10.169*((SO/PA)^2) – 7.069*(((GB-FB-PU)/PA)^2) + 9.561*(SO/PA)*((GB-FB-PU)/PA) – 4.027*(BB/PA)*((GB-FB-PU)/PA)


are presented - I'm out. Explain what that means, and I'm back in.

To Matt's credit, he does a good job of attempting to explain what he's up to. Many other posters here will do the same.


Not to sound patronizing here, Vox, but the equation above is really just an ordinary least squares estimate of a linear relationship, which is something I expect most of us first learned how to calculate back in middle school. Granted, it's using multiple regressors and a few second order terms, but still it's something anybody with a few introductory undergraduate statistics classes could readily understand and re-produce on their own.

Tango's Markov Chain simulation approach, on the other hand, for better or worse sounds like a much more complex approach that would take even someone with a PhD in Statistics a good long time to verify and validate.


You do sound patronizing, but that's okay. I am not interested in parsing through anything like that in my free time, whether I'm capable or not.
“There are no cool kids. Just people who have good self-esteem and people who blame those people for their own bad self-esteem.”

VoxOrion
Site Admin
 
Posts: 12963
Joined: Thu Dec 28, 2006 09:15:33
Location: HANLEY POTTER N TEH MAGICALASS LION

Postby TheBrig » Fri Feb 12, 2010 13:05:36

VoxOrion wrote:
TheBrig wrote:
VoxOrion wrote:So, to answer your question, I'm interested in understanding how things are put together to the degree that I'm capable. If the essential logic, weights, and variable parts are explained to me well enough I am interested in seeing how things are put together. When equations like:

SIERA = 6.262 – 18.055*(SO/PA) + 11.292*(BB/PA) – 1.721*((GB-FB-PU)/PA) +10.169*((SO/PA)^2) – 7.069*(((GB-FB-PU)/PA)^2) + 9.561*(SO/PA)*((GB-FB-PU)/PA) – 4.027*(BB/PA)*((GB-FB-PU)/PA)


are presented - I'm out. Explain what that means, and I'm back in.

To Matt's credit, he does a good job of attempting to explain what he's up to. Many other posters here will do the same.


Not to sound patronizing here, Vox, but the equation above is really just an ordinary least squares estimate of a linear relationship, which is something I expect most of us first learned how to calculate back in middle school. Granted, it's using multiple regressors and a few second order terms, but still it's something anybody with a few introductory undergraduate statistics classes could readily understand and re-produce on their own.

Tango's Markov Chain simulation approach, on the other hand, for better or worse sounds like a much more complex approach that would take even someone with a PhD in Statistics a good long time to verify and validate.


You do sound patronizing, but that's okay. I am not interested in parsing through anything like that in my free time, whether I'm capable or not.


Ok, what I should have said is "Very sorry, this is going to sound patronizing." My bad.
5 rounds rapid!

TheBrig
There's Our Old Friend
 
Posts: 130
Joined: Sun Sep 27, 2009 19:33:36
Location: HQ

Postby jeff2sf » Fri Feb 12, 2010 13:12:55

And where the hell did you go to middle school?
jeff2sf
There's Our Old Friend
 
Posts: 3395
Joined: Sat Dec 30, 2006 10:40:29

Postby Woody » Fri Feb 12, 2010 13:17:09

At Wissahickon middle school, they taught us that Kriss Kross was indeed wiggidy wiggidy wiggidy wack. That's three wiggidys, for those keeping track.
you sure do seem to have a lot of time on your hands to be on this forum? Do you have a job? Are you a shut-in?

Woody
BSG MVP
 
Posts: 52472
Joined: Thu Dec 28, 2006 17:56:45
Location: captain of the varsity slut team

Postby TenuredVulture » Fri Feb 12, 2010 13:39:35

VoxOrion wrote:
TheBrig wrote:
VoxOrion wrote:So, to answer your question, I'm interested in understanding how things are put together to the degree that I'm capable. If the essential logic, weights, and variable parts are explained to me well enough I am interested in seeing how things are put together. When equations like:

SIERA = 6.262 – 18.055*(SO/PA) + 11.292*(BB/PA) – 1.721*((GB-FB-PU)/PA) +10.169*((SO/PA)^2) – 7.069*(((GB-FB-PU)/PA)^2) + 9.561*(SO/PA)*((GB-FB-PU)/PA) – 4.027*(BB/PA)*((GB-FB-PU)/PA)


are presented - I'm out. Explain what that means, and I'm back in.

To Matt's credit, he does a good job of attempting to explain what he's up to. Many other posters here will do the same.


Not to sound patronizing here, Vox, but the equation above is really just an ordinary least squares estimate of a linear relationship, which is something I expect most of us first learned how to calculate back in middle school. Granted, it's using multiple regressors and a few second order terms, but still it's something anybody with a few introductory undergraduate statistics classes could readily understand and re-produce on their own.

Tango's Markov Chain simulation approach, on the other hand, for better or worse sounds like a much more complex approach that would take even someone with a PhD in Statistics a good long time to verify and validate.


You do sound patronizing, but that's okay. I am not interested in parsing through anything like that in my free time, whether I'm capable or not.


If it's any consolation Vox, I'm pretty sure Mr. Maltzman did not teach me OLS regression.
Be Bold!

TenuredVulture
You've Got to Be Kidding Me!
 
Posts: 53243
Joined: Thu Jan 04, 2007 00:16:10
Location: Magnolia, AR

Postby phatj » Fri Feb 12, 2010 13:51:08

still it's something anybody with a few introductory undergraduate statistics classes could readily understand and re-produce on their own.

Just how many people do you suppose have taken a few introductory undergraduate statistics classes? Coming from an engineering background, I've taken several college calculus courses and differential equations, but no statistics. But most college students aren't BS candidates and what little math they take isn't stats.
they were a chick hanging out with her friends at a bar, the Phillies would be the 320 lb chick with a nose wart and a dick - Trent Steele

phatj
Moderator
 
Posts: 20683
Joined: Thu Dec 28, 2006 23:07:06
Location: Andaman Limp Dick of Certain Doom

Postby TenuredVulture » Fri Feb 12, 2010 14:06:51

phatj wrote:
still it's something anybody with a few introductory undergraduate statistics classes could readily understand and re-produce on their own.

Just how many people do you suppose have taken a few introductory undergraduate statistics classes? Coming from an engineering background, I've taken several college calculus courses and differential equations, but no statistics. But most college students aren't BS candidates and what little math they take isn't stats.


I would guess math majors might get a few intro stats courses, and probably economics majors take a couple as undergrads.

By the way, I think in the typical "baby stats" class, it's a mistake to try to teach regression beyond the terms used in a basic correlation. Not to go all logit and probit, but the world already has enough inappropriate uses of regression.
Be Bold!

TenuredVulture
You've Got to Be Kidding Me!
 
Posts: 53243
Joined: Thu Jan 04, 2007 00:16:10
Location: Magnolia, AR

Postby Bakestar » Fri Feb 12, 2010 15:20:42

I respect the hell out of hardcore sabermetricians and think the work is incredibly valuable, but I never got past Trigonometry in high school and I have to admit that, solely through my own ignorance, Matt's SIERA formula looks like R2-D2 puked on the screen.
Foreskin stupid

Bakestar
BSG MVP
 
Posts: 14709
Joined: Thu Dec 28, 2006 17:57:53
Location: Crane Jackson's Fountain Street Theatre

Postby TheBrig » Fri Feb 12, 2010 15:23:16

phatj wrote:
still it's something anybody with a few introductory undergraduate statistics classes could readily understand and re-produce on their own.

Just how many people do you suppose have taken a few introductory undergraduate statistics classes? Coming from an engineering background, I've taken several college calculus courses and differential equations, but no statistics. But most college students aren't BS candidates and what little math they take isn't stats.


I wasn't trying to imply it was something everyone should know, only pointing out that the model is tractable enough that pretty much anybody with some college statistics would recognize what it's all about.

The comment about it being taught in middle school and high school was only to point out that it isn't so different from what little math background we all have in common. We've all seen that the equation of a line is y = a + b*x. This is just an estimate of a linear relationship with multiple independent variables -- y = a + b1 * x1 + b2 * x2 + b3 * x3 and so on, where x1 is strikeout rate, x2 is groundball rate, etc., and the dependent variable ("y") being estimated is the actual ERA. The idea is that if you can create an estimate for ERA that depends only on the factors within a pitcher's control (the x's), then you've effectively created a luck-neutral estimate for ERA. (Matt can feel free to correct me if I've misstated any of this.)
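For anyone who would rather see it as code than as algebra, here is a minimal sketch of the formula posted earlier in the thread, written in the "a plus b times x" form. The coefficients are copied from that post; the function names and the sample stat line are invented for illustration, and this is not claimed to be how Matt actually computes it.

```python
# Coefficients as posted in this thread, in the order:
# intercept, SO, BB, netGB, SO^2, netGB^2, SO*netGB, BB*netGB
COEFS = [6.262, -18.055, 11.292, -1.721, 10.169, -7.069, 9.561, -4.027]

def features(so, bb, gb, fb, pu, pa):
    """Build the x's: per-PA rates plus the squared and cross terms."""
    k = so / pa               # strikeout rate
    w = bb / pa               # walk rate
    g = (gb - fb - pu) / pa   # net groundball rate
    return [1.0, k, w, g, k * k, g * g, k * g, w * g]

def siera(so, bb, gb, fb, pu, pa):
    """Weighted sum of the x's: y = a + b1*x1 + b2*x2 + ..."""
    xs = features(so, bb, gb, fb, pu, pa)
    return sum(b * x for b, x in zip(COEFS, xs))

# Hypothetical stat line, invented for illustration only.
print(round(siera(so=180, bb=60, gb=250, fb=180, pu=30, pa=800), 2))
```

For that made-up pitcher the estimate comes out to roughly 3.55. The squared and cross terms are just extra x's built from the same three rates, so the whole thing is still linear in the coefficients, which is why ordinary least squares can fit it.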
5 rounds rapid!

TheBrig
There's Our Old Friend
 
Posts: 130
Joined: Sun Sep 27, 2009 19:33:36
Location: HQ

Postby The Nightman Cometh » Fri Feb 12, 2010 15:24:24

Hell I took calculus last year and I'm not even sure if I am looking at that the right way.
The Nightman Cometh
Dropped Anchor
 
Posts: 8553
Joined: Sun Dec 27, 2009 14:35:45

Postby TheBrig » Fri Feb 12, 2010 15:30:27

jeff2sf wrote:And where the hell did you go to middle school?


I went to a Catholic middle school in Delaware. I remember having to calculate OLS fitted lines back in 5th or 6th grade. Then we had it again in 9th grade when I got to high school.
5 rounds rapid!

TheBrig
There's Our Old Friend
 
Posts: 130
Joined: Sun Sep 27, 2009 19:33:36
Location: HQ

Postby jamiethekiller » Fri Feb 12, 2010 15:30:56

TheBrig wrote:
jeff2sf wrote:And where the hell did you go to middle school?


I went to a Catholic middle school in Delaware. I remember having to calculate OLS fitted lines back in 5th or 6th grade. Then we had it again in 9th grade when I got to high school.


what school? maybe we rode the bus together

jamiethekiller
Plays the Game the Right Way
 
Posts: 26938
Joined: Sun Dec 31, 2006 03:31:02

Postby TheBrig » Fri Feb 12, 2010 15:31:32

jamiethekiller wrote:
TheBrig wrote:
jeff2sf wrote:And where the hell did you go to middle school?


I went to a Catholic middle school in Delaware. I remember having to calculate OLS fitted lines back in 5th or 6th grade. Then we had it again in 9th grade when I got to high school.


what school? maybe we rode the bus together


St. Edmond's, you?
5 rounds rapid!

TheBrig
There's Our Old Friend
 
Posts: 130
Joined: Sun Sep 27, 2009 19:33:36
Location: HQ

Postby jamiethekiller » Fri Feb 12, 2010 15:33:10

TheBrig wrote:
jamiethekiller wrote:
TheBrig wrote:
jeff2sf wrote:And where the hell did you go to middle school?


I went to a Catholic middle school in Delaware. I remember having to calculate OLS fitted lines back in 5th or 6th grade. Then we had it again in 9th grade when I got to high school.


what school? maybe we rode the bus together


St. Edmond's, you?


ahh, st. peters. what high school?


we never did any stats type stuff till i got to high school. and that wasn't till i was a junior

jamiethekiller
Plays the Game the Right Way
 
Posts: 26938
Joined: Sun Dec 31, 2006 03:31:02

Postby smitty » Fri Feb 12, 2010 15:46:26

There were no middle schools when I was a kid. Just Junior High. I was never very good or very interested in math and I'm still not. I had to take Math 5 twice just to pass it.

That said, you don't have to be a math stud to enjoy the new stats or whatever you want to call it. If the math stuff is explained well it's possible to follow along well enough.

I have always liked Bill James. His basic tenets, like baseball is way too complicated to be understood completely -- or even to get incredibly close to it -- or starting with questions and then looking for the best answer you can find instead of starting with the "answer" and then trying to "prove" it with some formula or other, are baseball gold in my view. Further, I'm also a big believer in the idea that you can never get too much information regarding analysis of a player or a team. All information is good, as long as you can understand what it means. And no one is smart enough to completely understand it all.

I think there are a lot of ways to enjoy baseball and there really is no wrong way. If someone wants to just watch the games and agree with Joe Morgan that stats aren't really important that's fine by me. Baseball is great enough of a game that it certainly can be enjoyed that way. That view is fine and so is the math/econ/stat/Markov regression view. It's all a lot of fun.
Teams lie, sometimes for good reasons, sometimes for bad. They do it to get an advantage while they look at the trade market or just because they can

--Will Carroll

smitty
BSG MVP
 
Posts: 45450
Joined: Sat Dec 30, 2006 03:00:27
Location: Federal Way, WA --Spursville

Postby jeff2sf » Fri Feb 12, 2010 15:49:58

smitty wrote:There were no middle schools when I was a kid. Just Junior High. I was never very good or very interested in math and I'm still not. I had to take Math 5 twice just to pass it.

That said, you don't have to be a math stud to enjoy the new stats or whatever you want to call it. If the math stuff is explained well it's possible to follow along well enough.

I have always liked Bill James. His basic tenets, like baseball is way too complicated to be understood completely -- or even to get incredibly close to it -- or starting with questions and then looking for the best answer you can find instead of starting with the "answer" and then trying to "prove" it with some formula or other, are baseball gold in my view. Further, I'm also a big believer in the idea that you can never get too much information regarding analysis of a player or a team. All information is good, as long as you can understand what it means. And no one is smart enough to completely understand it all.

I think there are a lot of ways to enjoy baseball and there really is no wrong way. If someone wants to just watch the games and agree with Joe Morgan that stats aren't really important that's fine by me. Baseball is great enough of a game that it certainly can be enjoyed that way. That view is fine and so is the math/econ/stat/Markov regression view. It's all a lot of fun.


I think "all information is good" is rarely a true statement.
jeff2sf
There's Our Old Friend
 
Posts: 3395
Joined: Sat Dec 30, 2006 10:40:29

Postby TenuredVulture » Fri Feb 12, 2010 15:57:14

Smitty is right though in saying there is no wrong way to enjoy baseball.
Be Bold!

TenuredVulture
You've Got to Be Kidding Me!
 
Posts: 53243
Joined: Thu Jan 04, 2007 00:16:10
Location: Magnolia, AR
