Tuesday, June 20, 2006
Situational Data and Streakiness
Over the last couple of days we've discussed several types of data where people confuse ability with performance.
First let's discuss situational stats:
- Suppose you have hitting averages (say AVGs or OBPs) for many players. How do we know if the differences in averages are due to real differences in batting abilities or to chance variation? We look at the averages for two consecutive seasons. If we see a positive relationship in the scatterplot, then the players do indeed have different abilities.
- We looked at home versus away on-base percentages for all players in the 2005 season. We saw a lot of variation in the differences (home OBP - away OBP), but, on average, players tend to have a higher on-base percentage at home.
- But there is little significance to the players who bat unusually well (or poorly) at home. Chipper Jones and Derek Jeter batted much better at home in 2005, but this advantage doesn't persist across seasons.
- Most situational effects are biases -- the probability that a player gets on-base at home will be higher than away, but this change in the probability will be the same across all players.
- The public is fascinated with long streaks and slumps, the most famous being Joe DiMaggio's 56-game hitting streak in 1941.
- How do we measure streakiness? We discussed several ways, such as the longest run of games with a hit, or moving averages, where we compute batting averages for short windows of, say, 5 games.
- It is hard to describe streakiness ability, but it is easy to describe consistent ability. This is where the chance of a success (like a win or a hit) is constant across the season and the outcomes of different games (or plate appearances) are independent.
- If you simulate data assuming a player or team has consistent ability, you'll see interesting patterns of hot streaks and slumps.
- What this means is that it is difficult to say that a player or team is truly streaky. The streaky pattern of the performance resembles what you see from coin tossing, which is really not streaky.
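To see the point about consistent ability for yourself, here is a minimal Python sketch (we used Fathom in class, so this is only an illustration). It assumes a hypothetical team with a constant 50% chance of winning each of 162 independent games; the function names, the seed, and the 5-game window are my own choices.

```python
import random

def simulate_games(p, n_games, rng):
    """Return 0/1 outcomes where every game has the same success chance p."""
    return [1 if rng.random() < p else 0 for _ in range(n_games)]

def longest_run(outcomes):
    """Length of the longest run of consecutive successes."""
    best = current = 0
    for x in outcomes:
        current = current + 1 if x else 0
        best = max(best, current)
    return best

def moving_averages(outcomes, window):
    """Success rate in each consecutive window of the given width."""
    return [sum(outcomes[i:i + window]) / window
            for i in range(len(outcomes) - window + 1)]

rng = random.Random(2006)  # fixed seed so the run is repeatable
season = simulate_games(p=0.5, n_games=162, rng=rng)  # assumed consistent team

print("longest winning streak:", longest_run(season))
averages = moving_averages(season, window=5)
print("best 5-game stretch:", max(averages))
print("worst 5-game stretch:", min(averages))
```

Change the seed a few times and you should see long streaks and cold stretches show up even though the team's ability never changes.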
Thursday, June 15, 2006
The Post-Game Show
Today we talked about the Mud Hens - Sky Chiefs game we watched last night. Here are some things we discussed.
- The Mud Hens lost primarily due to their weak hitting. We compared the Mud Hens and Sky Chiefs from two perspectives: their outcomes of plate appearances, such as SO, BB, H, and non-SO Outs, and the distribution of on-base events, such as BB or HBP, singles, doubles, triples and home runs. It seemed that the Mud Hens had a lot of strikeouts, no walks, and few hits.
- The most significant plays were the clutch hits for the Sky Chiefs in the top of the 2nd inning.
- Was there luck in the game? There were several base hits that barely missed the infielders. There was a lucky bloop single by a hitter who was fooled by the pitch.
- A 95% confidence interval is helpful in learning about a batter's hitting probability on the basis of his batting average.
- The length of the interval depends on the sample size n and the confidence level.
- Confidence intervals are helpful in comparing batter abilities. If the intervals for the abilities of two hitters overlap, then we can't say that one batter is "better" than the other.
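Here is a rough sketch of the interval calculation in Python. It assumes the usual large-sample formula (estimate plus or minus 1.96 standard errors), which may not be exactly the formula from class, and the two hitters and their hit totals are made up for illustration.

```python
import math

def confidence_interval(hits, n, z=1.96):
    """Large-sample interval for a hitting probability; z = 1.96 gives about 95%."""
    p_hat = hits / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def intervals_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical hitters: .300 and .270 averages, each in 500 at-bats.
hitter_a = confidence_interval(150, 500)
hitter_b = confidence_interval(135, 500)

print("hitter A:", hitter_a)
print("hitter B:", hitter_b)
print("intervals overlap:", intervals_overlap(hitter_a, hitter_b))
```

Try a smaller n (say 50 at-bats with the same averages) and notice how much wider the intervals get.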
Wednesday, June 14, 2006
The Pregame Show
Today we had a short class since we'll be going to the Mud Hens game tonight. I passed out scoresheets and directions to the ballpark. I should be easy to spot since I'll be wearing a Phillies cap.
There is a short assignment that goes along with the ballgame. In this assignment, I want you to
-- score a couple of innings
-- identify the players who have the best AVG and best OBP for this game
-- find the key play (the one that was most influential in the team's victory)
-- identify a couple of plays that appeared lucky or controlled by chance variation
See you at the game!
Tuesday, June 13, 2006
Ability and Performance
Today we started talking about learning from data. Here are the main things we talked about:
- There is a distinction between a player's ability and his performance. An ability is an intrinsic quality of a player, say his batting talent, that we really don't know exactly. We do observe a player's performance, say his batting average for a particular season.
- The objective of Statistics is to learn about a player's ability on the basis of his performance.
- Suppose we know a player's true ability to get on base. We describe this ability by a number p that represents his probability of getting on base. If we know the value of p, say p = .4, we can predict how many times he will get on base in 20 PAs. We simulated this process in class by the use of 10-sided dice.
- Above we knew a person's ability p and we predicted his on-base percentage in 20 PAs. The statistical inference problem goes in reverse. Suppose we observe a player get on-base 8 times in 20 PAs -- what does that tell us about his true on-base probability p?
- In Fathom we did a simulation experiment in two steps: first, we simulated a player's true batting average p, and then we simulated the number of times on base given that value of p. By classifying all simulations by p and his observed AVG, we saw what we could learn about p based on his AVG in 20 AB.
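The two directions described above can be sketched in Python (this is not our Fathom document, only an illustration). The forward part uses p = .4 and 20 PAs as in class. For the reverse part I assume the player's true p is equally likely to be .2, .3, .4, or .5; that list of candidate abilities is a made-up simplification.

```python
import random
from collections import Counter

def times_on_base(p, n_pa, rng):
    """Simulate n_pa independent plate appearances with on-base probability p."""
    return sum(rng.random() < p for _ in range(n_pa))

rng = random.Random(13)  # fixed seed so the run is repeatable

# Forward: we know p = .4 and predict the number of times on base in 20 PAs.
forward = Counter(times_on_base(0.4, 20, rng) for _ in range(10000))
print("most common counts:", forward.most_common(3))

# Reverse: simulate an ability, then a performance, and keep only the
# simulations where the player got on base 8 times in 20 PAs.
abilities = [0.2, 0.3, 0.4, 0.5]  # assumed candidate values of p
matches = Counter()
for _ in range(20000):
    p = rng.choice(abilities)
    if times_on_base(p, 20, rng) == 8:
        matches[p] += 1

total = sum(matches.values())
for p in abilities:
    print("p =", p, "share of the 8-for-20 simulations:", round(matches[p] / total, 3))
```

The shares in the last loop show which abilities are most plausible after seeing 8 for 20, and also that several different values of p remain quite possible.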
Monday, June 12, 2006
Spinner Game
Sorry I was away for a couple of days last week. But I understand Sherwin was helpful in getting you to make your spinners and understanding binomial experiments.
Today we played a game between the "All Stars" (your spinners) and the "ASBs" (my spinner cards from the All Star Baseball Game). Here are the highlights of our game:
- The ASBs won the game 17-14!
- Ralph Kiner was clearly the star of the game. He was 4 for 5 with two home runs, one triple, and one single. His grand slam in the bottom of the 8th broke open a tight 13-13 game.
- Although the ASBs won the game, their pitcher was pretty wild. The All-Stars had 11 walks during the game.
- Although the game was fun, there was room for improvement in the construction of the spinners. The heavy piece of paper was intended for the part that spins. Well, maybe the next class will be given better instruction.
Tuesday, June 06, 2006
Big League Baseball
Today we talked about a simple baseball dice game called Big League Baseball. It gives us an opportunity to talk about simple probability models and see how these models reflect real baseball variability.
- The single die controls the pitch. In Fathom, we simulated the process of throwing balls and strikes and found the approximate probabilities of a strikeout, a walk, and a ball "in-play".
- We compared these probabilities with the actual proportions of PAs that are strikeouts, walks, and balls in play.
- Once a ball is in play, rolls of two dice determine the play outcome. Probabilities of events such as singles, doubles, etc. are easier to compute. We used Fathom to simulate this process and compared the game probabilities with the actual probabilities of play events.
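The pitch-by-pitch simulation can be sketched in Python along these lines. Be careful: the table that maps die faces to pitch results is purely hypothetical (I made it up so the code runs), so the probabilities it prints are not the ones from the actual game. Substitute the real table to reproduce what we found in Fathom.

```python
import random
from collections import Counter

# HYPOTHETICAL mapping from die face to pitch result, for illustration only.
PITCH = {1: "ball", 2: "ball", 3: "strike", 4: "strike", 5: "in-play", 6: "in-play"}

def plate_appearance(rng):
    """Roll the pitch die until a strikeout, a walk, or a ball in play."""
    balls = strikes = 0
    while True:
        result = PITCH[rng.randint(1, 6)]
        if result == "in-play":
            return "in-play"
        if result == "ball":
            balls += 1
            if balls == 4:
                return "walk"
        else:
            strikes += 1
            if strikes == 3:
                return "strikeout"

rng = random.Random(6)  # fixed seed so the run is repeatable
n = 10000
counts = Counter(plate_appearance(rng) for _ in range(n))
for outcome in ("strikeout", "walk", "in-play"):
    print(outcome, counts[outcome] / n)
```

The relative frequencies approximate the probabilities of the three plate-appearance outcomes under whatever die table you plug in.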
Monday, June 05, 2006
A First Look at Probability
Today we started to look at probability, the way of quantifying outcomes attributed to chance. Here is a summary of our discussion.
- Life and baseball are uncertain.
- We don't know if Barry Bonds will hit a home run in the next at-bat, but we can assign a number (a probability) to "Bonds hits a home run" that represents our belief about the likelihood of a Bonds' dinger.
- One way of thinking about a probability is a long-run relative frequency. Bonds' home run probability is approximately the relative frequency of Bonds home runs during a season.
- A first step in assigning probabilities is to think of a sample space, a list of all possible outcomes. The sample space for a plate appearance is SO, BB, 1B, 2B, 3B, HR,
- It is more difficult to assign probabilities to outcomes. We focused on probabilities assigned to die rolls.
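For die rolls the probabilities can be found by listing the sample space and counting. As a small illustration in Python, assuming two fair six-sided dice, this lists all 36 equally likely outcomes and computes the probability of each possible sum.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (first die, second die) pairs.
rolls = list(product(range(1, 7), repeat=2))

# Count how many pairs give each sum, then divide by the size of the sample space.
sums = Counter(a + b for a, b in rolls)
probs = {total: Fraction(count, len(rolls)) for total, count in sorted(sums.items())}

for total, pr in probs.items():
    print(total, pr)
```

Using fractions keeps the answers exact, so you can check them against a hand count.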