Tuesday, June 20, 2006
Situational Data and Streakiness
Over the last couple of days we've discussed several types of data where people confuse performance with ability.
First let's discuss situational stats:
- Suppose you have hitting averages (say AVGs or OBPs) for many players. How do we know if the differences in averages are due to real differences in batting ability or to chance variation? We look at the averages for two consecutive seasons. If we see a positive relationship in the scatterplot, then the players do indeed have different abilities.
- We looked at home versus away on-base percentages for all players in the 2005 season. We saw a lot of variation in the differences (home OBP - away OBP), but, on average, players tend to have a higher on-base percentage at home.
- But there is little significance to the players who bat unusually well (or poorly) at home. Chipper Jones and Derek Jeter batted much better at home in 2005, but this advantage doesn't persist across seasons.
- Most situational effects are biases -- the probability that a player gets on base at home will be higher than away, but this change in the probability will be the same across all players.
- The public is fascinated with long streaks and slumps, the most famous being Joe DiMaggio's 56-game hitting streak in 1941.
- How do we measure streakiness? We discussed several ways, such as the longest run of games with a hit, or moving averages, where we compute batting averages over short windows of, say, 5 games.
- It is hard to describe streaky ability, but it is easy to describe consistent ability. This is where the chance of a success (like a win or a hit) is constant across the season and the outcomes of different games (or plate appearances) are independent.
- If you simulate data assuming a player or team has consistent ability, you'll see interesting patterns of hot streaks and slumps.
- What this means is that it is difficult to say that a player or team is truly streaky. The streaky pattern of the performance resembles what you see from coin tossing, which is really not streaky.
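
To see the coin-tossing point for yourself, here is a rough sketch of the consistent-ability simulation in Python. Everything in it is an assumption made for illustration: the function names are hypothetical, the 162-game season and the 0.75 chance of getting a hit in a game are made-up values, and the 5-game window is just the example size mentioned above. It simulates independent games with a constant chance of success, then computes the two streakiness measures we discussed (longest run and moving averages).

```python
import random


def simulate_games(n_games, p_success, rng):
    """Simulate a consistent player: each game is an independent
    success (1) or failure (0) with the same chance p_success."""
    return [1 if rng.random() < p_success else 0 for _ in range(n_games)]


def longest_run(outcomes):
    """Length of the longest run of consecutive successes."""
    best = current = 0
    for x in outcomes:
        current = current + 1 if x else 0
        best = max(best, current)
    return best


def moving_average(outcomes, window):
    """Success rate in each window of `window` consecutive games."""
    return [sum(outcomes[i:i + window]) / window
            for i in range(len(outcomes) - window + 1)]


# Assumed values for illustration only (not taken from real data).
rng = random.Random(2006)   # fixed seed so the run is repeatable
games = simulate_games(162, 0.75, rng)

rates = moving_average(games, 5)
print("longest run of games with a hit:", longest_run(games))
print("lowest 5-game rate:", min(rates))
print("highest 5-game rate:", max(rates))
```

Try changing the seed and rerunning. The simulated player never changes ability, yet the longest run and the spread of the 5-game rates will typically vary quite a bit from one simulated season to the next.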