Thursday, May 25, 2006
Italian Ballplayers
Today, we used spaghetti to illustrate fitting lines to (x, y) plots. Since spaghetti is Italian, we fit a line to (HR, SLG) data for Mike Piazza (a great player of Italian descent) for the years that he played with the Mets.
- We want a fitted line to be close to the points in some sense.
- We define a residual to be the difference between the observed y value and the y value one would predict from the line.
- We measure the goodness of a fit by the sum of squared residuals.
- The best line, the least-squares line, is the one that makes the sum of squared residuals as small as possible.
- Fathom has a neat way of graphically showing the squared residuals. By playing with a movable line, we tried to make the sum of squared residuals small.
- This trial-and-error approach gives a pretty good line, but the least-squares line is the best one in the sense of having the smallest sum of squared residuals.
- We can compare different predictors by looking at the "sum of squares" (the sum of squared residuals) for each fitted line; the smaller it is, the better the predictor. Generally, we see that OPS is the best predictor, followed by SLG and OBP, and then by AVG.
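
For anyone who wants to try the ideas above outside of Fathom, here is a minimal Python sketch. It computes the sum of squared residuals for a hand-picked "movable" line and for the least-squares line, using the standard closed-form slope and intercept. The function names and the (x, y) values are made up for illustration; they are not Piazza's actual numbers, and this is not how Fathom does it internally.

```python
def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Closed-form solution: slope = Sxy / Sxx, and the line passes through (x_bar, y_bar).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data for illustration only (NOT real baseball statistics).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# A trial-and-error line, like dragging the movable line by eye.
guess_slope, guess_intercept = 1.8, 0.5
ssr_guess = sum_squared_residuals(xs, ys, guess_slope, guess_intercept)

# The least-squares line for the same points.
ls_slope, ls_intercept = least_squares_line(xs, ys)
ssr_ls = sum_squared_residuals(xs, ys, ls_slope, ls_intercept)

print("trial line SSR:", ssr_guess)
print("least-squares line:", ls_slope, ls_intercept, "SSR:", ssr_ls)
```

No matter which trial line you pick, its sum of squared residuals cannot drop below the least-squares value. The same `sum_squared_residuals` function could be applied to the fitted line for each predictor to compare them, as in the last bullet.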