(R-squared). The coefficient of determination. In regression analysis, R² denotes the square of the correlation between Y (the forecast variable) and Ŷ (the estimated Y value based on the set of explanatory variables). R² can be interpreted as the proportion of variance in Y that can be explained by the explanatory variables. R² is appropriate only when examining holdout data (use adjusted R² for the calibration data). Some researchers believe that the dangers of R² outweigh its advantages. Montgomery and Morrison (1973) provide a rule of thumb for estimating the calculated R² when the true R² is zero: it is R² = v/n, where v is the number of variables and n is the number of observations. They showed how to calculate the inflation in R² and also presented a table showing sample sizes, number of variables, and different assumptions as to the true R². If you are intent on increasing R², see “Rules for Cheaters” on the Practitioners page. R² can be especially misleading for time-series data. Used with caution, R² may be useful for diagnostic purposes in some cases, most likely when dealing with cross-sectional data. Even then, however, the correlation coefficient is likely to be a better measure.
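As a rough illustration of the definition and of the v/n rule of thumb, the sketch below computes R-squared as the squared correlation between Y and its fitted values, then regresses pure noise on pure noise many times to see how large the calculated value is when the true value is zero. It is a minimal sketch, not Montgomery and Morrison's procedure: the function names (`r_squared`, `ols_fitted`, `mean_null_r_squared`), the normally distributed data, and the choices n = 50 and v = 5 are all assumptions made for the example.

```python
import random
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def r_squared(y, y_hat):
    """R-squared as the squared correlation between actual and fitted Y."""
    return pearson_r(y, y_hat) ** 2


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols_fitted(X, y):
    """Fitted values from least squares with an intercept (normal equations)."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return [sum(b * x for b, x in zip(beta, r)) for r in rows]


def mean_null_r_squared(n, v, trials=500, seed=0):
    """Average calculated R-squared when Y is unrelated to all v variables."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        X = [[rng.gauss(0, 1) for _ in range(v)] for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # true R-squared is zero
        results.append(r_squared(y, ols_fitted(X, y)))
    return mean(results)


n, v = 50, 5  # assumed sample size and number of variables
print("simulated mean R-squared:", round(mean_null_r_squared(n, v), 3))
print("rule of thumb v/n:", v / n)
```

Under these assumptions the simulated average should land close to v/n even though no variable has any real relationship with Y, which is the inflation the entry warns about for calibration data.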
- Montgomery, D. & D. Morrison (1973), “A note on adjusting R²,” Journal of Finance, 28,