Chapter 8 Prediction Errors

Prediction errors are measured differently for classification problems and for problems with a continuous (numeric) outcome. We consider error measurement for classification problems first.

8.1 Classification errors

Classification error is measured using the confusionMatrix function in the caret package. We can also use the ce (classification error) function in the Metrics package. Both are illustrated in the sketches further below.

                                  Truth
                           Yes                     No
                   _________________________________________
                   |        TP                      FP
   Prediction  Yes |   True Positive          False Positive
                   |                           (Type I error, alpha)
                   |
                   |        FN                      TN
               No  |   False Negative         True Negative
                   |   (Type II error, beta)

  • Accuracy = (TP + TN)/ (TP + FP + FN + TN) = P(Correct Outcome)
  • Sensitivity = TP/(TP + FN) = P(Predict Yes | Truth Yes)
  • Specificity = TN/(FP + TN) = P(Predict No | Truth No)
  • Positive Predictive Value = TP/(TP + FP) = P(Truth Yes | Predict Yes)
  • Negative Predictive Value = TN/(TN + FN) = P(Truth No | Predict No)
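
To make the formulas concrete, here is a minimal R sketch that computes each measure from hypothetical cell counts (the values of TP, FP, FN, and TN below are made up for illustration, not taken from any real model):

    # Hypothetical confusion-matrix counts (illustration only)
    TP <- 40   # true positives
    FP <- 10   # false positives (Type I errors)
    FN <- 5    # false negatives (Type II errors)
    TN <- 45   # true negatives

    accuracy    <- (TP + TN) / (TP + FP + FN + TN)   # P(correct outcome)
    sensitivity <- TP / (TP + FN)                    # P(predict Yes | truth Yes)
    specificity <- TN / (FP + TN)                    # P(predict No  | truth No)
    ppv         <- TP / (TP + FP)                    # P(truth Yes | predict Yes)
    npv         <- TN / (TN + FN)                    # P(truth No  | predict No)

    c(accuracy = accuracy, sensitivity = sensitivity,
      specificity = specificity, ppv = ppv, npv = npv)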

Of the above, accuracy is the easiest to understand and the most commonly used measure.
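
In practice we do not have to compute these by hand. The confusionMatrix function in caret reports accuracy, sensitivity, specificity, and the predictive values in one call, and ce in Metrics returns the misclassification rate (1 - accuracy). The truth and prediction vectors below are made up purely for illustration:

    library(caret)    # provides confusionMatrix()
    library(Metrics)  # provides ce()

    # Made-up true classes and predictions (illustration only)
    truth     <- factor(c("Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No"),
                        levels = c("Yes", "No"))
    predicted <- factor(c("Yes", "No",  "No", "Yes", "Yes", "No", "Yes", "No"),
                        levels = c("Yes", "No"))

    # Full set of measures; positive = "Yes" matches the table above
    confusionMatrix(data = predicted, reference = truth, positive = "Yes")

    # Classification error = proportion misclassified = 1 - accuracy
    ce(actual = truth, predicted = predicted)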


8.2 RMSE for continuous variables

For continuous variables a confusion matrix does not make sense, so we use RMSE, the Root Mean Square Error. It is generally reported in the summary of the fitted model, but it can also be computed with the rmse function from the Metrics package.

The Metrics package also provides the mae function, which computes the Mean Absolute Error. The package has many other error-measurement functions that you can investigate if interested.

RMSE = sqrt(mean((actual - predicted)^2))
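
As a quick check, the rmse and mae functions from Metrics agree with the formulas computed by hand. The actual and predicted values below are made up for illustration:

    library(Metrics)  # provides rmse() and mae()

    # Made-up actual and predicted values (illustration only)
    actual    <- c(3.2, 4.8, 5.1, 6.0, 7.3)
    predicted <- c(3.0, 5.0, 4.7, 6.4, 7.0)

    rmse(actual, predicted)               # Root Mean Square Error
    sqrt(mean((actual - predicted)^2))    # same value, by hand

    mae(actual, predicted)                # Mean Absolute Error
    mean(abs(actual - predicted))         # same value, by hand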

That is basically it!

As we fit models, we will be looking at either the accuracy (for classification) or the RMSE (for continuous outcomes) of our fitted models.