Chapter 16 Predicting with linear regression and LOESS

In this chapter we will look at predicting with linear regression using the caret package. We don’t have to use caret; we could just use lm, which is part of base R, and we would get the same fitted model. We prefer caret because it gives a consistent interface across many model types, and because by default it estimates out-of-sample performance with bootstrap resampling.
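
For comparison, here is a minimal sketch of the base-R equivalent. With the default gaussian family, caret's method = "glm" fits the same ordinary least squares model that lm fits (shown on the full data set for simplicity):

library(MASS)
data(Boston)

# Same model, fit directly with base R's lm().
fit_lm = lm(medv ~ ., data=Boston)
summary(fit_lm)  # coefficients, standard errors, and R-squared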

library(caret)
library(MASS)
# library(Metrics)  # optional: provides rmse() and mae(), referenced in comments below
data(Boston)

inTrain = createDataPartition(y=Boston$medv, p=0.75, list=FALSE)
training = Boston[inTrain,]
testing = Boston[-inTrain,]

We fit the model using the train() function. Here method = "glm" fits a generalized linear model, which with the default gaussian family is ordinary linear regression.

fit = train(medv ~ ., method="glm", data=training)
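
By default, train() estimates out-of-sample performance with 25 bootstrap resamples, which is what produces the resampling summary shown below. If you would rather use, say, 10-fold cross-validation, you can pass a trainControl object. A minimal sketch (the rest of the chapter keeps the default):

ctrl = trainControl(method="cv", number=10)
fit_cv = train(medv ~ ., method="glm", data=training, trControl=ctrl)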

Next we look at how accurate the model is.

fit
## Generalized Linear Model 
## 
## 381 samples
##  13 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 381, 381, 381, 381, 381, 381, ... 
## Resampling results:
## 
##   RMSE      Rsquared  MAE     
##   5.096809  0.693311  3.575649
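
Because the final model is an ordinary glm fit, you can also inspect its coefficients in the usual way (output omitted):

summary(fit$finalModel)  # coefficient table of the underlying glm fit
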
pred = predict(fit, newdata = testing) # Store predictions in a vector called pred
# rmse(testing$medv, pred)  # equivalent, using the Metrics package
sqrt(mean((testing$medv - pred)^2))  # RMSE on the test set
## [1] 4.824998
# mae(testing$medv, pred)  # equivalent, using the Metrics package
mean(abs(testing$medv - pred))  # mean absolute error on the test set
## [1] 3.307904
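
Since we compute the same two metrics again below, it can be handy to wrap them in small helper functions. A minimal sketch (the commented rmse() and mae() calls above refer to the equivalent functions in the Metrics package):

rmse = function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae = function(actual, predicted) mean(abs(actual - predicted))
rmse(testing$medv, pred)  # same value as the manual calculation above
mae(testing$medv, pred)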

16.1 LOESS

LOESS stands for locally estimated (or locally weighted) scatterplot smoothing. Instead of fitting one regression to the entire data set, for each target value of x you take the nearest fraction of the training points (controlled by the span parameter), fit a small weighted regression to just those points, and use that local fit to produce the prediction. This often gives a better fit than a single global line, but it cannot extrapolate: it breaks down when your x lies outside the range of the x values in the training set.
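
To see the idea in isolation, here is a minimal sketch using base R's loess() on a single predictor; the choice of lstat and span = 0.5 here is just for illustration:

library(MASS)
data(Boston)

# Local regression of median home value on lstat.
# span is the fraction of the data used for each local fit;
# degree is the degree of the local polynomial.
lo = loess(medv ~ lstat, data=Boston, span=0.5, degree=1)

# Predict only within the observed range of lstat -- loess cannot extrapolate.
grid = data.frame(lstat = seq(min(Boston$lstat), max(Boston$lstat), length.out=100))
head(predict(lo, newdata=grid))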

We try LOESS on the same example as before and check whether we can reduce the errors.

library(caret)
library(MASS)

data(Boston)

inTrain = createDataPartition(y=Boston$medv, p=0.75, list=FALSE)  # a fresh random split, so this test set differs from the one above
training = Boston[inTrain,]
testing = Boston[-inTrain,]

We fit the model using the train() function. The method "gamLoess" fits a generalized additive model with LOESS smoothers and requires the gam package to be installed.

fit2 = train(medv ~ ., method="gamLoess", data=training)

Next we look at how accurate the model is.

fit2
## Generalized Additive Model using LOESS 
## 
## 381 samples
##  13 predictor
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 381, 381, 381, 381, 381, 381, ... 
## Resampling results:
## 
##   RMSE      Rsquared  MAE   
##   4.232458  0.81994   2.6286
## 
## Tuning parameter 'span' was held constant at a value of 0.5
## 
## Tuning parameter 'degree' was held constant at a value of 1
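
The output notes that span and degree were held constant. caret can tune them instead if you pass a grid of candidate values. A sketch (the values below are illustrative):

grid = expand.grid(span = c(0.3, 0.5, 0.7), degree = 1)
fit2_tuned = train(medv ~ ., method="gamLoess", data=training, tuneGrid=grid)
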
pred = predict(fit2, newdata = testing) # Store predictions in a vector called pred
# rmse(testing$medv, pred)  # equivalent, using the Metrics package
sqrt(mean((testing$medv - pred)^2))  # RMSE on the test set
## [1] 5.184681
# mae(testing$medv, pred)  # equivalent, using the Metrics package
mean(abs(testing$medv - pred))  # mean absolute error on the test set
## [1] 2.808387
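
caret also bundles these metrics: postResample() computes RMSE, R-squared, and MAE from predictions and observed values in one call:

postResample(pred, testing$medv)  # returns RMSE, Rsquared, and MAE

On these held-out test sets the LOESS model lowers the MAE (2.81 vs 3.31) but gives a slightly higher RMSE (5.18 vs 4.82). Keep in mind that the two sections drew different random train/test splits, so the comparison is only indicative.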