Chapter 13 Linear Discriminant Analysis and Naive Bayes

13.1 Naive Bayes

What is Naive Bayes?

Essentially, the logic behind Naive Bayes is as follows: instead of taking the unconditional probability of something happening, we look at the probability of it happening given other things we already know have happened. The probability of a flood in the next week may be, say, 0.1%, but that probability changes if we already know that six inches of rain have fallen in the past 24 hours. For each category to be predicted, Naive Bayes computes the conditional probability of that category given the predictors, treating the predictors as independent of one another (this is the "naive" assumption). A great explanation of Naive Bayes appears on the following webpage:

https://stackoverflow.com/questions/10059594/a-simple-explanation-of-naive-bayes-classification
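
To make the flood example concrete, here is a minimal sketch of Bayes' rule in R. Only the 0.1% prior comes from the example above; the other two numbers are made-up assumptions for illustration.

# Bayes' rule: P(flood | heavy rain) = P(heavy rain | flood) * P(flood) / P(heavy rain)
p_flood <- 0.001            # prior probability of a flood in the next week (0.1%)
p_rain_given_flood <- 0.90  # assumption: 90% of past floods were preceded by heavy rain
p_rain <- 0.05              # assumption: heavy rain falls in about 5% of weeks

p_flood_given_rain <- p_rain_given_flood * p_flood / p_rain
p_flood_given_rain
## [1] 0.018

Knowing about the rain raises the probability of a flood from 0.1% to 1.8%; Naive Bayes performs this kind of updating for each possible class.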

Normally one might think that since the predicted variable is a category, the predictors should be categorical too. However, Naive Bayes accepts continuous predictors as well. For a continuous variable it assumes a normal distribution within each class, and uses the class-specific mean and standard deviation to calculate the probabilities it feeds into the algorithm. You may get warnings when you run Naive Bayes on continuous predictors.
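
As a rough sketch of what that normal assumption does, the snippet below uses dnorm() to turn a new value of a continuous predictor into class-conditional densities; the carat measurements are made up for illustration. With equal priors, Naive Bayes would favor the class with the larger density.

# Hypothetical carat values observed for two cut classes
carat_fair <- c(0.9, 1.1, 1.3, 1.0, 1.2)
carat_ideal <- c(0.4, 0.5, 0.7, 0.6, 0.5)

x <- 0.8  # new observation to classify

# Class-conditional densities under the normal assumption
dens_fair <- dnorm(x, mean = mean(carat_fair), sd = sd(carat_fair))
dens_ideal <- dnorm(x, mean = mean(carat_ideal), sd = sd(carat_ideal))
c(fair = dens_fair, ideal = dens_ideal)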

library(caret)  # attaches ggplot2, which provides the diamonds data
data(diamonds)

# Stratified 75/25 split on the outcome cut
inTrain <- createDataPartition(y = diamonds$cut, p = 0.75, list = FALSE)
training <- diamonds[inTrain, ]
testing <- diamonds[-inTrain, ]

Let us build the model.

fitnb <- train(cut ~ ., method = "nb", data = training)  # method "nb" uses the klaR package
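
Fitting this on the full diamonds training set can take a while and may emit the warnings mentioned earlier. Once it finishes, evaluation follows the same pattern used for LDA below; a minimal sketch:

prednb <- predict(fitnb, newdata = testing)  # predicted cut for the held-out rows
confusionMatrix(prednb, testing$cut)         # accuracy, kappa and per-class statistics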

13.2 Linear Discriminant Analysis

Linear Discriminant Analysis (LDA) was developed by Ronald A. Fisher in 1936. Essentially, it draws lines (linear boundaries) that separate the data into different classes. The predicted variable is a class, i.e. categorical, but LDA expects the predictor variables to be continuous because of its distributional assumption that the predictors are multivariate normal within each class. Use something else if the features are a mix of continuous and categorical variables.
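
Under the hood, caret's "lda" method calls lda() from the MASS package. Here is a minimal sketch of using it directly on the built-in iris data, which has four continuous predictors and one categorical outcome:

library(MASS)

# Fit LDA directly: Species is the class, the four measurements are the predictors
fit <- lda(Species ~ ., data = iris)
fit$scaling  # coefficients of the linear discriminants (the separating directions)

# Confusion of predicted versus actual species on the same data
table(predict(fit)$class, iris$Species)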

We try an example with the mtcars dataset, where we predict the number of cylinders in a car using miles per gallon, displacement, horsepower, rear axle ratio, weight, and quarter-mile time.

library(caret)
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)  # convert cyl into a factor so caret treats this as classification

# Stratified 75/25 split on cyl
inTrain <- createDataPartition(y = mtcars$cyl, p = 0.75, list = FALSE)
training <- mtcars[inTrain, ]
testing <- mtcars[-inTrain, ]

We fit the model, look at the resampled accuracy estimate, and then compute the confusionMatrix on the test set.

fitlda <- train(cyl ~ mpg + disp + hp + drat + wt + qsec, data = training, method = "lda")
fitlda
## Linear Discriminant Analysis 
## 
## 26 samples
##  6 predictor
##  3 classes: '4', '6', '8' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 26, 26, 26, 26, 26, 26, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.7641732  0.6372931
predict(fitlda, mtcars)
##  [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
## Levels: 4 6 8
confusionMatrix(predict(fitlda, newdata = testing), testing$cyl)
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction 4 6 8
##          4 2 0 0
##          6 0 1 0
##          8 0 0 3
## 
## Overall Statistics
##                                      
##                Accuracy : 1          
##                  95% CI : (0.5407, 1)
##     No Information Rate : 0.5        
##     P-Value [Acc > NIR] : 0.01563    
##                                      
##                   Kappa : 1          
##  Mcnemar's Test P-Value : NA         
## 
## Statistics by Class:
## 
##                      Class: 4 Class: 6 Class: 8
## Sensitivity            1.0000   1.0000      1.0
## Specificity            1.0000   1.0000      1.0
## Pos Pred Value         1.0000   1.0000      1.0
## Neg Pred Value         1.0000   1.0000      1.0
## Prevalence             0.3333   0.1667      0.5
## Detection Rate         0.3333   0.1667      0.5
## Detection Prevalence   0.3333   0.1667      0.5
## Balanced Accuracy      1.0000   1.0000      1.0
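
We get perfect accuracy on the held-out cars, but with only six cars in the test set the 95% confidence interval for accuracy is wide (0.54 to 1), so the perfect score should not be over-interpreted.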