Chapter 15 Multinomial Logistic Regression

Multinomial logistic regression extends binomial logistic regression to a dependent variable with more than two categories. For a nominal dependent variable with k categories, the model estimates k-1 logit equations, each comparing one category against a common reference category.
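
Written out, the k-1 logit equations and the class probabilities they imply can be sketched as follows. The notation is an assumption made here for illustration: x is the vector of predictors, \beta_{j0} and \beta_j are the intercept and coefficients of equation j, and category k is taken as the reference. The choice of reference is arbitrary; nnet::multinom, for example, uses the first factor level instead.

```latex
\log \frac{P(Y = j \mid x)}{P(Y = k \mid x)} = \beta_{j0} + \beta_j^{\top} x,
\qquad j = 1, \dots, k-1

P(Y = j \mid x) = \frac{\exp(\beta_{j0} + \beta_j^{\top} x)}
                       {1 + \sum_{l=1}^{k-1} \exp(\beta_{l0} + \beta_l^{\top} x)},
\qquad
P(Y = k \mid x) = \frac{1}{1 + \sum_{l=1}^{k-1} \exp(\beta_{l0} + \beta_l^{\top} x)}
```

With k = 2 this reduces to the single logit equation of binomial logistic regression.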

The independent variables can be continuous, factors, or a mix of both, but the predicted variable is always a class.
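
As a minimal sketch of this point, the following fits a model with one continuous and one factor predictor using nnet::multinom directly (nnet is a recommended package bundled with most R installations). The factor WidthGroup and the data frame iris2 are hypothetical, created only for this illustration.

```r
library(nnet)  # provides multinom()

data(iris)
iris2 <- iris
# WidthGroup is a hypothetical factor predictor, made up for this illustration
iris2$WidthGroup <- cut(iris2$Sepal.Width, breaks = c(-Inf, 3, Inf),
                        labels = c("narrow", "wide"))

# One continuous predictor and one factor predictor
m <- multinom(Species ~ Petal.Length + WidthGroup, data = iris2, trace = FALSE)

# Predictions are class labels, returned as a factor
pred <- predict(m, newdata = iris2, type = "class")
is.factor(pred)
levels(pred)

# One row of coefficients per non-reference class (k - 1 = 2 rows here)
coef(m)
```

The factor enters the model through a dummy column (WidthGroupwide), while the prediction is one of the three species labels.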

We try to predict the species in the iris dataset.

data("iris")
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

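Next we split the data into a training set (75%) and a test set (25%). createDataPartition() samples within each species, so the class proportions are preserved in both sets.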

library(caret)
data(iris)
inTrain <- createDataPartition(y=iris$Species, p=0.75, list=FALSE)
training <- iris[inTrain,]
testing <- iris[-inTrain,]
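
A quick way to check the split is to count rows and classes in each partition. The sketch below repeats the partitioning so that it runs on its own; the seed is an arbitrary assumption, added only to make the split repeatable.

```r
library(caret)

data(iris)
set.seed(42)  # arbitrary seed, assumed here only for repeatability
inTrain  <- createDataPartition(y = iris$Species, p = 0.75, list = FALSE)
training <- iris[inTrain, ]
testing  <- iris[-inTrain, ]

# Number of rows in each partition
nrow(training)
nrow(testing)

# Class counts per partition; the split is stratified by species
table(training$Species)
table(testing$Species)
```

Each species should appear equally often within a partition, because iris has 50 flowers per species and the sampling is done per class.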

We now fit the model:

fit <- train(Species ~ ., data = training, method = "multinom")
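
By default train() picks its own resampling scheme and candidate values for the tuning parameter. As a hedged sketch, both can be set explicitly. The names ctrl, grid and fit_cv, the seed, and the choice of 10-fold cross-validation are assumptions for illustration; trace = FALSE is passed through to nnet::multinom to suppress the optimizer log.

```r
library(caret)

data(iris)
set.seed(42)  # arbitrary seed, assumed here only for repeatability
inTrain  <- createDataPartition(y = iris$Species, p = 0.75, list = FALSE)
training <- iris[inTrain, ]

# 10-fold cross-validation instead of the default bootstrap
ctrl <- trainControl(method = "cv", number = 10)

# Candidate values for the weight-decay penalty
grid <- expand.grid(decay = c(0, 1e-4, 1e-1))

fit_cv <- train(Species ~ ., data = training, method = "multinom",
                trControl = ctrl, tuneGrid = grid, trace = FALSE)

# Selected penalty and the coefficients of the final model
fit_cv$bestTune
coef(fit_cv$finalModel)
```

The coefficient matrix has one row per non-reference species, matching the k-1 logit equations.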

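We check the quality of the fit. Printing the model shows the resampled accuracy and kappa for each value of the tuning parameter decay (the weight-decay penalty); the value with the highest accuracy is used for the final model.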

fit
## Penalized Multinomial Regression 
## 
## 114 samples
##   4 predictor
##   3 classes: 'setosa', 'versicolor', 'virginica' 
## 
## No pre-processing
## Resampling: Bootstrapped (25 reps) 
## Summary of sample sizes: 114, 114, 114, 114, 114, 114, ... 
## Resampling results across tuning parameters:
## 
##   decay  Accuracy   Kappa    
##   0e+00  0.9541438  0.9305495
##   1e-04  0.9609881  0.9408379
##   1e-01  0.9664773  0.9489501
## 
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was decay = 0.1.
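
The same numbers can be read from the fitted object rather than from the printed summary. A minimal sketch, refitting the model so that it runs on its own (the seed is an assumption, so the values will not match the output above exactly):

```r
library(caret)

data(iris)
set.seed(42)  # arbitrary seed, assumed here only for repeatability
inTrain  <- createDataPartition(y = iris$Species, p = 0.75, list = FALSE)
training <- iris[inTrain, ]

fit <- train(Species ~ ., data = training, method = "multinom", trace = FALSE)

# Resampling results, one row per candidate value of decay
fit$results[, c("decay", "Accuracy", "Kappa")]

# The value of decay chosen for the final model
fit$bestTune
```
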
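
Finally we evaluate the model on the test set. Note that confusionMatrix() expects the predictions as its first argument (data) and the true classes as its second (reference). The call below passes them in the opposite order, so in the output the rows labelled "Prediction" are the true species and the columns labelled "Reference" are the predicted species: one versicolor flower was classified as virginica. Accuracy and kappa are unaffected by the order, but per-class statistics such as sensitivity and positive predictive value swap roles.

confusionMatrix(testing$Species, predict(fit, newdata = testing))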
## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   setosa versicolor virginica
##   setosa         12          0         0
##   versicolor      0         11         1
##   virginica       0          0        12
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9722          
##                  95% CI : (0.8547, 0.9993)
##     No Information Rate : 0.3611          
##     P-Value [Acc > NIR] : 7.69e-15        
##                                           
##                   Kappa : 0.9583          
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: setosa Class: versicolor Class: virginica
## Sensitivity                 1.0000            1.0000           0.9231
## Specificity                 1.0000            0.9600           1.0000
## Pos Pred Value              1.0000            0.9167           1.0000
## Neg Pred Value              1.0000            1.0000           0.9583
## Prevalence                  0.3333            0.3056           0.3611
## Detection Rate              0.3333            0.3056           0.3333
## Detection Prevalence        0.3333            0.3333           0.3333
## Balanced Accuracy           1.0000            0.9800           0.9615
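
confusionMatrix() takes the predicted classes as its first argument (data) and the observed classes as its second (reference). The sketch below names both arguments explicitly, so that rows are predictions and columns are true species, and also extracts the predicted class probabilities. It refits the model so that it runs on its own; the seed and the names pred, cm and probs are assumptions, so the counts may differ from the output above.

```r
library(caret)

data(iris)
set.seed(42)  # arbitrary seed, assumed here only for repeatability
inTrain  <- createDataPartition(y = iris$Species, p = 0.75, list = FALSE)
training <- iris[inTrain, ]
testing  <- iris[-inTrain, ]

fit <- train(Species ~ ., data = training, method = "multinom", trace = FALSE)

# Predicted classes for the test set
pred <- predict(fit, newdata = testing)

# Rows: predicted species; columns: true species
cm <- confusionMatrix(data = pred, reference = testing$Species)
cm$table
cm$overall["Accuracy"]

# Predicted probabilities, one column per species; each row sums to 1
probs <- predict(fit, newdata = testing, type = "prob")
head(probs)
```

Naming the arguments keeps the "Prediction" and "Reference" labels in the output aligned with their usual meaning.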