Chapter 15 Multinomial Logistic Regression
Multinomial logistic regression extends binomial logistic regression to outcomes with more than two categories. For a nominal dependent variable with k categories, the model estimates k-1 logit equations, each comparing one category against a common reference category.
The independent variables can be continuous or categorical (factors), but the predicted variable is always a class.
As an example, we predict the species in the iris dataset from its four flower measurements.
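To make the k-1 logit equations concrete, the following sketch turns linear predictors into class probabilities for k = 3 classes. The coefficient values, the class labels a, b and c, and the helper name `multinom_probs` are made up for illustration; they are not estimates from any fitted model.

```r
# Hypothetical coefficients for k = 3 classes, with class "a" as the reference.
# Each non-reference class has its own logit equation:
#   log(P(class) / P(a)) = intercept + slope * x
coefs <- rbind(
  b = c(intercept = -1.0, x = 0.8),
  c = c(intercept = -3.0, x = 1.5)
)

multinom_probs <- function(x, coefs) {
  # Linear predictor of each class; the reference class is fixed at 0
  eta <- c(a = 0, coefs[, "intercept"] + coefs[, "x"] * x)
  # Softmax: exponentiate and normalise so the k probabilities sum to 1
  exp(eta) / sum(exp(eta))
}

p <- multinom_probs(x = 2, coefs = coefs)
print(round(p, 3))
```

Only two equations are needed for three classes because the probabilities must sum to one; the log-odds of b against a at x = 2 is exactly the linear predictor -1.0 + 0.8 * 2.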
data("iris")
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
Next we split the data into a training set (75%) and a test set (25%), stratified by species.
library(caret)
data(iris)
inTrain <- createDataPartition(y=iris$Species, p=0.75, list=FALSE)
training <- iris[inTrain,]
testing <- iris[-inTrain,]
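For intuition about what a stratified split does, here is a rough base-R sketch. It is not caret's actual implementation; the seed, the names `train_sketch` and `test_sketch`, and the choice to round each class's sample size up are assumptions made for illustration.

```r
# Rough base-R sketch of a stratified 75/25 split (not caret's implementation)
data(iris)
set.seed(123)  # arbitrary seed, only to make the sketch repeatable

# Row indices grouped by species
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)

# Sample 75% of the rows within each class, rounding up (an assumption)
train_idx <- unlist(lapply(rows_by_class, function(idx) {
  sample(idx, size = ceiling(0.75 * length(idx)))
}))

train_sketch <- iris[train_idx, ]
test_sketch  <- iris[-train_idx, ]

# Both parts keep the balanced class proportions of the full data
table(train_sketch$Species)
table(test_sketch$Species)
```

Sampling within each class guarantees that every species is represented in both parts in the same proportion as in the full data, which a simple random split does not.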
We now fit the model:
fit <- train(Species ~ ., data = training, method = "multinom")
Printing the fitted object shows the resampled accuracy and kappa for each value of the decay tuning parameter.
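caret's "multinom" method is backed by the multinom() function from the nnet package. As a minimal sketch, the model can also be fitted directly to inspect the k-1 sets of coefficients; here it is fitted to the full iris data so the example is self-contained, and the name `direct_fit` is only illustrative. Because the iris classes are almost perfectly separable, the estimated coefficients can be very large.

```r
library(nnet)  # recommended package that ships with most R installations

data(iris)
# trace = FALSE suppresses the optimiser's iteration log
direct_fit <- multinom(Species ~ ., data = iris, trace = FALSE)

# One row of coefficients per non-reference class (setosa is the reference level)
coef(direct_fit)

# Predicted class probabilities for the first rows; each row sums to 1
probs <- predict(direct_fit, newdata = head(iris), type = "probs")
round(probs, 3)
```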
fit
## Penalized Multinomial Regression
##
## 114 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 114, 114, 114, 114, 114, 114, ...
## Resampling results across tuning parameters:
##
##   decay  Accuracy   Kappa
##   0e+00  0.9541438  0.9305495
##   1e-04  0.9609881  0.9408379
##   1e-01  0.9664773  0.9489501
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was decay = 0.1.
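The resampling scheme and the candidate decay values are both configurable. The following sketch, which assumes the caret package is installed, swaps the default bootstrap for 10-fold cross-validation and supplies its own grid; the seed, the grid values, and the names `idx_cv`, `training_cv` and `fit_cv` are illustrative choices, not part of the analysis in this chapter.

```r
library(caret)

data(iris)
set.seed(42)  # arbitrary seed so the split and resampling are repeatable
idx_cv      <- createDataPartition(y = iris$Species, p = 0.75, list = FALSE)
training_cv <- iris[idx_cv, ]

# 10-fold cross-validation instead of the default 25 bootstrap resamples
ctrl <- trainControl(method = "cv", number = 10)

# Candidate weight-decay penalties (illustrative values)
grid <- expand.grid(decay = c(0, 0.001, 0.01, 0.1, 1))

# trace = FALSE is passed through to the underlying fitting function
fit_cv <- train(Species ~ ., data = training_cv, method = "multinom",
                trControl = ctrl, tuneGrid = grid, trace = FALSE)

fit_cv$results[, c("decay", "Accuracy", "Kappa")]
fit_cv$bestTune
```

A larger decay shrinks the coefficients more strongly, which tends to help when classes are nearly separable, as they are in iris.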
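Finally, we evaluate the model on the test set. Note that caret's confusionMatrix() expects the predictions as its first argument and the true classes as its second; the call here passes them in the opposite order, so in the output the rows labelled "Prediction" hold the true species and the columns labelled "Reference" hold the model's predictions. Accuracy and kappa are unaffected by this, but the no-information rate and the per-class statistics are computed with the roles swapped (for example, the reported sensitivity is really the positive predictive value).
confusionMatrix(testing$Species, predict(fit, newdata = testing))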
## Confusion Matrix and Statistics
##
##             Reference
## Prediction   setosa versicolor virginica
##   setosa         12          0         0
##   versicolor      0         11         1
##   virginica       0          0        12
##
## Overall Statistics
##
## Accuracy : 0.9722
## 95% CI : (0.8547, 0.9993)
## No Information Rate : 0.3611
## P-Value [Acc > NIR] : 7.69e-15
##
## Kappa : 0.9583
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
##                      Class: setosa Class: versicolor Class: virginica
## Sensitivity                 1.0000            1.0000           0.9231
## Specificity                 1.0000            0.9600           1.0000
## Pos Pred Value              1.0000            0.9167           1.0000
## Neg Pred Value              1.0000            1.0000           0.9583
## Prevalence                  0.3333            0.3056           0.3611
## Detection Rate              0.3333            0.3056           0.3333
## Detection Prevalence        0.3333            0.3333           0.3333
## Balanced Accuracy           1.0000            0.9800           0.9615
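The headline statistics can be recomputed by hand from the printed table, which helps clarify what each one measures. The following base-R sketch copies the counts exactly as printed (rows and columns in the same orientation); the variable names are illustrative.

```r
classes <- c("setosa", "versicolor", "virginica")

# Counts copied from the printed confusion matrix, row by row
cm <- matrix(c(12,  0,  0,
                0, 11,  1,
                0,  0, 12),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)
accuracy    <- sum(diag(cm)) / n                       # observed agreement
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2    # agreement expected by chance
cohen_kappa <- (accuracy - expected) / (1 - expected)  # chance-corrected agreement

sensitivity <- diag(cm) / colSums(cm)  # per class: diagonal count / column total
ppv         <- diag(cm) / rowSums(cm)  # per class: diagonal count / row total

round(c(accuracy = accuracy, kappa = cohen_kappa), 4)
round(sensitivity, 4)
round(ppv, 4)
```

These reproduce the printed accuracy (0.9722), kappa (0.9583), sensitivity and positive predictive values. Kappa is lower than accuracy because it discounts the one third of agreements that would be expected by chance with three equally sized classes.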