E Solutions ch. 6 - Support vector machines

Solutions to exercises of chapter 6.

E.1 Exercise 1

Load required libraries

## Loading required package: lattice
## Loading required package: ggplot2
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
## Type 'citation("pROC")' for a citation.
## 
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

Define a radial SVM using the e1071 library

Setup parallel processing

## [1] 8

Load data

Extract predictors from segmentationData

Partition data

Set seeds for reproducibility (optional). We will be trying 9 values of the tuning parameter with 5 repeats of 10 fold cross-validation, so we need the following list of seeds.

We will pass the twoClassSummary function into model training through trainControl. Additionally we would like the model to predict class probabilities so that we can calculate the ROC curve, so we use the classProbs option.

Tune SVM over the cost parameter. The default grid of cost parameters start at 0.25 and double at each iteration. Choosing tuneLength = 9 will give us cost parameters of 0.25, 0.5, 1, 2, 4, 8, 16, 32 and 64. The train function will calculate an appropriate value of sigma (the kernel parameter) from the data.

## Support Vector Machines with Radial Kernel - e1071 
## 
## 1010 samples
##   56 predictor
##    2 classes: 'PS', 'WS' 
## 
## Pre-processing: centered (56), scaled (56) 
## Resampling: Cross-Validated (10 fold, repeated 5 times) 
## Summary of sample sizes: 909, 909, 909, 909, 909, 909, ... 
## Resampling results across tuning parameters:
## 
##   cost   ROC        Sens       Spec     
##    0.25  0.8769145  0.8772308  0.6605556
##    0.50  0.8814957  0.8827692  0.6961111
##    1.00  0.8818205  0.8735385  0.7266667
##    2.00  0.8796581  0.8655385  0.7350000
##    4.00  0.8706410  0.8569231  0.7272222
##    8.00  0.8564786  0.8483077  0.7055556
##   16.00  0.8457009  0.8455385  0.7061111
##   32.00  0.8333077  0.8323077  0.6961111
##   64.00  0.8246923  0.8243077  0.6844444
## 
## ROC was used to select the optimal model using the largest value.
## The final value used for the model was cost = 1.
## 
## Call:
## svm.default(x = as.matrix(x), y = y, kernel = "radial", cost = param$cost, 
##     probability = classProbs)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  1 
## 
## Number of Support Vectors:  540

SVM accuracy profile

SVM accuracy profile.

SVM accuracy profile.

Test set results

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  PS  WS
##         PS 569 101
##         WS  81 258
##                                           
##                Accuracy : 0.8196          
##                  95% CI : (0.7945, 0.8429)
##     No Information Rate : 0.6442          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.6015          
##                                           
##  Mcnemar's Test P-Value : 0.159           
##                                           
##             Sensitivity : 0.8754          
##             Specificity : 0.7187          
##          Pos Pred Value : 0.8493          
##          Neg Pred Value : 0.7611          
##              Prevalence : 0.6442          
##          Detection Rate : 0.5639          
##    Detection Prevalence : 0.6640          
##       Balanced Accuracy : 0.7970          
##                                           
##        'Positive' Class : PS              
## 

Get predicted class probabilities

##           PS         WS
## 1  0.8919246 0.10807539
## 5  0.9110754 0.08892455
## 8  0.9678410 0.03215904
## 9  0.5512136 0.44878636
## 10 0.8198217 0.18017825
## 12 0.7684369 0.23156308

Build a ROC curve

## Setting levels: control = PS, case = WS
## Setting direction: controls > cases
## Area under the curve: 0.8908

Plot ROC curve.

SVM ROC curve for cell segmentation data set.

SVM ROC curve for cell segmentation data set.

Calculate area under ROC curve

## Area under the curve: 0.8908