F Solutions ch. 9 - Decision trees and random forests

Solutions to the exercises of the chapter on decision trees and random forests.

F.1 Exercise 1

Load the necessary packages:

readr: to read in the data
dplyr: to process the data
party and rpart: for the classification tree algorithms
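
A minimal sketch of the loading step, assuming all four packages are already installed. The order is a guess based on the startup messages that follow (dplyr first, then the dependencies of party).

```r
# Load the packages used in this exercise.
library(readr)   # read_csv() for reading the data
library(dplyr)   # select() and mutate() for data processing
library(party)   # ctree() for conditional inference trees
library(rpart)   # rpart() for recursive partitioning
```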

## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: grid
## Loading required package: mvtnorm
## Loading required package: modeltools
## Loading required package: stats4
## Loading required package: strucchange
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich

Select features that may explain survival

Each row in the data is a passenger. Columns are features:

survived: 0 if died, 1 if survived
embarked: Port of Embarkation (Cherbourg, Queenstown, Southampton)
sex: Gender
sibsp: Number of Siblings/Spouses Aboard
parch: Number of Parents/Children Aboard
fare: Fare Paid

Categorical features should be converted to factors
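
The reading, selection and conversion steps might look roughly like the sketch below. The file location is not given in the text, so `path` is a hypothetical placeholder: here it points to a temporary CSV holding four invented rows (not real passengers) that only mimic the column names, so that the example runs on its own. Converting `survived` to a factor as well is an assumption, made because both tree methods then treat the problem as classification.

```r
library(readr)
library(dplyr)

# Placeholder input: invented rows with the same column names as the
# real data. Replace `path` with the location of the actual Titanic CSV.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "pclass,survived,sex,sibsp,parch,fare,embarked",
  "1st,1,female,0,0,80.00,Southampton",
  "3rd,0,male,1,0,7.25,Queenstown",
  "2nd,1,female,1,2,26.00,Cherbourg",
  "3rd,0,male,0,0,8.05,Southampton"
), path)

titanic <- read_csv(path) %>%
  # keep the outcome and the features listed above
  select(survived, embarked, sex, sibsp, parch, fare) %>%
  # turn the categorical columns into factors
  mutate(
    survived = factor(survived),
    embarked = factor(embarked),
    sex      = factor(sex)
  )

str(titanic)
```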

## Parsed with column specification:
## cols(
##   pclass = col_character(),
##   survived = col_double(),
##   name = col_character(),
##   sex = col_character(),
##   age = col_double(),
##   sibsp = col_double(),
##   parch = col_double(),
##   ticket = col_character(),
##   fare = col_double(),
##   cabin = col_character(),
##   embarked = col_character(),
##   boat = col_character(),
##   body = col_double(),
##   home.dest = col_character()
## )

Split data into training and test sets
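
A minimal sketch of a random split. To keep it runnable without the CSV, it uses the `Titanic` table that ships with base R, expanded to one row per passenger, as a stand-in; its columns (`Class`, `Sex`, `Age`, `Survived`) differ from the features listed above. The 80/20 ratio and the seed are assumptions, not values taken from the exercise.

```r
# Stand-in data: expand the built-in contingency table to passenger level.
tab <- as.data.frame(Titanic)
passengers <- tab[rep(seq_len(nrow(tab)), tab$Freq),
                  c("Class", "Sex", "Age", "Survived")]

set.seed(123)  # make the random split reproducible

# Draw 80% of the row indices for training; the rest form the test set.
train_idx <- sample(nrow(passengers), size = floor(0.8 * nrow(passengers)))
train <- passengers[train_idx, ]
test  <- passengers[-train_idx, ]

c(train = nrow(train), test = nrow(test))
```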

Recursive partitioning is implemented in the “rpart” package
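
As a sketch of how such a tree can be fitted and evaluated, the example below again uses the built-in `Titanic` table as a stand-in for the exercise data, so the formula refers to that table's columns. The split ratio and seed are assumptions.

```r
library(rpart)

# Stand-in data: built-in Titanic table expanded to one row per passenger.
tab <- as.data.frame(Titanic)
passengers <- tab[rep(seq_len(nrow(tab)), tab$Freq),
                  c("Class", "Sex", "Age", "Survived")]

set.seed(123)
train_idx <- sample(nrow(passengers), size = floor(0.8 * nrow(passengers)))
train <- passengers[train_idx, ]
test  <- passengers[-train_idx, ]

# method = "class" requests a classification tree.
fit_rpart <- rpart(Survived ~ Class + Sex + Age, data = train, method = "class")
print(fit_rpart)

# Predicted class labels on the held-out passengers and their accuracy.
pred_rpart <- predict(fit_rpart, newdata = test, type = "class")
accuracy <- mean(pred_rpart == test$Survived)
accuracy
```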

Conditional partitioning is implemented in the “ctree” function of the “party” package
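
A comparable sketch for a conditional inference tree, under the same assumptions: the built-in `Titanic` table stands in for the exercise data, and the split ratio and seed are arbitrary choices.

```r
library(party)

# Stand-in data: built-in Titanic table expanded to one row per passenger.
tab <- as.data.frame(Titanic)
passengers <- tab[rep(seq_len(nrow(tab)), tab$Freq),
                  c("Class", "Sex", "Age", "Survived")]

set.seed(123)
train_idx <- sample(nrow(passengers), size = floor(0.8 * nrow(passengers)))
train <- passengers[train_idx, ]
test  <- passengers[-train_idx, ]

# ctree() chooses splits with permutation-based significance tests.
fit_ctree <- ctree(Survived ~ Class + Sex + Age, data = train)
print(fit_ctree)

# Predicted class labels and a confusion table on the test set.
pred_ctree <- predict(fit_ctree, newdata = test)
conf_tab <- table(predicted = pred_ctree, actual = test$Survived)
conf_tab
```

Calling `plot(fit_ctree)` draws the tree with the class distribution in each terminal node.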

Use the ROCR package to visualize the ROC curves and compare the methods
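
One way to draw both ROC curves on the same axes is sketched below. It refits both trees on the built-in `Titanic` stand-in data so that it runs on its own; the split, the seed and the colours are assumptions. ROCR needs a numeric score per passenger, so the predicted probability of the second factor level ("Yes") is used.

```r
library(rpart)
library(party)
library(ROCR)

# Stand-in data: built-in Titanic table expanded to one row per passenger.
tab <- as.data.frame(Titanic)
passengers <- tab[rep(seq_len(nrow(tab)), tab$Freq),
                  c("Class", "Sex", "Age", "Survived")]

set.seed(123)
train_idx <- sample(nrow(passengers), size = floor(0.8 * nrow(passengers)))
train <- passengers[train_idx, ]
test  <- passengers[-train_idx, ]

fit_rpart <- rpart(Survived ~ ., data = train, method = "class")
fit_ctree <- ctree(Survived ~ ., data = train)

# Predicted probability of survival ("Yes") for each test passenger.
prob_rpart <- predict(fit_rpart, newdata = test, type = "prob")[, "Yes"]
prob_ctree <- vapply(treeresponse(fit_ctree, newdata = test),
                     function(p) p[2], numeric(1))

# ROCR prediction objects pair the scores with the true labels.
rocr_rpart <- prediction(prob_rpart, test$Survived)
rocr_ctree <- prediction(prob_ctree, test$Survived)

# True positive rate against false positive rate for both models.
plot(performance(rocr_rpart, "tpr", "fpr"), col = "blue")
plot(performance(rocr_ctree, "tpr", "fpr"), col = "red", add = TRUE)
abline(0, 1, lty = 2)  # reference line of a random classifier
legend("bottomright", legend = c("rpart", "ctree"),
       col = c("blue", "red"), lty = 1)

# Area under each curve as a single-number comparison.
auc_rpart <- performance(rocr_rpart, "auc")@y.values[[1]]
auc_ctree <- performance(rocr_ctree, "auc")@y.values[[1]]
c(rpart = auc_rpart, ctree = auc_ctree)
```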

Acknowledgement: the code for this exercise is from http://bit.ly/2fqWKvK