Ch 21: Regression

Linear Regression

Build a good model to predict the mpg (miles per gallon) of cars based on the data-set mtcars.
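One possible starting point is sketched below in base R. It fits two nested candidate models on the built-in mtcars data and compares them. The choice of predictors (wt and hp) is an assumption made for illustration only; finding a good set of predictors is the actual exercise.

```r
# Sketch: two candidate linear models for mpg on the built-in mtcars data.
# The predictors (wt, hp) are an illustrative assumption, not a recommendation.
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp, data = mtcars)

# Adjusted R-squared penalises extra predictors, so it can be compared
# across models of different size.
adj_r2 <- c(small = summary(fit_small)$adj.r.squared,
            large = summary(fit_large)$adj.r.squared)
print(adj_r2)

# F-test for the nested models: does adding hp reduce the residual
# sum of squares by more than chance would suggest?
print(anova(fit_small, fit_large))
```

A "good" model should also survive a look at the residuals (for example via plot(fit_large)) and, ideally, an out-of-sample check.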

Ch 22: Classification Models

Logistic Regression

Using the Titanic data-set (use titanic_train from the library titanic, or download the data), build a logistic regression to predict survival.


  1. using the knowledge from the previous part (data binning in particular), build a better model.
  2. plot the ROC curve and the KS plot
  3. calculate the Gini coefficient and the KS statistic
  4. find the optimal cutoff (different ways of thinking are possible)
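For items 2 to 4, the quantities can be computed in base R once a model has produced predicted probabilities. The sketch below uses a hypothetical helper, rank_metrics(), which is not part of any package, and made-up toy labels and scores instead of the Titanic data. It takes the cutoff that maximises the KS distance, which is only one of several defensible definitions of "optimal".

```r
# Hypothetical helper (not from any package): AUC, Gini, KS and the
# KS-maximising cutoff from observed 0/1 labels y and predicted probabilities p.
rank_metrics <- function(y, p) {
  stopifnot(length(y) == length(p), all(y %in% c(0, 1)))
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  # AUC via the Mann-Whitney rank statistic (ties receive average ranks).
  auc <- (sum(rank(p)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  # ROC points: predict "positive" when p >= cutoff, for each observed score.
  cuts <- sort(unique(p), decreasing = TRUE)
  tpr <- sapply(cuts, function(ct) mean(p[y == 1] >= ct))
  fpr <- sapply(cuts, function(ct) mean(p[y == 0] >= ct))
  best <- which.max(tpr - fpr)   # KS = largest gap between TPR and FPR
  list(auc = auc, gini = 2 * auc - 1,
       ks = tpr[best] - fpr[best], cutoff = cuts[best],
       roc = data.frame(cutoff = cuts, fpr = fpr, tpr = tpr))
}

# Toy labels and scores, made up for illustration only.
y <- c(0, 0, 1, 1)
p <- c(0.10, 0.40, 0.35, 0.80)
m <- rank_metrics(y, p)
print(m[c("auc", "gini", "ks", "cutoff")])

# ROC curve: true positive rate against false positive rate, from (0, 0).
plot(c(0, m$roc$fpr), c(0, m$roc$tpr), type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # the diagonal corresponds to a random model
```

With a fitted logistic regression fit, the scores would typically come from predict(fit, type = "response"). Other cutoff criteria, such as minimising a cost-weighted error, may lead to a different choice.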

Decision Tree, Random Forest, aNN, SVM

Build a Decision Tree, Random Forest, aNN, and SVM model for:

  1. a regression tree for predicting mpg in mtcars
  2. a classification tree to predict Survived in titanic_train (from the package titanic)
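As a starting point for the first item, a regression tree can be sketched with the rpart package, which ships with standard R distributions. The minsplit value is an arbitrary assumption made because mtcars is small. The other model types need further packages and are not shown here.

```r
library(rpart)  # recommended package, included with standard R installations

# Regression tree for mpg; method = "anova" requests a regression tree.
# minsplit is lowered from its default of 20 because mtcars has only 32 rows;
# the value 10 is an arbitrary, illustrative choice.
tree <- rpart(mpg ~ ., data = mtcars, method = "anova",
              control = rpart.control(minsplit = 10))
print(tree)

# In-sample root mean squared error (optimistic: no hold-out data is used).
pred <- predict(tree, newdata = mtcars)
rmse_tree <- sqrt(mean((mtcars$mpg - pred)^2))
print(rmse_tree)
```

For the classification tree, the same call with method = "class" and the target converted to a factor would be the natural analogue.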

Ch 23: Learning Machines


Study the data-set iris and try to find clusters that represent Species.
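A minimal sketch with k-means from base R is given below. Standardising the variables, fixing three clusters, and the seed are all assumptions made for this illustration. Other clustering methods or settings may recover Species better or worse.

```r
set.seed(42)  # k-means starts from random centres; the seed is arbitrary

# Use only the four numeric measurements and standardise them,
# so that no single variable dominates the Euclidean distance.
X <- scale(iris[, 1:4])

# Three clusters, because there are three species (an assumption that
# would not be available in a truly unsupervised setting).
km <- kmeans(X, centers = 3, nstart = 25)

# Cross-tabulate clusters against the known species.
# Cluster numbers are arbitrary labels; only the grouping matters.
tab <- table(cluster = km$cluster, species = iris$Species)
print(tab)
```

A cluster that maps almost entirely to one species shows up as a row with one dominant entry.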

Ch 24: Towards a Tidy Modelling Cycle with modelr

Ch 25: Model Validation

Choose one of the previously created models and perform a cross-validation. What are your conclusions?
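As an illustration, k-fold cross-validation can be written in a few lines of base R. The sketch assumes a linear model mpg ~ wt + hp on mtcars and k = 5. Both are placeholders for whichever model was chosen.

```r
set.seed(1)  # arbitrary seed so the random fold assignment is reproducible
k <- 5

# Assign each row to one of k folds of (nearly) equal size, in random order.
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

# For each fold: fit on the other k - 1 folds, evaluate on the held-out fold.
rmse <- sapply(1:k, function(i) {
  train <- mtcars[folds != i, ]
  test  <- mtcars[folds == i, ]
  fit   <- lm(mpg ~ wt + hp, data = train)  # placeholder model
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
})

print(rmse)        # out-of-sample RMSE per fold
print(mean(rmse))  # cross-validated estimate of the prediction error
```

A large spread between folds, or an out-of-sample error well above the in-sample error, would suggest that the model is unstable or over-fitted.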

Ch 26: Labs

Ch 27: MCDA

Company Purchase

You are part of the board and the strategy director presents the expansion strategy. There are 7 possible target companies (column “company”).

The following criteria are represented:

  • price: the fair value of the company according to the strategy team. This is not the asking price, but we are confident that we can negotiate to this level.
  • sales_potential: an assessment of how much this company can contribute to the sales of the common product
  • engineering: R&D capacity (on top of our own)
  • team: the quality of the management team of the target company as assessed by the strategy team
  • loan: “no” if we can buy the target without a loan; otherwise, the amount of the loan that we need
  • score: the desirability of the target as assessed by the strategy team
  • logo: whether the logo is compatible with ours

The director of the strategy team is here and you can ask questions. The voting is in 40 minutes. Work with your team to decide which targets are acceptable (vote abstain), desirable (vote in favour) or bad choices (vote against).

Your opinion is that we need sales capacity and maybe some engineering capacity; the new team is much less important. Having a loan is acceptable to you. However, you value your own opinion more than that of the strategy team, and hence you will build your own model.

Build a model in Excel, LibreOffice Calc, or R to prepare your opinion. Make sure you apply at least the WSM (weighted sum method).
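In R, the WSM could be sketched as follows. The helper wsm() is hypothetical, and the three companies, their values, and the weights are made-up toy numbers, not the contents of company_purchase.csv. The weights only mimic the stated preference of sales over engineering over team. The loan column is left out because it mixes text and amounts and would need recoding first.

```r
# Hypothetical helper: weighted sum method on a numeric decision matrix.
#   M       : alternatives in rows, criteria in columns
#   w       : one weight per criterion (rescaled to sum to 1)
#   benefit : TRUE if higher is better, FALSE for cost criteria such as price
wsm <- function(M, w, benefit) {
  stopifnot(ncol(M) == length(w), length(w) == length(benefit))
  w <- w / sum(w)
  # Min-max normalisation per criterion, so all columns lie in [0, 1].
  N <- apply(M, 2, function(x) (x - min(x)) / (max(x) - min(x)))
  # Flip cost criteria so that 1 is always the best value.
  N[, !benefit] <- 1 - N[, !benefit]
  drop(N %*% w)  # weighted sum per alternative
}

# Toy decision matrix: made-up values for illustration only.
M <- matrix(c(100, 3, 2, 5,
              150, 5, 4, 2,
              120, 1, 5, 4),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"),
                            c("price", "sales_potential", "engineering", "team")))

# Assumed weights: sales matters most, the team least.
w <- c(price = 0.3, sales_potential = 0.4, engineering = 0.2, team = 0.1)
benefit <- c(FALSE, TRUE, TRUE, TRUE)

scores <- wsm(M, w, benefit)
print(sort(scores, decreasing = TRUE))
```

The ranking depends strongly on the weights and on the normalisation, so it is worth re-running the model with a few alternative weight sets before voting.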

The data is in the file company_purchase.csv and copied here: