Tidying Up Data with tidyr

Start from mtcars and:

  1. Create a new column for the brand.

  2. Create a new column for the make.

  3. Create a column “code” that combines hp, wt, and mpg in the form hp/wt/mpg.
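
One possible approach to the three steps above, sketched under assumptions: the brand is taken to be the first word of the row name and the make is everything after it, so cars with a one-word name (e.g. Valiant) get `NA` as make. The column names `name`, `brand`, `make` and the object `cars` are illustrative choices, not prescribed by the exercise.

```r
library(tibble)
library(tidyr)

cars <- mtcars |>
  # mtcars keeps the car names as row names; move them into a column.
  rownames_to_column("name") |>
  # Split at the first space only: extra = "merge" keeps the remainder
  # together in `make`, fill = "right" yields NA for one-word names.
  separate(name, into = c("brand", "make"), sep = " ",
           extra = "merge", fill = "right", remove = FALSE) |>
  # Paste hp, wt and mpg into one string; keep the original columns.
  unite("code", hp, wt, mpg, sep = "/", remove = FALSE)

head(cars[, c("name", "brand", "make", "code")], 3)
```

Splitting into more pieces (for example three) is equally defensible; rows with fewer words are then padded with `NA`.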

dplyr

Display the name, brand, and mpg of all cars that have an mpg between 15 and 20.

Create a new column “ltr” that contains the fuel consumption in liters per hundred kilometers.

Add a column “nice” that contains “:-)” if the mpg is below 15 and the hp is above 100, and “:-(” otherwise.
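
The three dplyr tasks above could be sketched as follows. Assumptions: "between 15 and 20" is read as inclusive, mpg is taken as miles per US gallon (1 gallon = 3.785411784 l, 1 mile = 1.609344 km), and the brand is simply the first word of the car name. The names `cars`, `mid_range` and `mpg_to_l100km` are illustrative.

```r
library(dplyr)
library(tibble)

cars <- mtcars |>
  rownames_to_column("name") |>
  # Assumed definition of brand: everything before the first space.
  mutate(brand = sub(" .*", "", name))

# 1. Name, brand and mpg for cars with 15 <= mpg <= 20.
mid_range <- cars |>
  filter(between(mpg, 15, 20)) |>
  select(name, brand, mpg)
mid_range

# 2. Liters per 100 km: l/100km = (100 * liters per gallon / km per mile) / mpg.
mpg_to_l100km <- 100 * 3.785411784 / 1.609344
cars <- cars |> mutate(ltr = mpg_to_l100km / mpg)

# 3. ":-)" when mpg < 15 and hp > 100, ":-(" otherwise (as stated in the task).
cars <- cars |>
  mutate(nice = if_else(mpg < 15 & hp > 100, ":-)", ":-("))
```

Note that the conversion is a reciprocal one: a higher mpg gives a lower value of `ltr`.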

String Manipulation

Start again from mtcars and extract the brand and make with a regular expression.
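
A minimal sketch of the regex variant, assuming again that the brand is the first run of non-space characters and the make is whatever follows the first whitespace. It uses `stringr::str_match()`, which returns one matrix column per capture group; the object names are illustrative.

```r
library(stringr)

name <- rownames(mtcars)

# Group 1: leading non-space characters (brand).
# Group 2: optional remainder after the first whitespace (make).
parts <- str_match(name, "^(\\S+)(?:\\s+(.*))?$")

# Column 1 of `parts` is the full match; columns 2 and 3 are the groups.
cars <- data.frame(name = name, brand = parts[, 2], make = parts[, 3])
head(cars, 3)
```

Names without a space leave the optional group unmatched, so their make is `NA`.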

Try the same using the rex package.
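
With rex the same pattern can be assembled from readable building blocks. This is a sketch under the same assumption about what counts as brand and make; how `re_matches()` reports the unmatched optional group for one-word names (empty string or `NA`) is not relied upon here and should be checked.

```r
library(rex)

# Brand: one or more characters that are not a space.
# Make: optionally, a space followed by the rest of the name.
pattern <- rex(
  start,
  capture(name = "brand", except_some_of(" ")),
  maybe(" ", capture(name = "make", anything)),
  end
)

# re_matches() returns a data frame with one column per named capture.
parts <- re_matches(rownames(mtcars), pattern)
head(parts, 3)
```

Printing `pattern` shows the regular expression that rex generated, which is a convenient way to compare it with a hand-written one.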

lubridate

Consider a moment in the past (e.g. your start date at HSBC, your date of birth, Neil Armstrong's first step on the Moon, etc.).

  1. What time interval defines the difference between now and that moment?
  2. What is the duration (in years and seconds)?
  3. How many years is that exactly?
  4. How many days?
  5. How many seconds?
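
The five questions above could be answered along these lines. The chosen moment is an assumption for illustration (Armstrong's first step, 21 July 1969 at about 02:56 UTC); any other date-time works the same way.

```r
library(lubridate)

# Assumed example moment and the current time, both in UTC.
moment  <- ymd_hms("1969-07-21 02:56:00", tz = "UTC")
current <- now(tzone = "UTC")

# 1. The interval between the two instants.
span <- interval(moment, current)

# 2. The duration: an exact number of seconds, printed by lubridate
#    together with an approximate value in years.
dur <- as.duration(span)
dur

# 3. Length of the interval in years, using the actual calendar years.
yrs <- time_length(span, unit = "years")

# 4. Length in days.
dys <- time_length(span, unit = "days")

# 5. Length in seconds.
secs <- as.numeric(dur, "seconds")
```

The distinction matters for question 3: a duration counts a year as a fixed number of seconds, while the interval respects leap years.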

forcats

Use the dataset mtcars and

  1. create a factor for the gearbox (column am),
  2. label the factor levels A for automatic and M for manual,
  3. show the count,
  4. show the histogram, and
  5. show the median and average consumption for each factor, as well as the count.
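
A possible sketch of the five steps. In mtcars, `am` is coded 0 for automatic and 1 for manual; since the variable is categorical, the "histogram" is drawn as a bar chart of counts. The names `gearbox`, `counts`, `p` and `summary_tbl` are illustrative.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Steps 1 and 2: turn am into a factor and relabel 0 -> A, 1 -> M.
cars <- mtcars |>
  mutate(gearbox = fct_recode(factor(am), A = "0", M = "1"))

# Step 3: count per level.
counts <- fct_count(cars$gearbox)
counts

# Step 4: bar chart of the counts (call print(p) to draw it).
p <- ggplot(cars, aes(x = gearbox)) +
  geom_bar()

# Step 5: median and mean consumption (mpg) per level, plus the count.
summary_tbl <- cars |>
  group_by(gearbox) |>
  summarise(median_mpg = median(mpg), mean_mpg = mean(mpg), n = n())
summary_tbl
```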

Binning

Consider mtcars and define a variable is_economical based on mpg: 1 where mpg is above 20 and 0 elsewhere.

Assume that we prepare a logistic regression to explain is_economical. Find a good binning for the number of carburetors (the variable carb).
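
One way to approach the binning is to look at the share of economical cars per carburetor count and then merge sparse or similar levels. The cut points below (1, 2, and 3 or more) are an assumption chosen for illustration, not the only reasonable answer; `carb_bin`, `by_carb`, `bin_tbl` and `fit` are hypothetical names.

```r
library(dplyr)

cars <- mtcars |>
  mutate(is_economical = as.integer(mpg > 20))

# Share of economical cars for each observed value of carb.
by_carb <- cars |>
  group_by(carb) |>
  summarise(n = n(), share = mean(is_economical))
by_carb

# Candidate binning: keep 1 and 2 apart, pool 3 and more, because the
# higher counts have few observations each.
cars <- cars |>
  mutate(carb_bin = cut(carb, breaks = c(0, 1, 2, Inf),
                        labels = c("1", "2", "3+")))

bin_tbl <- cars |>
  group_by(carb_bin) |>
  summarise(n = n(), share = mean(is_economical))
bin_tbl

# Logistic regression on the binned variable.
fit <- glm(is_economical ~ carb_bin, data = cars, family = binomial)
summary(fit)
```

With only 32 observations, any binning should be judged with caution; comparing a few alternatives (for example by AIC) is a sensible next step.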