Quantitative Methods
WMS
Last Updated: 2024-11-27
Content
- 1. Goal
- 2. Calendar
- 3. Lectures
  - 1. Agreements and Introduction program
  - 2. The history of innovation
  - 3. Getting started with R
  - 4. Importing Data in R
  - 5. Data Wrangling in R
  - 6. Building Models in R
  - 7. Introduction to Companies
  - 8. Automated Reporting in R
  - 9. Bigger Data and Faster Code
  - 10. Ethics
  - 11. Bias in data
  - 12. Ideas for the end-projects
- 4. Exam
Goal
In this program we focus on a selection of the material presented in the book "The big R-book: from data science to learning machines and big data." We start by introducing the statistical programming language R and then use it to wrangle data, build models, verify models and build reports.
The homepage of the book is here.
Calendar
# | Date | Time | Where | Content |
---|---|---|---|---|
1 | 2024-10-18 | 9:45–11:15 | C7/2.11 | Introduction and agreements |
2 | 2024-10-25 | 9:45–11:15 | C7/2.11 | Introduction: history of innovation and starting with R |
3 | 2024-11-08 | 9:45–11:15 | C7/2.11 | [3] Starting with R |
4 | 2024-11-15 | 9:45–11:15 | C7/2.11 | [4 + 5] Tidyverse, data manipulation and databases, and [8] automated reporting with RMarkdown |
5 | 2024-11-22 | 9:45–11:15 | C7/2.11 | [8] Automated reporting with RMarkdown and [6] linear regression in R |
6 | 2024-11-29 | 9:45–11:15 | C7/2.11 | [6] Logistic regression in R |
7 | 2024-12-06 | 9:45–11:15 | C7/2.11 | [6] Performance of binary classification models |
8 | 2024-12-13 | 9:45–11:15 | C7/2.11 | [6] Cross-validation |
9 | 2024-12-20 | 9:45–11:15 | C7/2.11 | [6] AI: decision trees and random forests |
10 | 2025-01-10 | 9:45–11:15 | C7/2.11 | [6] AI: neural networks and deep learning |
11 | 2025-01-17 | 9:45–11:15 | C7/2.11 | [6] Support vector machines (SVM) and k-means |
12 | 2025-01-24 | 9:45–11:15 | C7/2.11 | Questions or elective topic |
13 | 2025-01-31 | 9:45–11:15 | HSBC, Ul. Kapelanka 42A, 30-347 Krakow | EXAM |
Lectures and Content
# | Lecture | Description | Downloads | Other Resources |
---|---|---|---|---|
1 | Agreements and Introduction program | Explains how the course works, how we work together towards the final presentations, how scores are determined, etc. | ||
2 | The history of innovation | A historical view of banking and capitalism, the importance of exponential growth, innovations, and the great waves of capitalism. We explore the different waves and conclude that the latest wave is based on artificial intelligence, while other promising technologies such as quantum computing, biotech and nanotech are just around the corner. This is in line with the introduction of the book "The big R-book: from data science to learning machines and big data." The homepage of the book is here. | ||
3 | Getting started with R | In this module we get started with R and RStudio and introduce the R language itself. | This material corresponds to part II of the book "The big R-book: from data science to learning machines and big data." The homepage of the book is here. | |
4 | Importing Data in R | In this module we learn the basics of databases in general and relational databases in particular. Then we learn how to import data from SQL databases directly into R. | This material corresponds to part III of the book "The big R-book: from data science to learning machines and big data." The homepage of the book is here. | |
5 | Data Wrangling in R | When data is pulled from a database, it is seldom in a format that allows us to build a model right away. In this module we learn how to manipulate data and prepare it for modelling. This includes adding columns, calculations, insertions, normalising, working with strings, understanding dates, data binning, dealing with missing data, etc. | This material corresponds to part IV of the book "The big R-book: from data science to learning machines and big data." The homepage of the book is here. | |
6 | Building Models in R | This is where the rubber hits the road: the long preparations of importing data and preparing it for modelling now come to fruition, and we can start building models. We look into linear regression, generalised linear models (e.g. logistic regression) and also machine learning techniques such as decision trees, random forests, support vector machines, neural networks, clustering with k-means, etc. | This material corresponds to part V of the book "The big R-book: from data science to learning machines and big data." The homepage of the book is here. | |
7 | Introduction to Companies | To be effective in a private enterprise, it is useful to understand the basics of wealth creation and how it is reflected in the balance sheet and the profit and loss statement. This value creation chain leads to wealth creation in companies, and hence it is a good hook to talk about company valuation. Company valuation is an entry point into financial markets, with their many financial instruments such as bonds, equities, options, futures, etc. | This material corresponds to part VI of the book "The big R-book: from data science to learning machines and big data." The homepage of the book is here. | |
8 | Automated Reporting in R | Even the most fantastic model or data analysis is useless if one cannot convince other people to take action. R and RMarkdown provide all the tools to build an automated chain of importing data, manipulating data, building models and reporting. We learn how to integrate code, text and layout in one document that can be compiled into slides or static websites. We even find out how to build an interactive application with R and {shiny}. | This material corresponds to part VII of the book "The big R-book: from data science to learning machines and big data." The homepage of the book is here. | |
9 | Bigger Data and Faster Code | Even the fastest PC cannot deal with the huge amounts of data that humanity collects. We see how one can gradually use different techniques to scale up the data processing capacity of the computer: using more cores of the CPU, using the GPU, using faster computers, all the way up to the parallelism of big data solutions such as Spark. Of course, efficient programming techniques remain paramount too, and here we also share many tips, ranging from clean and efficient code to using compiled code and C++ from within R. | This material corresponds to part IX of the book "The big R-book: from data science to learning machines and big data." The homepage of the book is here. | |
10 | Ethics | An introduction to ethics. What is it? What is ethical and what is not? How does our point of reference influence our judgement? | See references in the slides | |
11 | Bias in data | Recognising bias in data and models and building robust, unbiased models. | ||
12 | Ideas for the end-projects | The end-project is making a model, cross-validating it and reporting back. To do that, you will need data. Feel free to bring your own data to the party, but in case you struggle to find good sources, here are some ideas. | ||
Exam
Students form groups of 3 to 5 people and present a group project. The group project consists of the following steps:
- find a problem that can be solved with a model (e.g. build an acceptance model for car insurance)
- find an appropriate dataset
- build the best possible models and compare them
- prepare a report about the work
- present the work in a short presentation