How to become a data scientist

Monday, January 04, 2021

Maybe you have read my latest book about data science or seen the data science courses on my YouTube channel, and becoming a data scientist yourself appeals to you. Then this post is for you: I will share some ideas about what makes a good data scientist and how you can develop those qualities.

Data scientists are in demand: many of the new jobs and much of the new economy seem to be related to data science. It seems as if data science and its various sub-fields, such as deep learning, big data, statistics, and modelling, are supporting a new wave of development and wealth creation.

Data science is a worthwhile pursuit and a road to a rich and interesting career with amazing growth opportunities. It is, however, not the most obvious thing to learn, because data science is not a narrow scientific field. Data science is the art and science of making (business) decisions based on data. Hence one needs computer skills to access and manipulate the data, business insight to know which data to use, how to manipulate it, and which analytics make sense, mathematical knowledge to run the model, and communication skills to convince other people of the results.

One can say that data science is the art and science of using data to create actionable insights. As such, data science is the interdisciplinary scientific domain that lies at the intersection of mathematics/statistics, computer science, business management, and psychology, all within an ever-changing landscape.1

To be successful in data science, you will need a basic understanding of all of the sub-domains, and you will need to excel in a few and make those your personal trademark. These sub-domains break down as follows:

  1. Mathematics/statistics: a good understanding of basic notions is arguably more important than a superficial knowledge of more advanced techniques.2 The best way to build up this mathematical knowledge is via academic training. STEM subjects are obvious candidates and the ideal starting point, since it is rather hard to acquire this knowledge later if you don’t have a solid base.
  2. Computer science: understanding databases, software design, and object-oriented (OO) programming, and being able to write code in SQL, R and Python, is essential and should be considered a minimal requirement.3 Programming is best learned on the job: simply start a project, find solutions to all your problems and don’t give up. You might also find my videos on YouTube useful.
  3. Business management: understanding the business model in your sector is essential to produce analyses that make sense and can be used by management. A data scientist has a purpose and a mission: you need to understand that mission and always keep an eye on the bigger picture. This business insight can be acquired by reading about the subject or by studying applied economics, but years of experience will always add value.
  4. Psychology, game theory and decision science: your job is not done once the numbers are crunched. You will need to present them, suggest solutions, and so on, and most of that is only possible via solid teamwork. So you have to be able to work in a team and make sure that you produce the most impactful analysis possible, one that is digestible and actionable. Reading up on multi-criteria decision analysis or game theory might tick certain boxes, but being effective is something that you will need to work on permanently. For example, getting and giving feedback will help you grow.
  5. An open mind and willingness to learn: the domain of data science is an ever-moving landscape. The skills explained above only help you move towards the top; being a data scientist means that you must be willing to keep learning.

So, you will want to start by studying something that interests you and that can be used as an angle of approach to data science. Any STEM subject will do, as will econometrics, applied economics and business analytics.

Then you need to read up on the fields that were less prominent in your studies. I tried to find a book that could help you on that journey, but could not find one, so I wrote it myself: The Big R-Book: From Data Science to Learning Machines and Big Data. The book is about data and how to use it successfully in a private company. It is not only a technical work: it aims at increasing your personal brand and value, as well as shareholder value. You can learn about statistical models, multi-criteria decision analysis, machine learning, artificial intelligence, big data, creating interactive websites, automating presentations, speeding up code, programming the GPU, etc., all while learning to code in R. The book takes a pragmatic stance and gets you started. It covers the whole data cycle, from databases, importing data and wrangling data to modelling and reporting, and it even elaborates on big data, interactive websites, etc.

Whatever route you choose to start with, I’m sure that you are in for a fantastic journey in a fast-growing field. I wish you loads of success and a rewarding career!

Footnotes

  1. Wikipedia defines data science as follows: “Data science is an inter-disciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data. Data science is related to data mining, machine learning and big data.”

  2. For example, it is more important to know the limitations of “correlation” than to be able to specify that what is usually meant is the Pearson correlation and that alternatives include the Spearman correlation, the Kendall rank correlation, etc.

  3. If you are young and have more time, it is a good idea to start with C++. This will give you a deeper understanding of how computers work, and you will be learning a tool that makes it easier to learn the more modern computer languages. It is also closely related to C, the language in which the core of R and the reference Python interpreter are largely written. It runs faster, and it can be used within the aforementioned languages to speed up code.
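
To make footnote 2 a little more concrete, here is a minimal sketch in plain Python of what "knowing the limitations of correlation" can mean in practice. The data and the helper names (`pearson`, `ranks`, `spearman`) are made up for this illustration and are not taken from the book. The sketch assumes small, tie-free samples where ranks are needed.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation: measures *linear* association only."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Rank the values 1..n (assumption: no ties, to keep the sketch short)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical toy data, chosen only to make the point visible.
xs = [-3, -2, -1, 0, 1, 2, 3]
parabola = [x ** 2 for x in xs]  # y is fully determined by x, but not linearly
cubic = [x ** 3 for x in xs]     # monotonic in x, but not linear

print("Pearson,  y = x^2:", pearson(xs, parabola))
print("Pearson,  y = x^3:", pearson(xs, cubic))
print("Spearman, y = x^3:", spearman(xs, cubic))
```

In the first case the Pearson correlation is zero although y depends completely on x, so "no correlation" does not mean "no relationship". In the second case the relationship is perfectly monotonic, which the rank-based Spearman correlation picks up fully while the Pearson correlation stays below one.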