Eric Goh Ming Hui
5 min read · Mar 29, 2023

FREE Tutorials, Certificates, Answers to Data Science Questions for Beginners.

According to the Indian Institute of Technology (IIT) Madras, data science is a blend of computer science, statistics, and business.

Extracted from: https://ge.iitm.ac.in/I2MP/data-science/

Data science combines computer science, statistics, and business knowledge to uncover insights in data. These insights can be used in decision making. In data science, we usually create applications or software. You can learn more about data science, data mining, text mining, data analysis, and big data at:
http://edatascience.great-site.net/2023/03/07/what-are-they-data-science-data-mining-text-mining-data-analysis-big-data/

Why Data Science?
Have you ever wondered how your mobile phone suggests news for you to read? How does YouTube use the data on your mobile phone to suggest videos for you to watch? How does the Lazada store suggest products to you? All of these are data science at work in real time. Prediction models and classification models trained on your mobile phone data are used to predict which news, videos, and products to show you.

According to Harvard Business Review (October 2012 edition), data scientist is the sexiest job of the 21st century.

There is a shortage of data scientists in the world.
According to the McKinsey Global Institute (in a May 2011 report): “By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.”

What is Data Mining and Data Analysis?
What is CRISP DM? For the data mining process, we usually use CRISP DM (Cross-Industry Standard Process for Data Mining):

Based on: https://www.datascience-pm.com/crisp-dm-2/

The data mining process steps are Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.

Business Understanding step – we need to understand the business and establish the question that the data mining must answer.

Data Understanding step – we need to understand the data. We can use statistics such as descriptive statistics and regression analysis to understand the data.

Data Preparation step – we clean the data, for example by removing duplicates.

Modeling step – we create clustering models, prediction models, and classification models.

Evaluation step – we evaluate which model is more accurate and select it.

Deployment step – we can create data products.

For Data Science, at the Deployment step, we create data products for businesses. We can create software that predicts something. In the table, the goal is to build data products for a business.

For Data Mining, at the Deployment step, we create reports or PowerPoint slides on our results. In the table, the goal is to extract important information.

Read More on CRISP DM: http://edatascience.great-site.net/2023/03/08/what-is-crisp-dm/

Data Analysis Process

What is the Data Analysis Process? The following is a summary of the data analysis process:

Identify – You need to identify why you need data analysis in the first place.
Collect – As the name suggests, this is the step where you collect data.

Analyse – You can use descriptive statistics (mean, median, …), inferential statistics, and data visualizations (charts) to analyze the data.
Interpret – It is time to interpret your results and write a report.

Read more: http://edatascience.great-site.net/2023/03/09/what-is-data-analysis-process/

Questions You May Ask When You First Step into Data Science

The following are some questions you may want to ask:

What are they? Data Science, Data Mining, Data Analysis, Big Data….
https://gohminghui88.medium.com/what-are-they-data-science-data-mining-text-mining-data-analysis-big-data-dc3f7db46f8

What is CRISP DM?
https://gohminghui88.medium.com/what-is-crisp-dm-d30416733019

What is Data Analysis Process?
https://gohminghui88.medium.com/what-is-data-analysis-process-84864779eb5

How to start a career in Data Science? Data Scientist, Data Analyst, Data Engineer, Machine Learning Engineer. What are they?
https://gohminghui88.medium.com/how-to-start-a-career-in-data-science-a7ccca6c075

Attributes, Variables, Features, Columns, Observations, Rows, Dependent Variables, Independent Variables. What are they?
https://gohminghui88.medium.com/attributes-variables-features-columns-observations-rows-dependent-variables-independent-82bf82ca195e

Categorical Variables and Numerical Variables. What are they?
https://gohminghui88.medium.com/categorical-variables-and-numerical-variables-what-are-they-d1f2cfe3b02e

Modeling and Evaluation: Explain Regression or Prediction and Classification using Simple Linear Regression using y = mx +c.
https://gohminghui88.medium.com/modeling-explain-using-simple-linear-regression-y-mx-c-many-people-explain-prediction-166a4c4f68b4

What is the difference between prediction, classification, clustering?
https://gohminghui88.medium.com/what-is-the-difference-between-prediction-classification-clustering-6946cd63cab2

Data Understanding and Data Analysis. What’s the Difference?
https://gohminghui88.medium.com/data-understanding-and-data-analysis-whats-the-difference-6ab2bc0cc96f

What are they? Dplyr, ggplot2, caret, RMarkDown… R Libraries
https://gohminghui88.medium.com/what-are-they-dplyr-ggplot2-caret-rmarkdown-r-libraries-84e3b00718f3

What are they? Pandas, MatPlotLib, Scikit Learn, Jupyter Notebook… Python Libraries.
https://gohminghui88.medium.com/what-are-they-pandas-matplotlib-scikit-learn-jupyter-notebook-python-libraries-6e75f9703436

Top 10 Data Science Tools

http://edatascience.great-site.net/2023/03/29/top-10-data-science-tools/

8 Programming Languages for Data Science

https://gohminghui88.medium.com/8-programming-languages-for-data-science-what-is-data-science-and-data-mining-e9cd63f05149

Data Certificates that is Low Cost

https://gohminghui88.medium.com/data-science-certificates-that-is-low-cost-before-we-look-at-the-certifications-we-look-at-eb2e662482b8

10 Data Science Books for Beginner

https://gohminghui88.medium.com/10-data-science-book-for-beginner-34d343f7d745

10 Data Science Certificate for Beginner

https://gohminghui88.medium.com/10-data-science-certificate-for-beginner-d88ad60089ba

If you want to learn more about data science, you can go to http://svbook.great-site.net/?i=1

SVBook Pte. Ltd. helps people learn about data science, text analysis, text mining, text analytics, AI and machine learning, data processing, and the data mining to insights process using CRISP DM. The data mining process steps are Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
DSTK.Tech (https://dstk2.sourceforge.io/) creates tools and technologies for data science and develops open source data science tools.

EMHAcademy (http://emhacademy.great-site.net/) offers courses to help people become certified data scientists.

Thanks for reading.

Kind Regards,
Eric Goh

Buy me a cup of coffee: https://www.buymeacoffee.com/gohminghui

Eric Goh Ming Hui

(G.Dip, M.Tech, eMBA) | Author of "Learn R for Applied Statistics" | Founder of SVBook Pte. Ltd. : http://svbook.great-site.net