What Are They? Pandas, Matplotlib, scikit-learn, Jupyter Notebook... Python Libraries
When you first get into data science, you will hear terms like Pandas, Matplotlib, scikit-learn, and Jupyter Notebook. What are they? The names can be confusing at first.
This article describes what Pandas, Matplotlib, scikit-learn, and Jupyter Notebook are and how they differ.
For data mining, we usually follow the CRISP-DM (Cross-Industry Standard Process for Data Mining) process:
Based on: https://www.datascience-pm.com/crisp-dm-2/
The CRISP-DM process consists of six steps: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.
- Business Understanding step - we need to understand the business and establish the question that the data mining should answer.
- Data Understanding step - we need to understand the data. We can use statistical techniques such as descriptive statistics and regression analysis to do so.
- Data Preparation step - we clean the data, for example by removing duplicates.
- Modeling step - we create models, such as clustering, prediction, and classification models.
- Evaluation step - we evaluate which model is more accurate and select it.
- Deployment step - we can create data…
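
To make the Data Understanding and Data Preparation steps a little more concrete, here is a minimal sketch using Pandas. The toy table and its column names (`customer`, `spend`) are made up purely for illustration and do not come from any real dataset. It only shows one possible way to compute descriptive statistics and remove duplicate rows.

```python
import pandas as pd

# Hypothetical toy data; the column names and values are invented for illustration.
df = pd.DataFrame({
    "customer": ["A", "B", "B", "C"],
    "spend": [100.0, 250.0, 250.0, 40.0],
})

# Data Understanding: descriptive statistics (count, mean, std, min, quartiles, max).
summary = df["spend"].describe()

# Data Preparation: remove rows that are exact duplicates of an earlier row.
clean = df.drop_duplicates().reset_index(drop=True)

print(summary)
print(clean)
```

In this sketch, `describe()` summarizes the numeric column and `drop_duplicates()` removes the repeated row for customer B, so the cleaned table has three rows instead of four.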