# W1D02 Piscine AI - Data Science

## Pandas

The goal of this day is to understand the practical usage of **Pandas**. As **Pandas** is intensively used in Data Science, other days of the piscine will be dedicated to it.

Not only is the **Pandas** library a central component of the data science toolkit, but it is also used in conjunction with other libraries in that collection. **Pandas** is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in **Pandas**. Data in **Pandas** is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.

Most of the topics we will cover today are explained and described with examples in the first resource. The number of exercises is low on purpose: take the time to understand chapter 5 of the resource, even though it is 40 pages long.

## Exercises of the day

- Exercise 0 Environment and libraries
- Exercise 1 Your first DataFrame
- Exercise 2 Electric power consumption
- Exercise 3 E-commerce purchases
- Exercise 4 Handling missing values

## Virtual Environment

- Python 3.x
- NumPy
- Pandas
- Jupyter or JupyterLab

*Version of Pandas I used to do the exercises: 1.0.1*. I suggest using the most recent one.

## Resources

- If I had to give you one resource it would be this one:

  https://bedford-computing.co.uk/learning/wp-content/uploads/2015/10/Python-for-Data-Analysis.pdf

  It contains ALL you need to know about Pandas.

- Pandas documentation:
  - https://pandas.pydata.org/docs/
  - https://jakevdp.github.io/PythonDataScienceHandbook/
  - https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
  - https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/
  - https://jakevdp.github.io/PythonDataScienceHandbook/03.04-missing-values.html

# Exercise 0 Environment and libraries

The goal of this exercise is to set up the Python work environment with the required libraries.
**Note:** For each quest, your first exercise will be to set up the virtual environment with the required libraries.

I recommend using:

- the **latest stable version** of Python.
- the virtual environment you're most comfortable with. `virtualenv` and `conda` are the most used in Data Science.
- one of the most recent versions of the required libraries.

1. Create a virtual environment named `ex00`, with a version of Python >= `3.8`, with the following libraries: `pandas`, `numpy` and `jupyter`.
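
Once the environment is created and activated, a quick sanity check can confirm that the interpreter and libraries match the requirements. The script below is only a minimal sketch: the function name `check_environment` and the constants `REQUIRED` and `MIN_PYTHON` are hypothetical names chosen for illustration, not part of the exercise. It relies only on the standard library (`importlib.metadata`, available from Python 3.8), so it runs even if a library is missing.

```python
import sys
from importlib import metadata

# Assumed requirements, taken from the exercise statement above.
REQUIRED = ("pandas", "numpy", "jupyter")
MIN_PYTHON = (3, 8)


def check_environment(packages=REQUIRED, min_python=MIN_PYTHON):
    """Return a dict describing the current environment.

    The "python_ok" key tells whether the interpreter is recent enough.
    Each package name maps to its installed version string, or to None
    if the package is not installed in the active environment.
    """
    report = {"python_ok": sys.version_info >= min_python}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Print one line per check, e.g. "pandas: <version>" or "pandas: None".
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it with the interpreter of the `ex00` environment. Any package reported as `None` still needs to be installed, and `python_ok: False` means the environment was created with an interpreter older than 3.8.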