W1D02 Piscine AI - Data Science
Pandas
The goal of this day is to understand the practical usage of Pandas. As Pandas is intensively used in Data Science, other days of the piscine will be dedicated to it.
Not only is the Pandas library a central component of the data science toolkit, but it is also used in conjunction with the other libraries in that collection.
Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in Pandas. Data in Pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
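To make the link with NumPy concrete, here is a minimal sketch. The array values and the column names `a` and `b` are made up for illustration. It builds a DataFrame from a NumPy array, gets the data back as an array, and applies NumPy functions to it.

```python
import numpy as np
import pandas as pd

# Illustrative data only: a small 3x2 NumPy array
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

# A DataFrame can be built directly from a NumPy array
df = pd.DataFrame(values, columns=["a", "b"])

# to_numpy() returns the underlying data as a NumPy array
arr = df.to_numpy()

# NumPy functions work on that array...
col_means = np.mean(arr, axis=0)

# ...and also directly on a column, which stays a Pandas Series
sqrt_a = np.sqrt(df["a"])

print(type(arr))
print(col_means)
print(type(sqrt_a))
```

The same idea applies when passing Pandas data to SciPy, Matplotlib or Scikit-learn: these libraries generally accept DataFrames or the arrays obtained from them.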
Most of the topics we will cover today are explained and described with examples in the first resource. The number of exercises is low on purpose: take the time to understand chapter 5 of the resource, even though it is 40 pages long.
Exercises of the day
- Exercise 1 Your first DataFrame
- Exercise 2 Electric power consumption
- Exercise 3 E-commerce purchases
- Exercise 4 Handling missing values
Virtual Environment
- Python 3.x
- NumPy
- Pandas
- Jupyter or JupyterLab
Version of Pandas I used to do the exercises: 1.0.1. I suggest using the most recent one.
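Once the environment is set up, a quick way to see which versions are installed is the sketch below. The helper `version_tuple` and the `(1, 0)` baseline are illustrative assumptions, chosen to match the 1.0.1 version mentioned above. They are not a requirement of the exercises.

```python
import numpy as np
import pandas as pd

# Print the installed versions of the two core libraries
print("pandas", pd.__version__)
print("numpy", np.__version__)


def version_tuple(version):
    """Hypothetical helper: turn '1.0.1' into (1, 0) for a rough comparison."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


# True if the installed Pandas is at least the 1.0 series used for the exercises
meets_baseline = version_tuple(pd.__version__) >= (1, 0)
print("pandas >= 1.0:", meets_baseline)
```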
Resources
- If I had to give you one resource it would be this one:
https://bedford-computing.co.uk/learning/wp-content/uploads/2015/10/Python-for-Data-Analysis.pdf
It contains ALL you need to know about Pandas.