You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

48 lines
1.8 KiB

# W1D02 Piscine AI - Data Science
## Pandas
The goal of this day is to understand practical usage of **Pandas**.
As **Pandas** in intensively used in Data Science, other days of the piscine will be dedicated to it.
Not only is the **Pandas** library a central component of the data science toolkit but it is used in conjunction with other libraries in that collection.
**Pandas** is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in **Pandas**. Data in **Pandas** is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
Most of the topics we will cover today are explained and describes with examples in the first resource. The number of exercises is low on purpose: Take the time to understand the chapter 5 of the resource, even if there are 40 pages.
## Exercises of the day
- Exercise 1 Your first DataFrame
- Exercise 2 Electric power consumption
- Exercise 3 E-commerce purchases
- Exercise 4 Handling missing values
## Virtual Environment
- Python 3.x
- NumPy
- Pandas
- Jupyter or JupyterLab
*Version of Pandas I used to do the exercises: 1.0.1*.
I suggest to use the most recent one.
## Resources
- If I had to give you one resource it would be this one:
https://bedford-computing.co.uk/learning/wp-content/uploads/2015/10/Python-for-Data-Analysis.pdf
It contains ALL you need to know about Pandas.
- Pandas documentation:
- https://pandas.pydata.org/docs/
- https://jakevdp.github.io/PythonDataScienceHandbook/
- https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
- https://www.learndatasci.com/tutorials/python-pandas-tutorial-complete-introduction-for-beginners/
- https://jakevdp.github.io/PythonDataScienceHandbook/03.04-missing-values.html