
W1D02 Piscine AI - Data Science

Pandas

The goal of this day is to understand the practical usage of Pandas. As Pandas is used intensively in Data Science, other days of the piscine will also be dedicated to it.

Not only is the Pandas library a central component of the data science toolkit, but it is also used in conjunction with the other libraries in that collection.

Pandas is built on top of the NumPy package, meaning a lot of the structure of NumPy is used or replicated in Pandas. Data in Pandas is often used to feed statistical analysis in SciPy, plotting functions from Matplotlib, and machine learning algorithms in Scikit-learn.
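To make the link with NumPy concrete, here is a minimal sketch (the array shape and the column names `a` and `b` are arbitrary choices for illustration): a DataFrame can be built directly from a NumPy array, each column keeps a NumPy dtype, NumPy functions apply to the DataFrame as a whole, and `DataFrame.to_numpy()` hands the values back as a plain array.

```python
import numpy as np
import pandas as pd

# A 3x2 NumPy array of floats: [[0, 1], [2, 3], [4, 5]]
arr = np.arange(6, dtype=np.float64).reshape(3, 2)

# Wrap it in a DataFrame; the column labels are illustrative only
df = pd.DataFrame(arr, columns=["a", "b"])

# Each column is stored with a NumPy dtype (float64 here)
print(df.dtypes)

# NumPy universal functions work on a DataFrame and return a DataFrame
roots = np.sqrt(df)
print(roots)

# Convert the DataFrame back into a NumPy array
back = df.to_numpy()
print(type(back), back.shape)
```

This round trip is what lets Pandas data feed SciPy, Matplotlib and Scikit-learn, which generally accept NumPy arrays as input.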

Most of the topics we will cover today are explained and described with examples in the first resource. The number of exercises is low on purpose: take the time to understand chapter 5 of the resource, even though it is 40 pages long.

Exercises of the day

  • Exercise 1 Your first DataFrame
  • Exercise 2 Electric power consumption
  • Exercise 3 E-commerce purchases
  • Exercise 4 Handling missing values

Virtual Environment

  • Python 3.x
  • NumPy
  • Pandas
  • Jupyter or JupyterLab

Version of Pandas I used to do the exercises: 1.0.1. I suggest using the most recent one.
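To confirm that the virtual environment matches the list above, a small check like the following can be run in a notebook cell. This is only a sketch: the helper name `environment_versions` is hypothetical, and it does not check Jupyter itself, only the Python, NumPy and Pandas versions.

```python
import sys

import numpy as np
import pandas as pd


def environment_versions():
    """Hypothetical helper: return the versions of the tools listed for this day."""
    return {
        "python": sys.version.split()[0],
        "numpy": np.__version__,
        "pandas": pd.__version__,
    }


versions = environment_versions()
for name, version in versions.items():
    print(f"{name}: {version}")

# The exercises were done with Pandas 1.0.1; flag anything older than 1.x
major = int(pd.__version__.split(".")[0])
if major < 1:
    print("Pandas is older than 1.0; consider upgrading to the latest release.")
```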

Resources

  • If I had to give you one resource, it would be this one:

https://bedford-computing.co.uk/learning/wp-content/uploads/2015/10/Python-for-Data-Analysis.pdf

It contains ALL you need to know about Pandas.