You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

63 lines
2.3 KiB

# W1D03 Piscine AI - Data Science
## Visualizations
While working on a dataset it is important to check the distribution of the data. Obviously, for most of humans it is difficult to visualize the data in more than 3 dimensions
"Viz" is important to understand the data and to show results. We'll discover three libraries to visualize data in Python. These are one of the most used visualisation "libraries" in Python:
- Pandas visualization module
- Matplotlib
- Plotly
The goal is to understand the basics of those libraries. You'll have time during the project to master one (or the three) of them.
You may wonder why using one library is not enough. The reason is simple: it depends on the usage.
For example if you want to check the data quickly you may want to use Pandas viz module or Matplotlib.
If you want to plot a custom and more elaborated plot I suggest to use Matplotlib or Plotly.
And, if you want to create a very nice and interactive plot I suggest to use Plotly.
## Exercises of the day
- Exercise 1 Pandas plot 1
- Exercise 2 Pandas plot 2
- Exercise 3 Matplotlib 1
- Exercise 4 Matplotlib 2
- Exercise 5 Matplotlib subplots
- Exercise 6 Plotly 1
- Exercise 7 Plotly Box plots
## Virtual Environment
- Python 3.x
- NumPy
- Pandas
- Matplotlib
- Plotly
- Jupyter or JupyterLab
I suggest to use the most recent version of the packages.
## Resources
- https://matplotlib.org/3.3.3/tutorials/index.html
- https://towardsdatascience.com/matplotlib-tutorial-learn-basics-of-pythons-powerful-plotting-library-b5d1b8f67596
- https://github.com/rougier/matplotlib-tutorial
- https://jakevdp.github.io/PythonDataScienceHandbook/05.13-kernel-density-estimation.html
# Exercise 0 Environment and libraries
The goal of this exercise is to set up the Python work environment with the required libraries.
**Note:** For each quest, your first exercice will be to set up the virtual environment with the required libraries.
I recommend to use:
- the **last stable versions** of Python.
- the virtual environment you're the most confortable with. `virtualenv` and `conda` are the most used in Data Science.
- one of the most recents versions of the libraries required
1. Create a virtual environment named `ex00`, with a version of Python >= `3.8`, with the following libraries: `pandas`, `numpy`, `jupyter`, `matplotlib` and `plotly`.