mirror of https://github.com/01-edu/Branch-AI.git
# W1D03 Piscine AI - Data Science

## Visualizations

While working on a dataset, it is important to check the distribution of the data. For most people, however, data is difficult to visualize in more than 3 dimensions.

"Viz" is important both to understand the data and to show results. We'll discover three of the most used visualization libraries in Python:

- Pandas visualization module
- Matplotlib
- Plotly

The goal is to understand the basics of those libraries. You'll have time during the project to master one (or all three) of them.
You may wonder why using one library is not enough. The reason is simple: the right choice depends on the usage.
For example, if you want to check the data quickly, you may want to use the Pandas viz module or Matplotlib.
If you want to build a custom, more elaborate plot, I suggest using Matplotlib or Plotly.
And if you want to create a polished, interactive plot, I suggest using Plotly.
## Exercises of the day

- Exercise 1 Pandas plot 1
- Exercise 2 Pandas plot 2
- Exercise 3 Matplotlib 1
- Exercise 4 Matplotlib 2
- Exercise 5 Matplotlib subplots
- Exercise 6 Plotly 1
- Exercise 7 Plotly Box plots
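
As a warm-up for the subplot exercise listed above, here is a minimal sketch of how Matplotlib places several plots in one figure. The data is random and the layout (one row, two columns) is an arbitrary choice for illustration; it is not the expected answer to any exercise.

```python
import matplotlib.pyplot as plt
import numpy as np

# Random sample, invented for this illustration only.
rng = np.random.default_rng(seed=1)
samples = rng.normal(size=200)

# One figure holding two Axes side by side.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))

# Left panel: distribution of the sample as a histogram.
axes[0].hist(samples, bins=15)
axes[0].set_title("Histogram")

# Right panel: the same sample summarized as a box plot.
axes[1].boxplot(samples)
axes[1].set_title("Box plot")

# Adjust spacing so titles and tick labels do not overlap.
fig.tight_layout()
```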
## Virtual Environment

- Python 3.x
- NumPy
- Pandas
- Matplotlib
- Plotly
- Jupyter or JupyterLab

I suggest using the most recent versions of the packages.
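
To see which versions are installed in the active environment, something like the following sketch can help. It relies on the standard library module `importlib.metadata` (available from Python 3.8). The names in `REQUIRED` are assumed to be the PyPI distribution names of the packages listed above; replace `jupyter` with `jupyterlab` if that is what you installed.

```python
import sys
from importlib import metadata

# Assumed PyPI distribution names for the packages listed above.
REQUIRED = ["numpy", "pandas", "matplotlib", "plotly", "jupyter"]


def installed_versions(packages):
    """Return {package: version string, or None when it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(f"Python {sys.version.split()[0]}")
for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```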
## Resources

- https://matplotlib.org/3.3.3/tutorials/index.html
- https://towardsdatascience.com/matplotlib-tutorial-learn-basics-of-pythons-powerful-plotting-library-b5d1b8f67596
- https://github.com/rougier/matplotlib-tutorial
- https://jakevdp.github.io/PythonDataScienceHandbook/05.13-kernel-density-estimation.html
# Exercise 0 Environment and libraries

The goal of this exercise is to set up the Python work environment with the required libraries.

**Note:** For each quest, your first exercise will be to set up the virtual environment with the required libraries.

I recommend using:

- the **latest stable version** of Python.
- the virtual environment you're most comfortable with. `virtualenv` and `conda` are the most used in Data Science.
- one of the most recent versions of the required libraries.

1. Create a virtual environment named `ex00`, with a version of Python >= `3.8`, with the following libraries: `pandas`, `numpy`, `jupyter`, `matplotlib` and `plotly`.
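
If you prefer to stay in Python, the step above can be sketched with the standard library `venv` module, used here in place of the `virtualenv` and `conda` tools mentioned earlier. The helper names (`env_python`, `create_env`, `install_libraries`) are hypothetical, and the environment inherits the version of the interpreter that runs the script, so that interpreter must itself be >= 3.8.

```python
import subprocess
import sys
import venv
from pathlib import Path

# Libraries requested by the exercise.
LIBRARIES = ["pandas", "numpy", "jupyter", "matplotlib", "plotly"]


def env_python(env_dir):
    """Path of the interpreter inside the environment (layout differs on Windows)."""
    env_dir = Path(env_dir)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir="ex00", with_pip=True):
    """Create the environment with the running interpreter, which must be >= 3.8."""
    if sys.version_info < (3, 8):
        raise RuntimeError("Python >= 3.8 is required")
    venv.EnvBuilder(with_pip=with_pip).create(env_dir)
    return env_python(env_dir)


def install_libraries(python_path, libraries=LIBRARIES):
    """Install the libraries with the environment's own pip (needs network access)."""
    subprocess.run([str(python_path), "-m", "pip", "install", *libraries], check=True)


# Typical use, left commented out because it downloads packages:
# python_path = create_env("ex00")
# install_libraries(python_path)
```

Once the environment is created, activate it from your shell before starting Jupyter so that the notebooks use the libraries installed in `ex00`.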