Branch-AI/piscine/week01/day02/ex02/audit/README.md

##### The exercice is validated is all questions of the exercice are validated

##### The solution of question 1 is accepted if you use `drop` with `axis=1`.`inplace=True` may be useful to avoid to affect the result to a variable. A solution that could be accepted too (even if it's not a solution I recommend is `del`. 

##### The solution of question 2 is accepted if the DataFrame returns the output below. If the type of the index is not `dtype='datetime64[ns]'` the solution is not accepted. I recommend to use  `set_index` with `inplace=True` to do so. 

    ```python
        Input: df.head().index

        Output: 

        DatetimeIndex(['2006-12-16', '2006-12-16','2006-12-16', '2006-12-16','2006-12-16'],
        dtype='datetime64[ns]', name='Date', freq=None)
    ```

##### The solution of question 3 is accepted if all the types are `float64` as below. The preferred solution is `pd.to_numeric` with `coerce=True`. 

    ```python
        Input: df.dtypes

        Output: 

            Global_active_power      float64
            Global_reactive_power    float64
            Voltage                  float64
            Global_intensity         float64
            Sub_metering_1           float64
            dtype: object
                
    ```

##### The solution of question 4 is accepted if you use `df.describe()`.

##### The solution of question 5 is accepted if you used `dropna` and have the number of missing values equal to 0.You should have noticed that 25979 rows contain missing values (for a total of 129895). `df.isna().sum()` allows to check the number of missing values and `df.dropna()` with `inplace=True` allows to remove the rows with missing values. 

##### The solution of question 6 is accepted if one of the two approaches below were used:

    ```python
        #solution 1
        df.loc[:,'A'] = (df['A'] + 1) * 0.06

        #solution 2
        df.loc[:,'A'] = df.loc[:,'A'].apply(lambda x: (x+1)*0.06)
                
    ```


    You may wonder `df.loc[:,'A']` is required and if `df['A'] = ...` works too. **The answer is no**. This is important in Pandas. Depending on the version of Pandas, it may return a warning. The reason is that you are affecting a value to a **copy** of the DataFrame and not in the DataFrame.
    More details: https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas

##### The solution of question 7 is accepted as long as the output of `print(filtered_df.head().to_markdown())` is as below and if the number of rows is equal to **449667**.

    | Date                |   Global_active_power |   Global_reactive_power |
    |:--------------------|----------------------:|------------------------:|
    | 2008-12-27 00:00:00 |                 0.996 |                   0.066 |
    | 2008-12-27 00:00:00 |                 1.076 |                   0.162 |
    | 2008-12-27 00:00:00 |                 1.064 |                   0.172 |
    | 2008-12-27 00:00:00 |                 1.07  |                   0.174 |
    | 2008-12-27 00:00:00 |                 0.804 |                   0.184 |

##### The solution of question 8 is accepted if the output is

    ```console
        Global_active_power        0.254
        Global_reactive_power      0.000
        Voltage                  238.350
        Global_intensity           1.200
        Sub_metering_1             0.000
        Name: 2007-02-16 00:00:00, dtype: float64

    ```

##### The solution of question 9 if the output is `Timestamp('2009-02-22 00:00:00')`

##### The solution of question 10 if the output of `print(sorted_df.tail().to_markdown())` is

    | Date                |   Global_active_power |   Global_reactive_power |   Voltage |
    |:--------------------|----------------------:|------------------------:|----------:|
    | 2008-08-28 00:00:00 |                 0.076 |                       0 |    234.88 |
    | 2008-08-28 00:00:00 |                 0.076 |                       0 |    235.18 |
    | 2008-08-28 00:00:00 |                 0.076 |                       0 |    235.4  |
    | 2008-08-28 00:00:00 |                 0.076 |                       0 |    235.64 |
    | 2008-12-08 00:00:00 |                 0.076 |                       0 |    236.5  |

##### The solution of question 11 is accepted if the output is as below. The solution is based on `groupby` which creates groups based on the index `Date` and aggregates the groups using the `mean`.

    ```console
    Date
    2006-12-16    3.053475
    2006-12-17    2.354486
    2006-12-18    1.530435
    2006-12-19    1.157079
    2006-12-20    1.545658
                    ...   
    2010-12-07    0.770538
    2010-12-08    0.367846
    2010-12-09    1.119508
    2010-12-10    1.097008
    2010-12-11    1.275571
    Name: Global_active_power, Length: 1433, dtype: float64
    ```
feat: clean folders 2 years ago			`##### The exercice is validated is all questions of the exercice are validated`

			##### The solution of question 1 is accepted if you use `drop` with `axis=1`.`inplace=True` may be useful to avoid to affect the result to a variable. A solution that could be accepted too (even if it's not a solution I recommend is `del`.

			##### The solution of question 2 is accepted if the DataFrame returns the output below. If the type of the index is not `dtype='datetime64[ns]'` the solution is not accepted. I recommend to use `set_index` with `inplace=True` to do so.

			```python
			`Input: df.head().index`

			`Output:`

			`DatetimeIndex(['2006-12-16', '2006-12-16','2006-12-16', '2006-12-16','2006-12-16'],`
			`dtype='datetime64[ns]', name='Date', freq=None)`
			```

			##### The solution of question 3 is accepted if all the types are `float64` as below. The preferred solution is `pd.to_numeric` with `coerce=True`.

			```python
			`Input: df.dtypes`

			`Output:`

			`Global_active_power float64`
			`Global_reactive_power float64`
			`Voltage float64`
			`Global_intensity float64`
			`Sub_metering_1 float64`
			`dtype: object`

			```

			##### The solution of question 4 is accepted if you use `df.describe()`.

			##### The solution of question 5 is accepted if you used `dropna` and have the number of missing values equal to 0.You should have noticed that 25979 rows contain missing values (for a total of 129895). `df.isna().sum()` allows to check the number of missing values and `df.dropna()` with `inplace=True` allows to remove the rows with missing values.

			`##### The solution of question 6 is accepted if one of the two approaches below were used:`

			```python
			`#solution 1`
			`df.loc[:,'A'] = (df['A'] + 1) * 0.06`

			`#solution 2`
			`df.loc[:,'A'] = df.loc[:,'A'].apply(lambda x: (x+1)*0.06)`

			```


			You may wonder `df.loc[:,'A']` is required and if `df['A'] = ...` works too. The answer is no. This is important in Pandas. Depending on the version of Pandas, it may return a warning. The reason is that you are affecting a value to a copy of the DataFrame and not in the DataFrame.
			`More details: https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas`

			##### The solution of question 7 is accepted as long as the output of `print(filtered_df.head().to_markdown())` is as below and if the number of rows is equal to 449667.

			`\| Date \| Global_active_power \| Global_reactive_power \|`
			`\|:--------------------\|----------------------:\|------------------------:\|`
			`\| 2008-12-27 00:00:00 \| 0.996 \| 0.066 \|`
			`\| 2008-12-27 00:00:00 \| 1.076 \| 0.162 \|`
			`\| 2008-12-27 00:00:00 \| 1.064 \| 0.172 \|`
			`\| 2008-12-27 00:00:00 \| 1.07 \| 0.174 \|`
			`\| 2008-12-27 00:00:00 \| 0.804 \| 0.184 \|`

			`##### The solution of question 8 is accepted if the output is`

			```console
			`Global_active_power 0.254`
			`Global_reactive_power 0.000`
			`Voltage 238.350`
			`Global_intensity 1.200`
			`Sub_metering_1 0.000`
			`Name: 2007-02-16 00:00:00, dtype: float64`

			```

			##### The solution of question 9 if the output is `Timestamp('2009-02-22 00:00:00')`

			##### The solution of question 10 if the output of `print(sorted_df.tail().to_markdown())` is

			`\| Date \| Global_active_power \| Global_reactive_power \| Voltage \|`
			`\|:--------------------\|----------------------:\|------------------------:\|----------:\|`
			`\| 2008-08-28 00:00:00 \| 0.076 \| 0 \| 234.88 \|`
			`\| 2008-08-28 00:00:00 \| 0.076 \| 0 \| 235.18 \|`
			`\| 2008-08-28 00:00:00 \| 0.076 \| 0 \| 235.4 \|`
			`\| 2008-08-28 00:00:00 \| 0.076 \| 0 \| 235.64 \|`
			`\| 2008-12-08 00:00:00 \| 0.076 \| 0 \| 236.5 \|`

			##### The solution of question 11 is accepted if the output is as below. The solution is based on `groupby` which creates groups based on the index `Date` and aggregates the groups using the `mean`.

			```console
			`Date`
			`2006-12-16 3.053475`
			`2006-12-17 2.354486`
			`2006-12-18 1.530435`
			`2006-12-19 1.157079`
			`2006-12-20 1.545658`
			`...`
			`2010-12-07 0.770538`
			`2010-12-08 0.367846`
			`2010-12-09 1.119508`
			`2010-12-10 1.097008`
			`2010-12-11 1.275571`
			`Name: Global_active_power, Length: 1433, dtype: float64`
			```