The exercise is validated if all questions of the exercise are validated.

The solution of question 1 is accepted if you use `drop` with `axis=1`. `inplace=True` may be useful to avoid assigning the result to a variable. A solution that could also be accepted (even if it's not one I recommend) is `del`.
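As a minimal sketch of the three variants mentioned for question 1, the snippet below uses a toy DataFrame. The column names, and the choice of `Time` as the column to remove, are assumptions made for illustration only.

```python
import pandas as pd

# Toy data; the column names are hypothetical stand-ins for the real dataset.
df = pd.DataFrame({
    "Global_active_power": [4.216, 5.360],
    "Voltage": [234.84, 233.63],
    "Time": ["17:24:00", "17:25:00"],
})

# Recommended: drop the column by name along axis=1 (the columns axis).
# Without inplace=True, drop returns a new DataFrame and leaves df untouched.
dropped = df.drop("Time", axis=1)

# Same operation, but modifying the DataFrame directly.
df_inplace = df.copy()
df_inplace.drop("Time", axis=1, inplace=True)

# Also accepted, though not recommended: del removes the column in place.
df_del = df.copy()
del df_del["Time"]

print(dropped.columns.tolist())
```

All three variants leave the same two columns; only `drop` without `inplace=True` requires keeping the returned value.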

The solution of question 2 is accepted if `df.head().index` returns the output below. If the type of the index is not `dtype='datetime64[ns]'`, the solution is not accepted. I recommend using `set_index` with `inplace=True` to do so.

```python
    Input: df.head().index

    Output: 

    DatetimeIndex(['2006-12-16', '2006-12-16','2006-12-16', '2006-12-16','2006-12-16'],
    dtype='datetime64[ns]', name='Date', freq=None)
```
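One possible way to obtain such an index is sketched below. It assumes the `Date` column initially holds strings in day/month/year format; that format and the toy values are assumptions, not something stated in the exercise.

```python
import pandas as pd

# Toy data; the date format (day/month/year) is an assumption.
df = pd.DataFrame({
    "Date": ["16/12/2006", "16/12/2006", "17/12/2006"],
    "Global_active_power": [4.216, 5.360, 3.200],
})

# Convert the strings to datetimes before using the column as index,
# otherwise the index would keep a string/object dtype.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Move the column into the index without reassigning df.
df.set_index("Date", inplace=True)

print(df.head().index)
```

The conversion step matters: calling `set_index` on the raw strings would produce a plain `Index`, not a `DatetimeIndex`.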

The solution of question 3 is accepted if all the types are `float64` as below. The preferred solution is `pd.to_numeric` with `errors='coerce'`.

```python
    Input: df.dtypes

    Output: 

        Global_active_power      float64
        Global_reactive_power    float64
        Voltage                  float64
        Global_intensity         float64
        Sub_metering_1           float64
        dtype: object
            
```
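A minimal sketch of the conversion, assuming the columns were read as strings and that unparseable entries look like `"?"` (the placeholder is an assumption for this toy example):

```python
import pandas as pd

# Toy data read as strings; "?" stands in for an unparseable value.
df = pd.DataFrame({
    "Global_active_power": ["4.216", "?", "5.360"],
    "Voltage": ["234.840", "233.630", "?"],
})

# Apply pd.to_numeric column by column. errors="coerce" turns anything
# that cannot be parsed as a number into NaN instead of raising.
df = df.apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```

Because `NaN` is a float, every coerced column ends up as `float64`, which is what the check above expects.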

The solution of question 4 is accepted if you use `df.describe()`.

The solution of question 5 is accepted if you used `dropna` and the number of missing values is equal to 0. You should have noticed that 25979 rows contain missing values (for a total of 129895 missing values). `df.isna().sum()` lets you check the number of missing values per column, and `df.dropna()` with `inplace=True` removes the rows with missing values.
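The check-then-drop sequence can be sketched on a toy DataFrame as follows (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy data with one row that is entirely missing.
df = pd.DataFrame({
    "Global_active_power": [4.216, np.nan, 5.360],
    "Voltage": [234.84, np.nan, 233.29],
})

# Number of missing values per column.
missing_per_column = df.isna().sum()

# Number of rows that contain at least one missing value.
rows_with_missing = int(df.isna().any(axis=1).sum())

# Remove those rows directly in df.
df.dropna(inplace=True)

print(missing_per_column)
print(rows_with_missing)
print(df.isna().sum())
```

Counting rows with `any(axis=1)` and counting values with `sum()` give different numbers, which is why the exercise reports both a row count and a total.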

The solution of question 6 is accepted if one of the two approaches below was used:

```python
    #solution 1
    df.loc[:,'A'] = (df['A'] + 1) * 0.06

    #solution 2
    df.loc[:,'A'] = df.loc[:,'A'].apply(lambda x: (x+1)*0.06)
            
```


You may wonder whether `df.loc[:,'A']` is required or whether `df['A'] = ...` works too. **The answer is no**. This is important in Pandas. Depending on the version of Pandas and on how `df` was created, it may raise a `SettingWithCopyWarning`. The reason is that you may be assigning a value to a **copy** of the DataFrame and not to the DataFrame itself.
More details: https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas
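To make the copy issue concrete, here is a hedged sketch of one common pattern: the working DataFrame is derived from another one by filtering, so an explicit `.copy()` is taken before assigning with `.loc`. The names `raw` and `B` are hypothetical and only serve the example.

```python
import pandas as pd

# Hypothetical source DataFrame.
raw = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10, 20, 30]})

# Filtering produces a subset of raw. Taking an explicit copy makes it
# unambiguous that df is an independent DataFrame.
df = raw[raw["B"] > 10].copy()

# Assign through .loc so the write targets df itself.
df.loc[:, "A"] = (df["A"] + 1) * 0.06

print(df)
print(raw)
```

With the explicit copy, the update lands in `df` and `raw` keeps its original values.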

The solution of question 7 is accepted as long as the output of `print(filtered_df.head().to_markdown())` matches the table below and the number of rows is equal to 449667.

| Date                |   Global_active_power |   Global_reactive_power |
|:--------------------|----------------------:|------------------------:|
| 2008-12-27 00:00:00 |                 0.996 |                   0.066 |
| 2008-12-27 00:00:00 |                 1.076 |                   0.162 |
| 2008-12-27 00:00:00 |                 1.064 |                   0.172 |
| 2008-12-27 00:00:00 |                 1.07  |                   0.174 |
| 2008-12-27 00:00:00 |                 0.804 |                   0.184 |

The solution of question 8 is accepted if the output is:

```console
    Global_active_power        0.254
    Global_reactive_power      0.000
    Voltage                  238.350
    Global_intensity           1.200
    Sub_metering_1             0.000
    Name: 2007-02-16 00:00:00, dtype: float64

```

The solution of question 9 is accepted if the output is `Timestamp('2009-02-22 00:00:00')`.

The solution of question 10 is accepted if the output of `print(sorted_df.tail().to_markdown())` is:

| Date                |   Global_active_power |   Global_reactive_power |   Voltage |
|:--------------------|----------------------:|------------------------:|----------:|
| 2008-08-28 00:00:00 |                 0.076 |                       0 |    234.88 |
| 2008-08-28 00:00:00 |                 0.076 |                       0 |    235.18 |
| 2008-08-28 00:00:00 |                 0.076 |                       0 |    235.4  |
| 2008-08-28 00:00:00 |                 0.076 |                       0 |    235.64 |
| 2008-12-08 00:00:00 |                 0.076 |                       0 |    236.5  |

The solution of question 11 is accepted if the output is as below. The solution is based on `groupby`, which creates groups based on the index `Date` and aggregates each group using the `mean`.

```console
Date
2006-12-16    3.053475
2006-12-17    2.354486
2006-12-18    1.530435
2006-12-19    1.157079
2006-12-20    1.545658
                ...   
2010-12-07    0.770538
2010-12-08    0.367846
2010-12-09    1.119508
2010-12-10    1.097008
2010-12-11    1.275571
Name: Global_active_power, Length: 1433, dtype: float64
```
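A minimal sketch of this kind of daily aggregation, on toy values and assuming the index is already a `DatetimeIndex` named `Date`:

```python
import pandas as pd

# Toy data: two measurements per day, indexed by Date.
df = pd.DataFrame(
    {"Global_active_power": [4.0, 2.0, 1.0, 3.0]},
    index=pd.DatetimeIndex(
        ["2006-12-16", "2006-12-16", "2006-12-17", "2006-12-17"],
        name="Date",
    ),
)

# Group rows sharing the same index value, then average one column.
daily_mean = df.groupby(level="Date")["Global_active_power"].mean()

print(daily_mean)
```

Grouping with `level="Date"` targets the index explicitly; the result is a Series with one row per distinct date.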
