##### The exercise is validated if all questions of the exercise are validated.

##### The solution of question 1 is accepted if you use `drop` with `axis=1`. `inplace=True` may be useful to avoid assigning the result to a variable. A solution that could also be accepted (even though it is not one I recommend) is `del`.

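As an illustration of the two accepted styles, here is a minimal sketch on a toy DataFrame. The column names `a`, `b` and `c` are placeholders chosen for the example, not the columns of the exercise's dataset.

```python
import pandas as pd

# Toy frame; the column names are placeholders, not the exercise's dataset.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Style 1: drop returns a new DataFrame, so the result has to be assigned.
trimmed = df.drop(["b", "c"], axis=1)

# Style 2: inplace=True modifies df directly and returns None.
df.drop(["b", "c"], axis=1, inplace=True)

print(list(trimmed.columns))
print(list(df.columns))
```

Both styles leave only the column `a`; the difference is whether a new object is created or the existing one is changed.
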
##### The solution of question 2 is accepted if the DataFrame returns the output below. If the type of the index is not `dtype='datetime64[ns]'`, the solution is not accepted. I recommend using `set_index` with `inplace=True` to do so.

```python
Input: df.head().index

Output:

DatetimeIndex(['2006-12-16', '2006-12-16', '2006-12-16', '2006-12-16', '2006-12-16'],
              dtype='datetime64[ns]', name='Date', freq=None)
```

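One possible way to obtain such an index is sketched below on a toy DataFrame. The day/month/year string format is an assumption about the raw data made for this example; adapt the `format` argument to the actual file.

```python
import pandas as pd

# Toy data; the "%d/%m/%Y" date format is an assumption for this sketch.
df = pd.DataFrame({
    "Date": ["16/12/2006", "16/12/2006", "17/12/2006"],
    "Global_active_power": [4.216, 5.360, 3.200],
})

# Parse the strings into datetimes first, then promote the column to the index.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df.set_index("Date", inplace=True)

print(df.index)
```

Parsing before calling `set_index` matters: setting the index on the raw strings would give an `object` index rather than a `DatetimeIndex`.
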
##### The solution of question 3 is accepted if all the types are `float64` as below. The preferred solution is `pd.to_numeric` with `errors='coerce'`.

```python
Input: df.dtypes

Output:

Global_active_power      float64
Global_reactive_power    float64
Voltage                  float64
Global_intensity         float64
Sub_metering_1           float64
dtype: object
```

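A minimal sketch of the conversion on toy data follows. The `"?"` placeholder for missing readings is an assumption made for this example; any string that cannot be parsed as a number is handled the same way.

```python
import pandas as pd

# Toy data stored as strings; "?" stands in for an unparseable value (assumption).
df = pd.DataFrame(
    {
        "Global_active_power": ["4.216", "?", "5.360"],
        "Voltage": ["234.84", "233.63", "?"],
    },
    dtype=object,
)

# errors="coerce" turns every unparseable value into NaN instead of raising.
df = df.apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
print(df.isna().sum())
```

Because the coerced values become `NaN`, every converted column ends up as `float64`, and the missing values can be counted or dropped afterwards.
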
##### The solution of question 4 is accepted if you use `df.describe()`.

##### The solution of question 5 is accepted if you used `dropna` and the number of missing values is equal to 0. You should have noticed that 25979 rows contain missing values (129895 missing values in total). `df.isna().sum()` lets you check the number of missing values, and `df.dropna()` with `inplace=True` removes the rows with missing values.

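The check-then-drop sequence can be sketched on a toy DataFrame as follows. The values are made up for the example, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy data with one fully missing row (values are made up).
df = pd.DataFrame({
    "Global_active_power": [4.216, np.nan, 5.360],
    "Voltage": [234.84, np.nan, 233.29],
})

# Missing values per column, and number of rows with at least one missing value.
missing_per_column = df.isna().sum()
rows_with_missing = int(df.isna().any(axis=1).sum())

# Remove the incomplete rows in place, then confirm nothing is missing anymore.
df.dropna(inplace=True)
remaining = int(df.isna().sum().sum())

print(missing_per_column)
print(rows_with_missing, remaining)
```

Counting rows with `any(axis=1)` and counting values with `sum()` answer two different questions, which is why the audit quotes both a row count and a total.
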
##### The solution of question 6 is accepted if one of the two approaches below was used:

```python
# solution 1
df.loc[:,'A'] = (df['A'] + 1) * 0.06

# solution 2
df.loc[:,'A'] = df.loc[:,'A'].apply(lambda x: (x+1)*0.06)
```

You may wonder whether `df.loc[:,'A']` is required and whether `df['A'] = ...` works too. **The answer is no**: `df['A'] = ...` is not a safe replacement. This is important in Pandas. Depending on the version of Pandas, it may raise a warning. The reason is that you may be assigning a value to a **copy** of the DataFrame and not to the DataFrame itself.
More details: https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas

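The sketch below, on a toy DataFrame with a hypothetical column `A`, checks that the two accepted approaches give the same result. It also shows one common way to avoid the copy ambiguity when working on a filtered subset: taking an explicit `.copy()` first. The names `vectorized`, `applied` and `subset` are illustrative.

```python
import pandas as pd

# Toy frame; "A" is a hypothetical column used only for this sketch.
df = pd.DataFrame({"A": [0.0, 1.0, 2.0]})

# Approach 1: vectorized arithmetic assigned through .loc.
vectorized = df.copy()
vectorized.loc[:, "A"] = (vectorized["A"] + 1) * 0.06

# Approach 2: element-wise apply assigned through .loc.
applied = df.copy()
applied.loc[:, "A"] = applied.loc[:, "A"].apply(lambda x: (x + 1) * 0.06)

# Raises an AssertionError if the two approaches disagree.
pd.testing.assert_frame_equal(vectorized, applied)

# Working on a filtered subset: an explicit copy makes it unambiguous
# that the assignment targets the subset and not the original frame.
subset = df[df["A"] > 0].copy()
subset.loc[:, "A"] = (subset["A"] + 1) * 0.06

print(subset)
print(df)
```

The vectorized form is usually preferable to `apply` on large frames, since it avoids calling a Python function once per element.
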
##### The solution of question 7 is accepted as long as the output of `print(filtered_df.head().to_markdown())` is as below and the number of rows is equal to **449667**.

| Date                |   Global_active_power |   Global_reactive_power |
|:--------------------|----------------------:|------------------------:|
| 2008-12-27 00:00:00 |                 0.996 |                   0.066 |
| 2008-12-27 00:00:00 |                 1.076 |                   0.162 |
| 2008-12-27 00:00:00 |                 1.064 |                   0.172 |
| 2008-12-27 00:00:00 |                 1.07  |                   0.174 |
| 2008-12-27 00:00:00 |                 0.804 |                   0.184 |

##### The solution of question 8 is accepted if the output is:

```console
Global_active_power        0.254
Global_reactive_power      0.000
Voltage                  238.350
Global_intensity           1.200
Sub_metering_1             0.000
Name: 2007-02-16 00:00:00, dtype: float64
```

##### The solution of question 9 is accepted if the output is `Timestamp('2009-02-22 00:00:00')`.

##### The solution of question 10 is accepted if the output of `print(sorted_df.tail().to_markdown())` is:

| Date                |   Global_active_power |   Global_reactive_power |   Voltage |
|:--------------------|----------------------:|------------------------:|----------:|
| 2008-08-28 00:00:00 |                 0.076 |                       0 |    234.88 |
| 2008-08-28 00:00:00 |                 0.076 |                       0 |    235.18 |
| 2008-08-28 00:00:00 |                 0.076 |                       0 |    235.4  |
| 2008-08-28 00:00:00 |                 0.076 |                       0 |    235.64 |
| 2008-12-08 00:00:00 |                 0.076 |                       0 |    236.5  |

##### The solution of question 11 is accepted if the output is as below. The solution is based on `groupby`, which creates groups based on the index `Date` and aggregates each group using `mean`.

```console
Date
2006-12-16    3.053475
2006-12-17    2.354486
2006-12-18    1.530435
2006-12-19    1.157079
2006-12-20    1.545658
                ...
2010-12-07    0.770538
2010-12-08    0.367846
2010-12-09    1.119508
2010-12-10    1.097008
2010-12-11    1.275571
Name: Global_active_power, Length: 1433, dtype: float64
```

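The grouping described above can be sketched on a toy DataFrame as follows. The values are made up, and `daily_mean` is an illustrative name; the only thing taken from the exercise is the `Date` index and the `Global_active_power` column.

```python
import pandas as pd

# Toy data: two readings per day, indexed by a DatetimeIndex named "Date".
idx = pd.DatetimeIndex(
    ["2006-12-16", "2006-12-16", "2006-12-17", "2006-12-17"], name="Date"
)
df = pd.DataFrame({"Global_active_power": [1.0, 3.0, 2.0, 4.0]}, index=idx)

# Group the rows by the index level "Date" and average one column per group.
daily_mean = df.groupby("Date")["Global_active_power"].mean()

print(daily_mean)
```

The result is a Series with one entry per distinct date, named after the aggregated column, which matches the shape of the expected output.
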