Branch-AI


Exercise 0: Environment and libraries

The exercise is validated if all questions of the exercise are validated.

Activate the virtual environment. If you used `conda`, run `conda activate your_env`.

Run `python --version`.

Does it print `Python 3.x` with x >= 8?

Do `import jupyter`, `import numpy` and `import pandas` run without any error?
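The checks above can also be scripted. This is a minimal sketch that assumes it is run inside the activated environment. `check_environment` is a hypothetical helper, not part of the exercise.

```python
import importlib
import sys


def check_environment(min_minor=8, modules=("jupyter", "numpy", "pandas")):
    """Return (python_ok, {module_name: imported_without_error})."""
    # Python 3.x with x >= min_minor
    python_ok = sys.version_info.major == 3 and sys.version_info.minor >= min_minor
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
            results[name] = True
        except Exception:  # any error raised during import counts as a failure
            results[name] = False
    return python_ok, results


python_ok, results = check_environment()
print(sys.version.split()[0], "OK" if python_ok else "too old")
for name, ok in results.items():
    print(f"import {name}: {'OK' if ok else 'FAILED'}")
```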

Exercise 1: Concatenate

This question is validated if the outputted DataFrame is:

|    | letter   |   number |
|---:|:---------|---------:|
|  0 | a        |        1 |
|  1 | b        |        2 |
|  2 | c        |        1 |
|  3 | d        |        2 |
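The input DataFrames are not shown in this file. The sketch below assumes two hypothetical inputs reconstructed from the expected output. `ignore_index=True` renumbers the rows 0 to 3 instead of keeping the original indices 0, 1, 0, 1.

```python
import pandas as pd

# Hypothetical inputs consistent with the expected output
df1 = pd.DataFrame([["a", 1], ["b", 2]], columns=["letter", "number"])
df2 = pd.DataFrame([["c", 1], ["d", 2]], columns=["letter", "number"])

# Stack the rows and rebuild a clean 0..n-1 index
result = pd.concat([df1, df2], ignore_index=True)
print(result)
```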

Exercise 2: Merge

The exercise is validated if all questions of the exercise are validated.

Question 1 is validated if the output is:

|    |   id | Feature1_x   | Feature2_x   | Feature1_y   | Feature2_y   |
|---:|-----:|:-------------|:-------------|:-------------|:-------------|
|  0 |    1 | A            | B            | K            | L            |
|  1 |    2 | C            | D            | M            | N            |

Question 2 is validated if the output is:

|    |   id | Feature1_df1   | Feature2_df1   | Feature1_df2   | Feature2_df2   |
|---:|-----:|:---------------|:---------------|:---------------|:---------------|
|  0 |    1 | A              | B              | K              | L              |
|  1 |    2 | C              | D              | M              | N              |
|  2 |    3 | E              | F              | nan            | nan            |
|  3 |    4 | G              | H              | nan            | nan            |
|  4 |    5 | I              | J              | nan            | nan            |
|  5 |    6 | nan            | nan            | O              | P              |
|  6 |    7 | nan            | nan            | Q              | R              |
|  7 |    8 | nan            | nan            | S              | T              |

Note: Check that the suffixes are set with the `suffixes` parameter of `merge` rather than by renaming the columns manually.
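Here is a minimal sketch of both questions. It assumes two hypothetical inputs `df1` and `df2` reconstructed from the expected tables. Question 1 is assumed to be a default (inner) merge on `id`, where overlapping columns get the default suffixes `_x` and `_y`. Question 2 is an outer merge that names the overlapping columns through `suffixes`.

```python
import pandas as pd

# Hypothetical inputs consistent with the expected outputs above
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Feature1": ["A", "C", "E", "G", "I"],
    "Feature2": ["B", "D", "F", "H", "J"],
})
df2 = pd.DataFrame({
    "id": [1, 2, 6, 7, 8],
    "Feature1": ["K", "M", "O", "Q", "S"],
    "Feature2": ["L", "N", "P", "R", "T"],
})

# Question 1: only ids present in both frames; default suffixes _x and _y
inner = df1.merge(df2, on="id", how="inner")

# Question 2: union of ids; missing values become NaN, suffixes set explicitly
outer = df1.merge(df2, on="id", how="outer", suffixes=("_df1", "_df2"))

print(inner)
print(outer)
```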

Exercise 3: Merge MultiIndex

The exercise is validated if all questions of the exercise are validated.

Question 1 is validated if the output DataFrame's shape is `(1305, 5)` and if `merged.head()` returns a table like the one below. One answer that returns the correct DataFrame is `market_data.merge(alternative_data, how='left', left_index=True, right_index=True)`.

|                                                       |      Open |     Close |   Close_Adjusted |     Twitter |    Reddit |
|:------------------------------------------------------|----------:|----------:|-----------------:|------------:|----------:|
| (Timestamp('2021-01-01 00:00:00', freq='B'), 'AAPL')  |  0.0991792 |  -0.31603 |         0.634787 | -0.00159041 |   1.06053 |
| (Timestamp('2021-01-01 00:00:00', freq='B'), 'FB')    | -0.123753 |   1.00269 |         0.713264 |   0.0142127 | -0.487028 |
| (Timestamp('2021-01-01 00:00:00', freq='B'), 'GE')    |  -1.37775 |  -1.01504 |           1.2858 |    0.109835 |   0.04273 |
| (Timestamp('2021-01-01 00:00:00', freq='B'), 'AMZN')  |   1.06324 |  0.841241 |        -0.799481 |   -0.805677 |  0.511769 |
| (Timestamp('2021-01-01 00:00:00', freq='B'), 'DAI')   | -0.603453 |  -2.06141 |        -0.969064 |     1.49817 |  0.730055 |
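The sketch below uses small hypothetical stand-ins for `market_data` and `alternative_data`: random values on a (Date, Ticker) MultiIndex, shortened to 3 business days. It shows why the left join on the index keeps every row of `market_data`. Rows without alternative data get `NaN` in `Twitter` and `Reddit`.

```python
import numpy as np
import pandas as pd

# Hypothetical (Date, Ticker) MultiIndex; the real data spans all of 2021
dates = pd.bdate_range("2021-01-01", periods=3)
tickers = ["AAPL", "FB", "GE"]
index = pd.MultiIndex.from_product([dates, tickers], names=["Date", "Ticker"])

rng = np.random.default_rng(0)
market_data = pd.DataFrame(
    rng.standard_normal((len(index), 3)),
    index=index,
    columns=["Open", "Close", "Close_Adjusted"],
)
# alternative_data only covers the first 4 index entries
alternative_data = pd.DataFrame(
    rng.standard_normal((4, 2)),
    index=index[:4],
    columns=["Twitter", "Reddit"],
)

# Left join on the full MultiIndex: the result has one row per market_data row
merged = market_data.merge(alternative_data, how="left", left_index=True, right_index=True)
print(merged.shape)
```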

Question 2 is validated if the missing values in the DataFrame are replaced by 0 and if `filled_df.sum().sum() == merged_df.sum().sum()` returns `True`.
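The sum check works because `DataFrame.sum` skips `NaN` by default (`skipna=True`). A missing value therefore contributes nothing to the total, exactly like a 0. This is a minimal sketch on a hypothetical `merged_df`. The values are chosen to be exactly representable floats, so the equality is exact.

```python
import numpy as np
import pandas as pd

# Hypothetical merged DataFrame with missing values (e.g. from a left join)
merged_df = pd.DataFrame({
    "Twitter": [0.5, np.nan, -1.25],
    "Reddit": [np.nan, 2.0, 0.25],
})

# Replace every missing value with 0
filled_df = merged_df.fillna(0)

print(filled_df.isna().sum().sum())
print(filled_df.sum().sum() == merged_df.sum().sum())
```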

Exercise 4: Groupby Apply

The exercise is validated if all questions of the exercise are validated and if no `for` loop is used. The goal is to use `groupby` and `apply`.

Question 1 is validated if the following code prints the table below:

        df = pd.DataFrame(range(1,11), columns=['sequence'])
        print(winsorize(df, [0.20, 0.80]).to_markdown())

|    |   sequence |
|---:|-----------:|
|  0 |        2.8 |
|  1 |        2.8 |
|  2 |        3   |
|  3 |        4   |
|  4 |        5   |
|  5 |        6   |
|  6 |        7   |
|  7 |        8   |
|  8 |        8.2 |
|  9 |        8.2 |
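The values 2.8 and 8.2 come from NumPy's default "linear" quantile method. For a sorted array of length n, quantile q sits at position q * (n - 1), and the result is interpolated between the two neighbouring values. The hand-written `linear_quantile` below is a hypothetical illustration of that rule, checked against `np.quantile`. It is not the exercise's solution.

```python
import numpy as np

# The sequence 1..10 used in question 1
values = np.arange(1, 11)


def linear_quantile(sorted_values, q):
    # Position q * (n - 1) in the sorted data, then linear interpolation
    # between the two neighbouring values
    pos = q * (len(sorted_values) - 1)
    lo = int(np.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


lower = linear_quantile(values, 0.20)  # 0.2 * 9 = 1.8 -> 2 + 0.8 * (3 - 2), about 2.8
upper = linear_quantile(values, 0.80)  # 0.8 * 9 = 7.2 -> 8 + 0.2 * (9 - 8), about 8.2
print(lower, np.quantile(values, 0.20))
print(upper, np.quantile(values, 0.80))

# Winsorizing = clipping every value into [lower, upper]
print(np.clip(values, lower, upper))
```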

Question 2 is validated if the output is a pandas Series or DataFrame whose first 11 rows equal the output below. The code below gives a solution.

|    |   sequence |
|---:|-----------:|
|  0 |       1.45 |
|  1 |       2    |
|  2 |       3    |
|  3 |       4    |
|  4 |       5    |
|  5 |       6    |
|  6 |       7    |
|  7 |       8    |
|  8 |       9    |
|  9 |       9.55 |
| 10 |      11.45 |

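    import numpy as np


    def winsorize(df_series, quantiles):
        """
        df_series: pd.DataFrame or pd.Series
        quantiles: list of two quantiles, e.g. [0.05, 0.95]
        """
        # Bounds computed from the data of the current group
        min_value = np.quantile(df_series, quantiles[0])
        max_value = np.quantile(df_series, quantiles[1])

        # Values outside [min_value, max_value] are replaced by the nearest bound
        return df_series.clip(lower=min_value, upper=max_value)


    # Apply the function to each group; [0.05, 0.95] is passed as `quantiles`
    df.groupby("group")[['sequence']].apply(winsorize, [0.05, 0.95])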

https://towardsdatascience.com/how-to-use-the-split-apply-combine-strategy-in-pandas-groupby-29e0eb44b62e
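To run the solution end to end, here is a self-contained sketch. It assumes a hypothetical `df` with two groups of ten consecutive integers, which reproduces the first 11 expected rows. `group_keys=False` is an addition: in pandas 2.x, `apply` otherwise prepends the group label to the result's index.

```python
import numpy as np
import pandas as pd


def winsorize(df_series, quantiles):
    # Clip values to the [quantiles[0], quantiles[1]] quantile range
    min_value = np.quantile(df_series, quantiles[0])
    max_value = np.quantile(df_series, quantiles[1])
    return df_series.clip(lower=min_value, upper=max_value)


# Hypothetical input: two groups of ten consecutive integers
df = pd.DataFrame({
    "sequence": range(1, 21),
    "group": ["A"] * 10 + ["B"] * 10,
})

# Each group is winsorized with its own 5% / 95% quantiles
result = df.groupby("group", group_keys=False)[["sequence"]].apply(winsorize, [0.05, 0.95])
print(result.head(11))
```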

Exercise 5: Groupby Agg

The question is validated if the output is as below. The columns don't have to be MultiIndex. A solution could be `df.groupby('product').agg({'value':['min','max','mean']})`

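| product      |   ('value', 'min') |   ('value', 'max') |   ('value', 'mean') |
|:-------------|-------------------:|-------------------:|--------------------:|
| chair        |              22.89 |              32.12 |              27.505 |
| mobile phone |             100    |             111.22 |             105.61  |
| table        |              20.45 |              99.99 |              51.22  |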
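The sketch below runs on hypothetical data; only the column names `product` and `value` come from the exercise. It shows the dictionary form of `agg`, which produces MultiIndex columns. It also shows named aggregation, which produces flat columns and is therefore also acceptable since the columns don't have to be MultiIndex.

```python
import pandas as pd

# Hypothetical data; the exercise's actual values are not reproduced here
df = pd.DataFrame({
    "product": ["chair", "chair", "table", "table"],
    "value": [20.0, 30.0, 10.0, 50.0],
})

# MultiIndex columns: ('value', 'min'), ('value', 'max'), ('value', 'mean')
multi = df.groupby("product").agg({"value": ["min", "max", "mean"]})

# Flat columns via named aggregation: new_name=(column, function)
flat = df.groupby("product").agg(
    value_min=("value", "min"),
    value_max=("value", "max"),
    value_mean=("value", "mean"),
)
print(multi)
print(flat)
```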

Exercise 6: Unstack

Question 1 is validated if the output is similar to what `unstacked_df.head()` returns below (the values are generated randomly, so the audit does not require them to match):

| Date                |   ('Prediction', 'AAPL') |   ('Prediction', 'AMZN') |   ('Prediction', 'DAI') |   ('Prediction', 'FB') |   ('Prediction', 'GE') |
|:--------------------|-------------------------:|-------------------------:|------------------------:|-----------------------:|-----------------------:|
| 2021-01-01 00:00:00 |                 0.382312 |                -0.072392 |               -0.551167 |             -0.0585555 |                1.05955 |
| 2021-01-04 00:00:00 |                -0.560953 |                 0.503199 |               -0.79517  |             -3.23136   |                1.50271 |
| 2021-01-05 00:00:00 |                 0.211489 |                 1.84867  |                0.287906 |             -1.81119   |                1.20321 |
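Here is a minimal sketch of the unstack step on a hypothetical long-format DataFrame with a single `Prediction` column indexed by (Date, Ticker). `unstack(level="Ticker")` moves the ticker level into the columns. The result has one row per date and one `('Prediction', ticker)` column per ticker, as in the table above.

```python
import numpy as np
import pandas as pd

# Hypothetical random predictions indexed by (Date, Ticker)
dates = pd.bdate_range("2021-01-01", periods=3)
tickers = ["AAPL", "FB", "GE", "AMZN", "DAI"]
index = pd.MultiIndex.from_product([dates, tickers], names=["Date", "Ticker"])
df = pd.DataFrame(
    {"Prediction": np.random.default_rng(0).standard_normal(len(index))},
    index=index,
)

# Pivot the Ticker level from the row index into the columns
unstacked_df = df.unstack(level="Ticker")
print(unstacked_df.head())
```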

Question 2 is validated if the answer is `unstacked.plot(title = 'Stocks 2021')`. Any other title is accepted.