Exercise 0: Environment and libraries

The exercise is validated if all questions of the exercise are validated.
Activate the virtual environment. If you used conda, run conda activate your_env.
Run python --version.
Does it print Python 3.x? x >= 8
Do import jupyter, import numpy, import pandas, import matplotlib and import sklearn run without any error?
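
A minimal sketch of this check, assuming the virtual environment is already activated:

import sys
import jupyter, numpy, pandas, matplotlib, sklearn

# Expect Python 3.x with x >= 8
print(sys.version)

# If the imports above ran without error, the environment is ready
print(numpy.__version__, pandas.__version__, matplotlib.__version__, sklearn.__version__)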


Exercise 1: Logistic regression with Scikit-learn

The question 1 is validated if the predicted class is 0.
The question 2 is validated if the predicted probabilities are [0.61450526 0.38549474]
The question 3 is validated if the output is:
Coefficient:
 [[0.81786797]]
Intercept:
 [-0.87522391]
Score:
 0.7142857142857143
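
A minimal sketch of the kind of code expected for this exercise, assuming sklearn's LogisticRegression; the placeholder data below only makes the snippet runnable, the real X, y and the point to predict come from the exercise statement, so the printed values will not match the expected output exactly:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder 1-D data: the real X and y are defined in the exercise statement
X = np.array([[-2.0], [-1.0], [0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 0, 1, 0, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

new_point = np.array([[0.5]])
print("Predicted class:", clf.predict(new_point))                # question 1
print("Predicted probabilities:", clf.predict_proba(new_point))  # question 2
print("Coefficient:\n", clf.coef_)                               # question 3
print("Intercept:\n", clf.intercept_)
print("Score:\n", clf.score(X, y))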


Exercise 2: Sigmoid

The question 1 is validated if the plot looks like this:

(expected plot: the sigmoid function)
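
A minimal sketch that would produce this kind of plot; the exact range and styling are assumptions:

import numpy as np
import matplotlib.pyplot as plt

# Sigmoid: f(x) = 1 / (1 + exp(-x))
x = np.linspace(-10, 10, 200)
plt.plot(x, 1 / (1 + np.exp(-x)))
plt.xlabel("x")
plt.ylabel("sigmoid(x)")
plt.title("Sigmoid")
plt.show()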



Exercise 3: Decision boundary

The exercise is validated if all questions of the exercise are validated.
The question 1 is validated if the outputted plot looks like this:

(expected plot for question 1)

The question 2 is validated if the coefficient and the intercept of the Logistic Regression are:
Intercept:  [-0.98385574]
Coefficient:  [[1.18866075]]
The question 3 is validated if the plot looks like this:

(expected plot for question 3)

The question 4 is validated if predict_probability outputs the same probabilities as predict_proba. Note that the values only have to match the probabilities of one of the two classes, not both. To do so, compare the output with: clf.predict_proba(X)[:,1]. The shape of the arrays is not important.
The question 5 is validated if predict_class outputs the same classes as clf.predict(X). The shape of the arrays is not important.
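
A minimal sketch of what these two functions could look like; the signatures and the placeholder data are assumptions, only the sigmoid-based logic matters:

import numpy as np
from sklearn.linear_model import LogisticRegression

def predict_probability(coef, intercept, X):
    # Probability of class 1: sigmoid of the linear combination coef * x + intercept
    return 1 / (1 + np.exp(-(X.flatten() * coef + intercept)))

def predict_class(coef, intercept, X):
    # Class 1 if the probability of class 1 is at least 0.5
    return (predict_probability(coef, intercept, X) >= 0.5).astype(int)

# Placeholder data: the real X and y come from the exercise
X = np.array([[-1.0], [0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y)

coef, intercept = clf.coef_[0][0], clf.intercept_[0]
print(np.allclose(predict_probability(coef, intercept, X), clf.predict_proba(X)[:, 1]))  # True
print(np.array_equal(predict_class(coef, intercept, X), clf.predict(X)))                 # True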
The question 6 is validated if the plot looks like the plot below. As mentioned, it is not required to shift the class prediction to make the plot easier to understand.

(expected plot for question 6)

The question 7 is validated if the plot looks like this:

(expected plot for question 7)



Exercise 4: Train test split

The exercise is validated if all questions of the exercise are validated.
The question 1 is validated if X_train, y_train, X_test and y_test match the output below. The proportion of class 1 is 0.125 in the train set and 1.0 in the test set.
X_train:
 [[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]
 [11 12]
 [13 14]
 [15 16]]


y_train:
 [0. 0. 0. 0. 0. 0. 0. 1.]


X_test:
 [[17 18]
 [19 20]]


y_test:
 [1. 1.]
The question 2 is validated if the proportion of class 1 is 0.3 for both sets.
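
A minimal sketch of both splits using sklearn's train_test_split; shuffle=False and stratify are one way to obtain the behaviours described above, and the exact parameters and data used in the exercise may differ:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(1, 21).reshape(10, 2)
y = np.array([0., 0., 0., 0., 0., 0., 0., 1., 1., 1.])

# Question 1: an ordered 80/20 split gives a train proportion of 0.125 and a test proportion of 1.0
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
print(y_train.mean(), y_test.mean())

# Question 2: a stratified split keeps the class proportion similar in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
print(y_train.mean(), y_test.mean())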


Exercise 5: Breast Cancer prediction

The exercise is validated if all questions of the exercise are validated.
The question 1 is validated if the proportion of class Benign is 0.6552217453505007. This means that if you always predicted Benign, your accuracy would be about 66%.
The question 2 is validated if the proportion of one of the classes is approximately the same on the train and test sets: ~0.65. In my case:
  • test: 0.6571428571428571
  • train: 0.6547406082289803
The question 3 is validated if the output is:
# Train
Class prediction on train set:
 [4 2 4 2 2 2 2 4 2 2]

Probability prediction on train set:
 [0.99600415 0.00908666 0.99992744 0.00528803 0.02097154 0.00582772
 0.03565076 0.99515326 0.00788281 0.01065484]

Score on train set:
 0.9695885509838998

# Test

Class prediction on test set:
 [2 2 2 4 2 4 2 2 2 4]

Probability prediction on test set:
 [0.01747203 0.22495309 0.00698756 0.54020801 0.0015289  0.99862249
 0.33607994 0.01227679 0.00438157 0.99972344]

Score on test set:
 0.9642857142857143

Only the first 10 predictions are shown. The score is computed on all the data in each set. For some reason, your data split may differ from mine. The requirement for this question is a score on the test set greater than 92%.

If the score is 1, congratulate your peer: they have just leaked their first target. The target should be dropped from X_train and X_test ;) !
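
A minimal sketch of the kind of code expected for question 3; the stand-in data below uses sklearn's bundled breast cancer data set just to make the snippet runnable, while the exercise uses the UCI file with classes 2 (Benign) and 4 (Malignant), so the printed values will not match the expected output exactly:

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: the exercise loads the UCI file and splits it with stratification
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=43)

clf = LogisticRegression(max_iter=10000)
clf.fit(X_train, y_train)

print("Class prediction on train set:\n", clf.predict(X_train)[:10])
print("Probability prediction on train set:\n", clf.predict_proba(X_train)[:10, 1])
print("Score on train set:\n", clf.score(X_train, y_train))

print("Class prediction on test set:\n", clf.predict(X_test)[:10])
print("Probability prediction on test set:\n", clf.predict_proba(X_test)[:10, 1])
print("Score on test set:\n", clf.score(X_test, y_test))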

The question 4 is validated if the confusion matrix on the train set is similar to:
array([[357,   9],
       [  8, 185]])

and if the confusion matrix on the test set is similar to:

array([[90,  2],
       [ 3, 45]])

As said above, the results may be slightly different from mine because of the data splitting. However, the values in the confusion matrices should be close to these.
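
A minimal sketch for the confusion matrices, assuming clf, X_train, y_train, X_test and y_test from the question 3 sketch above:

from sklearn.metrics import confusion_matrix

# Rows are the true classes, columns the predicted classes
print(confusion_matrix(y_train, clf.predict(X_train)))
print(confusion_matrix(y_test, clf.predict(X_test)))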



Exercise 6: Multi-class (Optional)

The exercise is validated if all questions of the exercise are validated.
The question 1 is validated if each classifier takes binary target data as input, as below:
from sklearn.linear_model import LogisticRegression

def train(X_train, y_train):
    # One binary classifier per class (one-vs-all)
    clf = LogisticRegression()
    clf1 = LogisticRegression()
    clf2 = LogisticRegression()

    clf.fit(X_train, y_train == 0)
    clf1.fit(X_train, y_train == 1)
    clf2.fit(X_train, y_train == 2)

    return clf, clf1, clf2
The question 2 is validated if the predicted classes on the test set are:
array([0, 0, 2, 1, 2, 0, 2, 1, 1, 1, 0, 1, 2, 0, 1, 1, 0, 0, 2, 2, 0, 0,
       0, 2, 2, 2, 0, 1, 0, 0])

Even though I got the warning ConvergenceWarning: lbfgs failed to converge (status=1), I noticed that LogisticRegression returns the same output.
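
A minimal sketch of how the three classifiers could be combined into a single class prediction, assuming the train function from question 1 above; the function name predict_one_vs_all, the iris data set and the split parameters are assumptions:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def predict_one_vs_all(classifiers, X):
    # Probability of belonging to each class according to its binary classifier,
    # then pick the class whose classifier is the most confident
    probas = np.array([clf.predict_proba(X)[:, 1] for clf in classifiers])
    return probas.argmax(axis=0)

# Hypothetical usage with the train function defined above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=43)
clf, clf1, clf2 = train(X_train, y_train)
print(predict_one_vs_all([clf, clf1, clf2], X_test))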