# Exam problem

A classifier was tested on a two-class problem and produced the following confusion matrix
on the test set:

    +----------------------+-----+-----+
    | True \ classified as |     |     |
    | class \              |  0  |  1  |
    +----------------------+-----+-----+
    | 0                    | 300 |   0 | 
    +----------------------+-----+-----+
    | 1                    |  80 | 120 |
    +----------------------+-----+-----+         

Calculate the following:

a) the classification accuracy

b) default accuracy (assume that the most frequent class in the testing set is also the most 
   frequent class in the training set)

c) sensitivity

d) specificity


## Classification accuracy

    (300+120)/(300+0+80+120) = 0.84
    
## Default accuracy

    The majority class is 0, so the default classifier assigns every example to class 0.
    Confusion matrix (default classifier):
          | 0 | 1 |
        --+---+---+
        0 |300| 0 |
        --+---+---+
        1 |200| 0 |
        --+---+---+
    CA: 300/500 = 0.6

## Sensitivity (0 is treated as the positive class)

    Sensitivity = TP/POS = 300 / 300 = 1
    
## Specificity (0 is treated as the positive class)
    
    Specificity = TN/NEG = 120 / 200 = 0.6
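
The four values above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative and class 0 is treated as the positive class, as in the solution.

```python
import numpy as np

# Rows are true classes, columns are predicted classes (order: 0, 1).
cm = np.array([[300, 0],
               [80, 120]])

total = cm.sum()
accuracy = np.trace(cm) / total                    # (300 + 120) / 500
default_accuracy = cm.sum(axis=1).max() / total    # share of the majority true class

# Class 0 is treated as the positive class.
sensitivity = cm[0, 0] / cm[0].sum()               # TP / POS
specificity = cm[1, 1] / cm[1].sum()               # TN / NEG

print(accuracy, default_accuracy, sensitivity, specificity)
```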

# Exam problem

A classifier classified four test instances in a four-class problem.
The table below shows the predicted probability distribution for each test instance:

	             
    actual class  | predicted probs:    C1    C2    C3    C4
    --------------+-----------------------------------------
    C4            |                   0.50  0.25  0.00  0.25
    C2            |                   0.50  0.25  0.25  0.00
    C1            |                   0.75  0.00  0.25  0.00
    C2            |                   0.25  0.50  0.00  0.25


Assume that the class probability distribution in the testing data set is equal to the distribution 
in the training set. Calculate the following:

a) the average Brier score

b) the average information score, defined as 1/n * sum(-log2(p(prior)) + log2(p(predicted))),
   where n is the number of test instances, p(prior) is the prior probability of the instance's
   actual class, and p(predicted) is the probability the classifier assigned to that class

c) the information score of a default classifier for the second testing instance


## The average Brier score

    (
     (0.00-0.50)^2 + (0.00-0.25)^2 + (0.00-0.00)^2 + (1.00-0.25)^2 +
     (0.00-0.50)^2 + (1.00-0.25)^2 + (0.00-0.25)^2 + (0.00-0.00)^2 +
     (1.00-0.75)^2 + (0.00-0.00)^2 + (0.00-0.25)^2 + (0.00-0.00)^2 +
     (0.00-0.25)^2 + (1.00-0.50)^2 + (0.00-0.00)^2 + (0.00-0.25)^2
    )/4
    = (0.875 + 0.875 + 0.125 + 0.375)/4
    = 0.5625
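
The same sum can be written as a one-hot comparison. This is a small sketch assuming NumPy; the array and variable names are illustrative and not part of the original solution.

```python
import numpy as np

classes = ["C1", "C2", "C3", "C4"]
actual = ["C4", "C2", "C1", "C2"]
predicted = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.50, 0.25, 0.25, 0.00],
    [0.75, 0.00, 0.25, 0.00],
    [0.25, 0.50, 0.00, 0.25],
])

# One-hot encode the actual classes: 1.0 for the true class, 0.0 elsewhere.
target = np.zeros_like(predicted)
for row, label in enumerate(actual):
    target[row, classes.index(label)] = 1.0

# Squared error summed over the classes, then averaged over the instances.
brier_per_instance = ((target - predicted) ** 2).sum(axis=1)
average_brier = brier_per_instance.mean()

print(brier_per_instance, average_brier)
```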
    
## The average information score

    prior probabilities: P(C1) = 0.25, P(C2) = 0.5, P(C3) = 0.0, P(C4) = 0.25
	
    posterior probabilities for the correct class:
        example 1: P'(C4) = 0.25
        example 2: P'(C2) = 0.25
        example 3: P'(C1) = 0.75
        example 4: P'(C2) = 0.5
                                        
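    average information score:
        (
        (-log2(0.25)+log2(0.25)) +
        (-log2(0.5)+log2(0.25)) +
        (-log2(0.25)+log2(0.75)) +
        (-log2(0.5)+log2(0.5))
        )/4
        = (0 - 1 + 1.585 + 0)/4
        ≈ 0.146

    (If the second example, whose posterior is below its prior, is instead scored with the
    Kononenko-Bratko form -(-log2(1-0.5)+log2(1-0.25)) ≈ -0.585, the average is 0.25.)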
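
The calculation can be reproduced with the standard library. The sketch below evaluates the score exactly as defined in the problem statement and, for comparison, a variant that handles a posterior below the prior differently (the Kononenko-Bratko form, which is an assumption here and not part of the definition given above). The function names are illustrative.

```python
import math

priors = {"C1": 0.25, "C2": 0.50, "C3": 0.00, "C4": 0.25}
# (actual class, probability the classifier assigned to the actual class)
instances = [("C4", 0.25), ("C2", 0.25), ("C1", 0.75), ("C2", 0.50)]

def simple_score(prior, posterior):
    # Score as defined in the problem statement.
    return -math.log2(prior) + math.log2(posterior)

def kononenko_bratko_score(prior, posterior):
    # Assumed variant: use the complementary probabilities when posterior < prior.
    if posterior >= prior:
        return -math.log2(prior) + math.log2(posterior)
    return -(-math.log2(1 - prior) + math.log2(1 - posterior))

n = len(instances)
avg_simple = sum(simple_score(priors[c], p) for c, p in instances) / n
avg_kb = sum(kononenko_bratko_score(priors[c], p) for c, p in instances) / n

print(round(avg_simple, 3), round(avg_kb, 3))
```

Only the second instance differs between the two variants, because it is the only one whose posterior (0.25) is below its prior (0.5).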
    
## The information score of a default classifier for the second testing instance

    If the default classifier assigns every example to the majority class (C2) with probability 1.0:

        -log2(0.50)+log2(1.0) = 1

    If the default classifier returns the prior probability distribution for every example:

        -log2(0.50)+log2(0.5) = 0

# Practical problem

- load the movies dataset

- remove the "title" and "budget" attributes from the dataset

- split the data into two sets, then remove the "year" attribute:

	- training set, which contains information about the movies made before the year 2004
	- test set, which contains information about the movies made in the year 2004 or later

- build a decision tree to predict whether or not a movie is a comedy

- evaluate that model on the test data


In [None]:
import pandas as pd
from sklearn import tree

Load the movies dataset

In [None]:
movies = pd.read_csv("movies.txt")

Remove the title and budget attributes from the dataset

In [None]:
movies = movies.drop(columns=["title", "budget"])

Split the data into two sets: a training set, which contains information about the movies made before the year 2004, and a test set, which contains information about the movies made in the year 2004 or later. Then remove the "year" attribute.

In [None]:
train = movies[movies["year"] < 2004].drop("year", axis=1)
test = movies[movies["year"] >= 2004].drop("year", axis=1)

Build a decision tree to predict whether or not a movie is a comedy.

In [None]:
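# IMPORTANT: sklearn decision trees cannot handle categorical values.
# To fix this, convert the mpaa rating to dummy (one-hot) columns.
train = pd.get_dummies(train, columns=["mpaa"])
test = pd.get_dummies(test, columns=["mpaa"])

# Encoding the two sets separately can yield different columns if a rating
# is missing from one of them, so align the test columns to the training columns.
test = test.reindex(columns=train.columns, fill_value=0)

clf = tree.DecisionTreeClassifier(max_depth=8)
clf.fit(train.drop("Comedy", axis=1), train["Comedy"])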
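
One pitfall when the training and test sets are encoded separately is that a category missing from one of them produces different columns, and the tree then fails at prediction time. The following sketch uses made-up toy frames (the data and the `_toy`/`_enc` names are hypothetical) to show the mismatch and one way to align the columns with `DataFrame.reindex`:

```python
import pandas as pd

# Hypothetical toy data: the rating "NC-17" occurs only in the training set.
train_toy = pd.DataFrame({"length": [90, 120, 100], "mpaa": ["PG", "R", "NC-17"]})
test_toy = pd.DataFrame({"length": [95, 110], "mpaa": ["PG", "R"]})

train_enc = pd.get_dummies(train_toy, columns=["mpaa"])
test_enc = pd.get_dummies(test_toy, columns=["mpaa"])
print(list(train_enc.columns))  # includes a dummy column for NC-17
print(list(test_enc.columns))   # has no dummy column for NC-17

# Align the test columns to the training columns; missing dummies are filled with 0.
test_aligned = test_enc.reindex(columns=train_enc.columns, fill_value=0)
print(list(test_aligned.columns))
```

An alternative is to encode the full dataset once before splitting it, which guarantees identical columns in both sets.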

In [None]:
preds = clf.predict(test.drop("Comedy", axis=1))

In [None]:
from sklearn import metrics

In [None]:
# Default accuracy: always predict the majority class of the training set
majority_class = train["Comedy"].value_counts().idxmax()
default_ca = (test["Comedy"] == majority_class).mean()
default_ca

In [None]:
metrics.accuracy_score(test["Comedy"], preds)

# Practical problem

- load the tic-tac-toe training and test datasets:

	- train: tic-tac-toe-learn.txt
	- test: tic-tac-toe-test.txt

- train a decision tree (the "Class" attribute is our target variable) and evaluate that model
  on the test data

- plot the ROC curve for your model (the value "positive" is our positive class) 

- try to improve your model with ROC analysis


Load the tic-tac-toe training and test datasets:

In [None]:
tic_tac_toe_train = pd.read_csv("tic-tac-toe-learn.txt")
tic_tac_toe_test = pd.read_csv("tic-tac-toe-test.txt")

In [None]:
tic_tac_toe_train

Train a decision tree (the "Class" attribute is our target variable) and evaluate that model on the test data

In [None]:
# Here, all data is categorical.
# To convert all data, we can use pd.get_dummies without the columns argument
tic_tac_toe_train = pd.get_dummies(tic_tac_toe_train)
tic_tac_toe_test = pd.get_dummies(tic_tac_toe_test)

# This also converts the class. We can just keep the positive one
tic_tac_toe_train = tic_tac_toe_train.drop("Class_negative", axis=1)
tic_tac_toe_test = tic_tac_toe_test.drop("Class_negative", axis=1)


In [None]:
tic_tac_toe_test

In [None]:
clf = tree.DecisionTreeClassifier(max_depth=20)
clf.fit(
    tic_tac_toe_train.drop("Class_positive", axis=1),
    tic_tac_toe_train["Class_positive"],
)
preds = clf.predict(tic_tac_toe_test.drop("Class_positive", axis=1))

In [None]:
tic_tac_toe_test["Class_positive"].value_counts()

In [None]:
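# Default accuracy: always predict the majority class of the training set
majority_class = tic_tac_toe_train["Class_positive"].value_counts().idxmax()
default_CA = (tic_tac_toe_test["Class_positive"] == majority_class).mean()
default_CA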

In [None]:
metrics.accuracy_score(tic_tac_toe_test["Class_positive"], preds)

Plot the ROC curve for your model (the value "positive" is our positive class) 

In [None]:
# ROC from the hard 0/1 predictions: this yields a single operating point
metrics.RocCurveDisplay.from_predictions(tic_tac_toe_test["Class_positive"], preds)

In [None]:
metrics.RocCurveDisplay.from_estimator(clf, tic_tac_toe_test.drop("Class_positive", axis=1), tic_tac_toe_test["Class_positive"])
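
The last task, improving the model with ROC analysis, is not worked out above. One possible approach is to replace the default 0.5 decision threshold with the threshold whose ROC point maximizes TPR - FPR (Youden's J statistic). That criterion is an assumption here, not something the exercise prescribes. The sketch below uses synthetic stand-in data rather than the tic-tac-toe files, and all names in it are illustrative.

```python
import numpy as np
from sklearn import metrics, tree
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical; not the tic-tac-toe files).
X, y = make_classification(n_samples=600, n_features=9, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

# A shallow tree gives graded leaf probabilities, so the ROC curve has several points.
clf_demo = tree.DecisionTreeClassifier(max_depth=3, random_state=0)
clf_demo.fit(X_fit, y_fit)
scores = clf_demo.predict_proba(X_val)[:, 1]  # probability of the positive class

# Pick the threshold whose ROC point maximizes TPR - FPR (Youden's J).
fpr, tpr, thresholds = metrics.roc_curve(y_val, scores)
best = np.argmax(tpr - fpr)
best_threshold = thresholds[best]

preds_default = (scores >= 0.5).astype(int)
preds_tuned = (scores >= best_threshold).astype(int)
print("chosen threshold:", best_threshold)
print("accuracy at 0.5:", metrics.accuracy_score(y_val, preds_default))
print("accuracy at chosen threshold:", metrics.accuracy_score(y_val, preds_tuned))
```

Selecting the threshold on the same data used to report accuracy gives an optimistic estimate; in practice the threshold would be chosen on a separate validation split. A fully grown tree (such as one with `max_depth=20`) tends to output probabilities of only 0 or 1, which leaves little room for threshold tuning.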