Regularization Experiment
Prepare the following workflow.
- Load the data on heart disease
- Feed the data to Test and Score
We shall now test the effects of tree pruning.
Add four Tree widgets and connect them to Test and Score. They need no data input, because each of them passes a learning algorithm ("a recipe"), not a fitted model, to Test and Score. Set the Tree widgets as follows:
- In one, disable pruning by unchecking all check boxes
- In the next one, use defaults (2, 5, 100, 95; all checked)
- In the next, set the first value to 15, use defaults for the rest
- In the last, set the first value to 50, use defaults for the rest
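To see what such a setting does outside the widget, here is a minimal pure-Python sketch. It is not Orange's implementation: it assumes the first value in the Tree widget is the minimum number of instances in leaves, and the toy data and the names `make_tree_learner`, `build`, `predict` and `count_leaves` are made up for illustration. Note that `make_tree_learner` returns a learner (the "recipe"); only calling that learner on data gives a model.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build(data, min_leaf):
    """Grow a tree on (x, label) pairs sorted by x; leaves are plain labels."""
    labels = [y for _, y in data]
    if len(set(labels)) == 1:
        return labels[0]
    best = None
    # i is the size of the left branch; both branches must keep min_leaf rows
    for i in range(min_leaf, len(data) - min_leaf + 1):
        if data[i - 1][0] == data[i][0]:
            continue  # no threshold fits between two equal values
        left, right = labels[:i], labels[i:]
        score = len(left) * gini(left) + len(right) * gini(right)
        if best is None or score < best[0]:
            best = (score, i)
    if best is None:
        return Counter(labels).most_common(1)[0][0]  # too small to split
    i = best[1]
    threshold = (data[i - 1][0] + data[i][0]) / 2
    return (threshold, build(data[:i], min_leaf), build(data[i:], min_leaf))


def make_tree_learner(min_leaf):
    """Return a learner: a function that turns data into a model (a tree)."""
    def learner(data):
        return build(sorted(data), min_leaf)
    return learner


def predict(tree, x):
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x < threshold else right
    return tree


def count_leaves(tree):
    if not isinstance(tree, tuple):
        return 1
    return count_leaves(tree[1]) + count_leaves(tree[2])


# Toy data (made up): the class is 1 for x >= 10, with two noisy labels.
data = [(x, int(x >= 10)) for x in range(20)]
data[3] = (3, 1)
data[15] = (15, 0)

leaf_counts = {m: count_leaves(make_tree_learner(m)(data)) for m in (1, 5, 8)}
print(leaf_counts)
```

With `min_leaf=1` the tree keeps splitting until it also fits the two noisy labels; larger values stop earlier and give smaller trees. Whether the smaller tree is better on unseen data is what Test and Score is there to tell.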
Set Test and Score to Leave one out (because why not, though it will take a while). Which tree performed best?
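Why leave one out takes a while: with n data rows, every learner is fitted n times, each time on all rows but one. The sketch below counts the fits for leave one out and for 10-fold cross validation. It is a simplified stand-in for what Test and Score does, not its actual code; `k_fold_splits`, `majority_learner`, `evaluate` and the toy data are hypothetical names and values used only for illustration.

```python
from collections import Counter


def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k folds over n rows."""
    indices = list(range(n))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def leave_one_out_splits(n):
    """Leave one out is k-fold cross validation with k equal to n."""
    return k_fold_splits(n, n)


def majority_learner(train):
    """A trivial learner: the model always predicts the most common class."""
    most_common = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: most_common


def evaluate(learner, data, splits):
    """Fit the learner on each training part, test on the held-out part."""
    correct = fits = 0
    for train_idx, test_idx in splits:
        model = learner([data[i] for i in train_idx])  # recipe -> model
        fits += 1
        correct += sum(model(data[i][0]) == data[i][1] for i in test_idx)
    return correct / len(data), fits


# Toy data (made up): 20 rows of class 0 followed by 10 rows of class 1.
data = [(i, 0) for i in range(20)] + [(i, 1) for i in range(20, 30)]

loo_accuracy, loo_fits = evaluate(majority_learner, data,
                                  leave_one_out_splits(len(data)))
cv_accuracy, cv_fits = evaluate(majority_learner, data,
                                k_fold_splits(len(data), 10))
print("leave one out:", loo_fits, "fits; 10-fold:", cv_fits, "fits")
```

This is also why the evaluation gets a learner and not a model: a model fitted on all the data would already have seen the rows it is tested on.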
You may want to switch Test and Score to Cross Validation for this one.
Replace the Trees with three instances of logistic regression.
- One with weak regularization (all the way to the left)
- One in the middle (C=1)
- One with strong regularization.
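To get a feeling for what the regularization strength does, here is a small sketch of L2-regularized logistic regression on one feature, fitted by plain gradient descent. It assumes that C works as in scikit-learn's `LogisticRegression`, that is, as the inverse of regularization strength, so that weak regularization means a large C. The function `fit_logistic`, the toy data and the chosen values of C are illustrative and are not the widget's code or its slider values.

```python
import math


def fit_logistic(data, C, steps=1000):
    """Minimise 0.5*w^2 + C * sum(log-loss) by plain gradient descent."""
    w = b = 0.0
    # Step size from an upper bound on the curvature, so any C converges.
    lr = 1.0 / (1.0 + C * sum(x * x + 1.0 for x, _ in data) / 4.0)
    for _ in range(steps):
        gw, gb = w, 0.0  # gradient of the penalty; the bias is not penalised
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += C * (p - y) * x
            gb += C * (p - y)
        w -= lr * gw
        b -= lr * gb
    return w, b


# Toy data (made up): larger x mostly means class 1, with some overlap.
data = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-0.5, 1),
        (0.5, 0), (1.0, 1), (1.5, 1), (2.0, 1)]

weights = {C: fit_logistic(data, C)[0] for C in (1000, 1, 0.001)}
for C, w in weights.items():
    print(f"C={C}: w={w:.3f}")
```

The smaller the C, the more the weight is pulled towards zero, and the less the model trusts the feature. With strong enough regularization, the model ends up predicting almost the same probability for everything.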
Add a naive Bayesian classifier, for good measure.
Which model performs best?
Note that these results cannot be generalized. There is no optimal setting; it all depends on the data.