To find the most informative pair of variables in a Scatter Plot, use the Find informative projections button. It estimates the R2 score of a kNN regressor (k=10) for each pair of variables and returns the pair with the highest score.
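The idea behind this ranking can be sketched in a few lines of plain Python. This is only an illustration, not Orange's implementation: the leave-one-out evaluation, the helper names (knn_predict, loo_r2, rank_pairs) and the synthetic columns are all assumptions made for the example. The toy features already share the same 0 to 1 range; real data would need to be scaled first.

```python
from itertools import combinations
import random


def knn_predict(train_X, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def loo_r2(X, y, k=10):
    """Leave-one-out R2 of a kNN regressor (evaluation scheme is an assumption)."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(train_X, train_y, X[i], k))
    return r2_score(y, preds)


def rank_pairs(columns, y, k=10):
    """Score every pair of named columns and return them best first."""
    scores = []
    for a, b in combinations(columns, 2):
        X = list(zip(columns[a], columns[b]))
        scores.append((loo_r2(X, y, k), (a, b)))
    return sorted(scores, reverse=True)


# Synthetic toy data (hypothetical): the target depends on two columns only.
rng = random.Random(0)
n = 60
columns = {
    "signal_1": [rng.uniform(0, 1) for _ in range(n)],
    "signal_2": [rng.uniform(0, 1) for _ in range(n)],
    "noise": [rng.uniform(0, 1) for _ in range(n)],
}
y = [3 * a + 2 * b for a, b in zip(columns["signal_1"], columns["signal_2"])]

ranking = rank_pairs(columns, y)
best_score, best_pair = ranking[0]
print(best_pair, round(best_score, 3))
```

The pair that best explains the target ends up at the top of the ranking, which is what the button reports for the real data.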

Scatter Plot finds displacement and model year to be the two most informative features. For those who are not auto enthusiasts, displacement is, roughly speaking, a measure of engine size. Displacement and model year seem to be roughly negatively correlated. On average, newer cars have smaller engines, but it is difficult to gauge the correlation between a categorical and a numeric variable.

When looking at mpg, which stands for miles per gallon, we see that newer cars also go further (they cover more miles on the same amount of gasoline). But most importantly, cars with smaller engines go further, too.

Now for plotting the error. This is what the workflow should look like:

In Feature Constructor, create a new numeric variable, give it a name, and define it as the Linear_Regression prediction minus mpg. In Scatter Plot, color the points by the new error variable.

The plot now shows which cars had the poorest predictions. Blue points show cars with pessimistic predictions (the prediction was lower than the actual value), while yellow points show cars with optimistic predictions (the prediction was higher than the actual value). Newer cars generally seem to have larger errors. This is likely due to advances in car technology, which make the size of the engine less indicative of its efficiency.
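The error variable and its sign convention can be written out as a short Python sketch. The numbers below are hypothetical and are not taken from the data set; the function names are made up for the illustration. In Feature Constructor the expression itself would presumably be just Linear_Regression - mpg.

```python
def prediction_errors(predicted, actual):
    """Error = prediction minus actual value, one entry per car."""
    return [p - a for p, a in zip(predicted, actual)]


def label(error):
    """Positive error: prediction too high. Negative error: prediction too low."""
    if error > 0:
        return "optimistic"
    if error < 0:
        return "pessimistic"
    return "exact"


# Hypothetical values for three cars, for illustration only.
mpg = [18.0, 26.0, 31.5]
linear_regression = [20.5, 24.0, 31.5]

errors = prediction_errors(linear_regression, mpg)
labels = [label(e) for e in errors]
print(list(zip(errors, labels)))
```

With this convention, the sign of the error tells optimistic from pessimistic predictions, and its magnitude shows how poor the prediction was.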

Weight is negatively correlated with mpg. This makes sense. The heavier the car, the more petrol it consumes to cover the same distance. Model year is positively correlated with mpg. The newer the car, the more efficient it is.
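For readers who want to check the sign of such a correlation by hand, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The five rows are invented for illustration and do not come from the data set.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical cars: heavier and older ones get fewer miles per gallon.
weight = [3500, 3000, 2600, 2200, 1900]
model_year = [70, 73, 76, 79, 82]
mpg = [15.0, 19.0, 24.0, 30.0, 36.0]

r_weight = pearson(weight, mpg)    # negative: mpg falls as weight rises
r_year = pearson(model_year, mpg)  # positive: mpg rises with model year
print(round(r_weight, 3), round(r_year, 3))
```

A coefficient below zero means the two variables move in opposite directions, and one above zero means they move together.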

So when buying a car with optimal fuel consumption, one should probably go with a newer model with a smaller engine.

Last modified: Thursday, 17 March 2022, 09:47