Exploratory Analysis and Clustering
Attribute-based data sets. Preparing and loading the data into Orange Data Mining software. Data analysis workflows. Scatterplot and box plot. Hierarchical clustering: distances between data items, distances between clusters, agglomerative approach to data clustering. Cluster explanation.
Video lectures: Orange workflows, data exploration, workflow management, your own data, clustering-theory, clustering in 2d, clustering of multi-dimensional data, and clustering of zoo data set.
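The agglomerative approach named above can be sketched in a few lines of plain Python. This is only an illustration, not the Orange implementation: the function names `linkage_distance` and `agglomerate`, the Euclidean distance between items, and the five sample points are all assumptions made for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance between two data items


def linkage_distance(a, b, points, linkage="average"):
    """Distance between two clusters, each given as a list of point indices."""
    pair_dists = [dist(points[i], points[j]) for i in a for j in b]
    if linkage == "single":      # closest pair of items
        return min(pair_dists)
    if linkage == "complete":    # farthest pair of items
        return max(pair_dists)
    return sum(pair_dists) / len(pair_dists)  # average linkage


def agglomerate(points, k, linkage="average"):
    """Start with one cluster per item and merge the closest pair until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(
                clusters[p[0]], clusters[p[1]], points, linkage
            ),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return clusters


# Made-up 2D data: three items near the origin, two near (5, 5)
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
clusters = agglomerate(points, k=2)
print(clusters)
```

Changing the `linkage` argument switches how the distance between clusters is derived from the distances between data items, which is the choice the section distinguishes.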
Regression Models and Regularization
Linear regression. The shape of the model. Optimization function. Polynomial expansion. Overfitting. Regularization. Accuracy on training and test set. Evaluating the accuracy of regression models. Feature scoring and selection.
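As a rough illustration of how regularization changes a linear model, the sketch below fits a one-feature ridge regression in closed form and reports the error on a training and a test set. The function names, the penalty values, and the data are made up for the example. The intercept is left unpenalized, which is a common convention and not necessarily the one used in the course.

```python
def fit_ridge_1d(xs, ys, lam):
    """Minimize sum((y - (w*x + b))**2) + lam * w**2 (intercept b not penalized)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w = sxy / (sxx + lam)      # the penalty lam shrinks the slope toward zero
    b = y_mean - w * x_mean
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the model y = w*x + b on the given data."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data, roughly y = 2x + 1 with small noise
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]
test_x, test_y = [4.0, 5.0], [9.1, 10.9]

for lam in (0.0, 1.0, 10.0):
    w, b = fit_ridge_1d(train_x, train_y, lam)
    print(
        f"lambda={lam:5.1f}  w={w:.3f}  b={b:.3f}  "
        f"train MSE={mse(train_x, train_y, w, b):.3f}  "
        f"test MSE={mse(test_x, test_y, w, b):.3f}"
    )
```

With `lam = 0` this reduces to ordinary least squares. Larger values shrink the slope, which raises the training error. Regularization helps on the test set mainly when the unregularized model overfits, for example after polynomial expansion, and this one-feature sketch does not show that case.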
Classification Models
Prediction models and how they differ from clusterings. Classification trees as an example of an intuitive, early prediction model. The naive Bayesian model as an efficient yet limited model. Linear models, e.g. logistic regression.
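To make the naive Bayesian model concrete, here is a minimal sketch for categorical features. Everything in it is an assumption made for illustration: the function names, the Laplace smoothing, and the tiny made-up animal table (which is not the zoo data set). The model multiplies the class prior by one conditional probability per feature, treating the features as independent given the class. That makes it efficient, and the independence assumption is also what limits it.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count classes and per-class feature values from categorical data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen


def predict(model, row):
    """Return the class with the largest prior * product of P(value | class)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, 0.0
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(value | class); never zero
            score *= (value_counts[(i, label)][value] + 1) / (
                count + len(values_seen[i])
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up data: (has feathers, lays eggs) -> type of animal
rows = [("yes", "yes"), ("yes", "yes"), ("no", "no"), ("no", "no"), ("no", "yes")]
labels = ["bird", "bird", "mammal", "mammal", "mammal"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("yes", "yes")))
print(predict(model, ("no", "no")))
```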
Model Evaluation
Performance scores: classification accuracy, sensitivity and specificity, precision and recall, ... Performance curves and related score(s). Cross validation.
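The scores listed above all derive from the four cells of a binary confusion matrix, and cross validation only needs a way to split the data so that each item is tested exactly once. The sketch below shows both under simple assumptions: the function names, the interleaved fold assignment, and the label vectors are made up for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    return tp, tn, fp, fn


def scores(actual, predicted, positive=1):
    """Common performance scores; assumes no denominator is zero."""
    tp, tn, fp, fn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # same quantity as recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


def k_fold_indices(n, k):
    """Yield (train, test) index lists; every index is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


# Made-up labels: 4 positive and 6 negative items
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
result = scores(actual, predicted)
for name, value in result.items():
    print(f"{name}: {value:.3f}")

for train, test in k_fold_indices(6, 3):
    print("train:", train, "test:", test)
```

In practice the folds are usually shuffled, and often stratified by class, before splitting. The interleaved assignment here only keeps the example short.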
Data Projection and Embedding. Image Analytics.