This is a practical course on data mining. Typically, we pick a data mining challenge on Kaggle and then, over the course, learn the methods relevant to that challenge through theory and practical implementation. Implementations mostly use Python and related data analytics libraries, including numpy, scikit-learn, and Orange, along with libraries for data visualization and network analysis such as matplotlib and NetworkX.
The course involves homework assignments, due every two weeks, in which you are expected to complete a mini-project and report on its implementation and results. In the first half of the course, the mini-projects help us become familiar with data mining methods and techniques; in the second half, they help us get started with a chosen challenge and then dig deeper to improve our results on it.
In the first half of the course, homework assignments are completed individually; towards the end of the course, they are carried out in small groups of two to three students. The homework assignments replace the written exam, and the course grade depends entirely on their quality. There is no oral exam.
Typical data mining techniques covered in previous installments of this course include assessment of similarity, clustering, classification methods (logistic regression, neural networks, random forests), feature subset selection, and techniques for recommendation systems.
- Course coordinator: Blaž Zupan