Tools for Analyzing Large Databases

We will survey machine learning and data mining algorithms that can process very large amounts of data. The emphasis will be on MapReduce as an approach to building parallel algorithms.

Among other topics, we will cover: frequent itemsets in market baskets and association rules, efficient near-neighbor search in large data, locality-sensitive hashing (LSH), dimensionality reduction, recommender systems, clustering, link analysis (PageRank), supervised machine learning on large data, learning from data streams, mining structured sources on the web, and web advertising.

Course dates: 7. 1. 2019 to 15. 3. 2019.


Schedule: This course starts at the beginning of January. We will follow a weekly schedule, which means that you will also have to do homework assignments during the exam break.

The course will discuss data mining and machine learning algorithms for analyzing very large amounts of data. The emphasis will be on MapReduce as a tool for creating parallel algorithms that can process such data.

Topics include: frequent itemsets and association rules, near-neighbor search in high-dimensional data, locality-sensitive hashing (LSH), dimensionality reduction, recommendation systems, clustering, link analysis (PageRank), large-scale supervised machine learning, data streams, mining the Web for structured data, relation extraction, and web advertising.

Course dates: 7. 1. 2019 to 15. 3. 2019.