Weekly outline

  • General


    Mining Massive Data Sets

    Tools for Analyzing Large Databases (Orodja za analizo velikih podatkovnih baz)

    Instructor: Jure Leskovec
    Instructor at FRI: Matej Guid

    Email: matej.guid@fri.uni-lj.si

    Schedule: This course starts in the second week of January. We will follow the CS246 schedule, which means that you will also have homework assignments to complete during the exam break.

    The course will be held online via the Zoom platform on Mondays at 16:15.

    The course discusses data mining and machine learning algorithms for analyzing very large amounts of data. Topics include: MapReduce and Spark/Hadoop, Frequent itemsets and Association rules, Near Neighbor Search in High Dimensions, Locality Sensitive Hashing (LSH), Dimensionality reduction, Recommendation Systems, Clustering, Analysis of massive graphs, Link Analysis (PageRank, HITS), Web spam (TrustRank), Proximity search on graphs, Large scale supervised machine learning, Mining data streams, Learning through experimentation, Web Advertising and Optimizing submodular functions. This course is offered in collaboration with Stanford University, which offers it as CS246, and is taught by a lecturer from Stanford. You will not attend the lectures live; instead, you will follow them through video recordings, which will be available for download. At FRI we will organize short weekly review sessions of the lectured material and consultations.


    USEFUL LINKS

    Course website: http://web.stanford.edu/class/cs246/

    Important info:


    Classes

    • 2021 - Recordings of subsequent lectures will be available at the Lecture Videos link.
    1. Introduction; MapReduce and Spark (Tue March 30)
    2. Frequent Itemsets Mining (Thu April 1)


    Additional materials: https://web.stanford.edu/class/cs246/index.html#schedule

    Reference text: http://www.mmds.org/


    Weekly Colab notebooks: 

    • you will find them directly on the http://web.stanford.edu/class/cs246/ website,
    • they are posted every Thursday,
    • due one week later, on Thursday at 23:59 Pacific Time (PT), but it is better to submit earlier!
    • submit via this website (below).
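
    Because the deadlines above are given in Pacific Time, it can help to convert them to local time in Ljubljana. The following is a minimal sketch, assuming Python 3.9 or newer with time zone data available; the function name and the example date are hypothetical and used only for illustration.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

PACIFIC = ZoneInfo("America/Los_Angeles")
LJUBLJANA = ZoneInfo("Europe/Ljubljana")


def deadline_in_ljubljana(year: int, month: int, day: int) -> datetime:
    """Return the 23:59 Pacific Time deadline on the given date in Ljubljana local time."""
    deadline_pt = datetime(year, month, day, 23, 59, tzinfo=PACIFIC)
    return deadline_pt.astimezone(LJUBLJANA)


if __name__ == "__main__":
    # Hypothetical example date (a Thursday); replace it with the actual due date.
    local = deadline_in_ljubljana(2021, 4, 8)
    print(local.strftime("%A %Y-%m-%d %H:%M %Z"))
```

    A 23:59 PT deadline normally falls just before 9:00 on the following morning in Ljubljana. In the weeks when the United States has already switched to daylight saving time and Europe has not, the offset is eight hours instead of nine, so the local deadline is about an hour earlier.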


    Assignments and grading:

    • 4 homework assignments requiring coding and theory (60%)
    • Weekly Colab notebooks (40%)


    More information about the course is available on the Stanford CS246 web page. All deadlines at FRI are exactly the same as the Stanford deadlines.


  • Slides and additional materials

  • Colab notebooks

    Submit your Colab notebooks here every week, no later than Friday 9 am.

    Your submission should be a Jupyter notebook in HTML format. (Hint: use the command "jupyter nbconvert --to html <file_name.ipynb>".)

     

  • Homework Submissions

    Your submission should consist of two files:

    • file <name>_<surname>.pdf: written report. Please use the Cover sheet as the first page.
    • file <name>_<surname>.zip: all the requested code. Use subfolders ("q1", "q2", ...) for particular questions. Include at least .ipynb and .html files; .py files are welcome too.
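
    The code archive described above can be assembled with a short script. This is only a sketch under assumptions: the function name `build_submission` and the directory layout (one folder containing "q1", "q2", ... subfolders) are hypothetical, and the check for .ipynb and .html files is applied to the archive as a whole, which is one possible reading of the requirement.

```python
import re
import zipfile
from pathlib import Path


def build_submission(code_dir: Path, name: str, surname: str, out_dir: Path) -> Path:
    """Zip the question subfolders (q1, q2, ...) of code_dir into <name>_<surname>.zip."""
    # Keep only subfolders named q<number>, e.g. q1, q2, q10.
    question_dirs = sorted(
        p for p in code_dir.iterdir()
        if p.is_dir() and re.fullmatch(r"q\d+", p.name)
    )
    if not question_dirs:
        raise ValueError(f"no question subfolders (q1, q2, ...) found in {code_dir}")

    all_files = [f for q in question_dirs for f in q.rglob("*") if f.is_file()]

    # Assumption: at least one .ipynb and one .html file must be present overall.
    missing = {".ipynb", ".html"} - {f.suffix for f in all_files}
    if missing:
        raise ValueError(f"submission lacks files of type: {sorted(missing)}")

    archive = out_dir / f"{name}_{surname}.zip"
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in all_files:
            # Store paths relative to code_dir so the archive starts at q1/, q2/, ...
            zf.write(f, f.relative_to(code_dir).as_posix())
    return archive
```

    For example, `build_submission(Path("hw1"), "name", "surname", Path("."))` would write name_surname.zip to the current directory, provided that hw1 contains the question subfolders. The written report is a separate PDF and is not part of this archive.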

    Note that you submit the homework only here on ucilnica. You do not submit anything to GradeScope or the SNAP website!

    Late days: you may use “late days” twice in total, but only once per homework. A late submission must arrive no later than 9:00 CET on the first Tuesday after the regular deadline.