Weekly outline

  • General

    Mining Massive Data Sets

    Orodja za analizo velikih podatkovnih baz (Tools for the Analysis of Large Databases)

    Instructor: Jure Leskovec
    Instructor at FRI: Matej Guid

    Email: matej.guid@fri.uni-lj.si

    Schedule: This course starts in the second week of January. We will follow the CS246 schedule, which means that you will also have to do homework assignments during the exam break.

    In March, the course will be held in P04 on Tuesdays at 17:15.

    The course will discuss data mining and machine learning algorithms for analyzing very large amounts of data. Topics include: MapReduce and Spark/Hadoop, Frequent itemsets and Association rules, Near Neighbor Search in High Dimensions, Locality Sensitive Hashing (LSH), Dimensionality reduction, Recommendation Systems, Clustering, Analysis of massive graphs, Link Analysis (PageRank, HITS), Web spam (TrustRank), Proximity search on graphs, Large scale supervised machine learning, Mining data streams, Learning through experimentation, Web Advertising, and Optimizing submodular functions. This course is offered in collaboration with Stanford University, which offers it as CS246. Videos of the lectures will be available for download. Our university will organize short weekly review sessions and consultations.

    We will review machine learning and data mining algorithms that can process very large amounts of data. Among other topics, we will cover: the MapReduce procedure, frequent itemsets and association rules, efficient neighbor search in large data sets, locality-sensitive hashing (LSH), dimensionality reduction, recommender systems, clustering, analysis of massive graphs, link analysis (PageRank, HITS), web spam (TrustRank), proximity search on graphs, supervised machine learning on large data, learning from data streams, learning through experimentation, web advertising, and optimization of submodular functions. The course will be taught by a lecturer from Stanford, where it is offered as CS246. You will not attend the lectures live; you will follow them through video recordings. At FRI, we will organize short reviews of the lectured material and consultation sessions.

    Zoom meeting instructions

    For online sessions, we will use the following Zoom meeting:


    Meeting ID: 975 2555 4907
    Passcode: 067962


    Course website: http://web.stanford.edu/class/cs246/

    Important info:

    Classes

    1. Introduction; MapReduce and Spark (Tue March 30)
    2. Frequent Itemsets Mining (Thu April 1)

    Additional materials: https://web.stanford.edu/class/cs246/index.html#schedule

    Reference text: http://www.mmds.org/

    Weekly Colab notebooks: 

    • you will find them directly on the http://web.stanford.edu/class/cs246/ website,
    • they are posted every Thursday,
    • they are due one week later, on Thursday at 23:59 Pacific Time (PT), but please submit earlier if you can,
    • you submit them via this website (see below).

    Assignments and grading:

    • 4 homework assignments requiring coding and theory (40%)
    • Final exam (30%)
    • Weekly Colab notebooks (30%)
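
    The weights above combine into a single course score as a weighted average. The following is a minimal sketch of that arithmetic, assuming each component is expressed as a percentage between 0 and 100; the function name `course_score` and the example scores are hypothetical and not part of the official grading procedure.

```python
# Weights come from the list above; everything else here is illustrative.
WEIGHTS = {"homework": 0.40, "final_exam": 0.30, "colab": 0.30}


def course_score(scores: dict[str, float]) -> float:
    """Return the weighted average of component scores (each 0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores for a hypothetical student.
example = {"homework": 80.0, "final_exam": 70.0, "colab": 100.0}
print(course_score(example))
```

    With these made-up scores the components contribute 32, 21, and 30 points, for a total of 83.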

    More information about the course is available on the CS246 Stanford web page. All deadlines at FRI are exactly the same as the Stanford deadlines.

  • Slides and supplementary materials

  • Colab notebooks

    Submit your Colab notebooks here every week, no later than Friday at 9 am.
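
    The Friday morning deadline lines up with the Stanford deadline of Thursday 23:59 Pacific Time. The sketch below shows the conversion, assuming both regions are on standard time (PST = UTC-8, CET = UTC+1); the date used is a hypothetical Thursday, and the helper name is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: no daylight saving time in either region.
PST = timezone(timedelta(hours=-8), "PST")
CET = timezone(timedelta(hours=1), "CET")


def stanford_deadline_in_cet(deadline_pst: datetime) -> datetime:
    """Convert a timezone-aware Stanford deadline to CET."""
    return deadline_pst.astimezone(CET)


# Hypothetical Thursday deadline at 23:59 PST.
stanford = datetime(2021, 1, 21, 23, 59, tzinfo=PST)
local = stanford_deadline_in_cet(stanford)
print(local.isoformat())
```

    Under these assumptions the result falls on Friday at 08:59 CET. In the weeks when the US has already switched to daylight saving time and Europe has not, the offset is one hour smaller, so check the exact time for those weeks.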

    Your submission should contain a ZIP file:

    • Jupyter notebook in HTML format (download the Jupyter notebook file and then run the command "jupyter nbconvert --to html <file_name.ipynb>" in the command prompt).
    • A text/PDF file with answers to the questions (the submission page will always contain a document with questions). 

    Each file should use the following naming convention: colab<number>_<name>_<surname>.
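
    As a rough sketch, the ZIP file described above could be assembled as follows. The helper names `colab_basename` and `pack_colab`, the student name, and the choice to name the ZIP file itself with the same convention are all assumptions made for illustration; the placeholder files stand in for your real notebook export and answers.

```python
import tempfile
import zipfile
from pathlib import Path


def colab_basename(number: int, name: str, surname: str) -> str:
    """Build the base name colab<number>_<name>_<surname>."""
    return f"colab{number}_{name}_{surname}"


def pack_colab(number: int, name: str, surname: str,
               notebook_html: Path, answers: Path, out_dir: Path) -> Path:
    """Pack the HTML notebook and the answers file into one ZIP archive."""
    base = colab_basename(number, name, surname)
    zip_path = out_dir / f"{base}.zip"  # assumption: ZIP uses the same base name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Store both files under the required naming convention.
        zf.write(notebook_html, arcname=f"{base}.html")
        zf.write(answers, arcname=f"{base}{answers.suffix}")
    return zip_path


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    html = tmp_dir / "notebook.html"
    html.write_text("<html></html>")
    answers = tmp_dir / "answers.txt"
    answers.write_text("Q1: ...")
    archive = pack_colab(3, "ana", "novak", html, answers, tmp_dir)
    print(archive.name, zipfile.ZipFile(archive).namelist())
```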

  • Homework Submissions

    Your submission (every second Friday 9:00 CET) should be a ZIP file containing three files:

    • file <name>_<surname>.pdf: written report.
    • file <name>_<surname>.zip: all the requested code. Use subfolders ("q1", "q2", ...) for particular questions. Include at least the .ipynb and .html files; .py files are welcome too.
    • Cover sheet (make sure you state your collaborators and the date of submission).

    Note that you submit the homework only here on Ucilnica. Do not submit anything to Gradescope or the SNAP website!

    Late days: you may use late days on at most two homework assignments, and only once per assignment. A late submission must arrive no later than Tuesday 9:00 CET, the first Tuesday after the regular deadline.
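
    The late-day rule amounts to a four-day extension from the regular Friday deadline to the following Tuesday at the same hour. A minimal sketch of that check, assuming all times are naive local (CET) datetimes; the function names, status strings, and dates are hypothetical.

```python
from datetime import datetime, timedelta


def late_deadline(regular_deadline: datetime) -> datetime:
    """Return the first Tuesday 9:00 after a Friday 9:00 deadline."""
    if regular_deadline.weekday() != 4:  # Monday is 0, Friday is 4
        raise ValueError("regular deadline is expected to fall on a Friday")
    return regular_deadline + timedelta(days=4)


def submission_status(submitted: datetime, regular_deadline: datetime) -> str:
    """Classify a submission time against the regular and late deadlines."""
    if submitted <= regular_deadline:
        return "on time"
    if submitted <= late_deadline(regular_deadline):
        return "late (uses one late-day allowance)"
    return "too late"


# Hypothetical regular deadline: a Friday at 9:00 local time.
regular = datetime(2021, 1, 22, 9, 0)
print(late_deadline(regular))
print(submission_status(datetime(2021, 1, 25, 18, 30), regular))
```

    The sketch does not track how many allowances a student has already used; that bookkeeping would need to be added separately.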

  • Exam

    The exam will be held on Tuesday, March 15, in P04. Please be there at 17:00; the exam starts at 17:15, right after the printed exams have been handed out.

    More details:

    • The exam is worth 30% of your course grade.
    • It covers the material from the lectures up to week 9 (including Computational Advertising). That is, it will not contain questions from the last two topics.
    • The exam lasts 3 hours (180 minutes).
    • The exam is open book; you may use any reference material. However, no collaboration is allowed.
    • Laptops are allowed, but any communication is strictly prohibited.