IP (A) (Orodja za analizo velikih podatkovnih baz)
-
Lectures at FRI take place on Fridays at 14:15 in P03.
Tools for the Analysis of Large Databases (Orodja za analizo velikih podatkovnih baz)
The course discusses data mining and machine learning algorithms for analyzing very large amounts of data. It runs in parallel with the Mining Massive Data Sets course at Stanford University (Prof. Jure Leskovec).
Topics include:
MapReduce and Spark; Frequent Itemsets and Association Rules; Locality-Sensitive Hashing; Clustering; Dimensionality Reduction; Recommender Systems; Link Analysis (PageRank & Extensions); Community Detection in Graphs; Learning Embeddings; Graph Representation Learning; Graph Neural Networks; Large-Scale Supervised Machine Learning; Mining Data Streams; Computational Advertising; Optimizing Submodular Functions; Multi-Armed Bandits
Assignments and grading:
- 4 homework assignments requiring coding and theory (40%)
- Final exam (30%)
- Weekly Colab notebooks (30%)
Useful links:
- Course website: http://web.stanford.edu/class/cs246/
- Handouts (PDF): http://web.stanford.edu/class/cs246/handouts/CS246_Info_Handout.pdf
- Reference book: http://www.mmds.org/
All deadlines at FRI are exactly the same as Stanford deadlines.
-
Video lectures from the current course:
-
This document provides a clear, step-by-step explanation of the PCY algorithm, using a small dataset and a simple hash function to show how it prunes candidate pairs.
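As a companion to that walkthrough, here is a minimal two-pass sketch of PCY in Python. It is not taken from the linked document: the function name `pcy_frequent_pairs`, the integer item ids, and the toy hash function (sum of the two item ids modulo the number of buckets) are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def pcy_frequent_pairs(baskets, support, num_buckets):
    """Return {pair: count} for all pairs with count >= support, using PCY."""

    def bucket_of(pair):
        # Toy hash function (an assumption): sum of the item ids mod bucket count.
        return (pair[0] + pair[1]) % num_buckets

    # Pass 1: count single items and hash every pair into a bucket counter.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        items = sorted(set(basket))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair)] += 1

    # Between passes: keep frequent items, compress bucket counts to a bitmap.
    frequent_items = {i for i, c in item_counts.items() if c >= support}
    bitmap = [c >= support for c in bucket_counts]

    # Pass 2: count only candidate pairs, i.e. both items are frequent
    # and the pair hashes to a frequent bucket.
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket) & frequent_items)
        for pair in combinations(items, 2):
            if bitmap[bucket_of(pair)]:
                pair_counts[pair] += 1

    return {p: c for p, c in pair_counts.items() if c >= support}


baskets = [[1, 2, 3], [1, 2, 4], [1, 3, 5], [2, 3], [1, 2, 3, 5]]
frequent = pcy_frequent_pairs(baskets, support=3, num_buckets=5)
```

The bitmap only ever removes candidates: a pair in an infrequent bucket cannot itself be frequent, so the final counts are exact.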
-
The need to improve clustering increasingly calls for interaction with domain experts, which has led to the development of constrained clustering algorithms. These algorithms use domain knowledge in the form of positive (must-link) and negative (cannot-link) constraints on pairs of training examples, which makes it possible to improve the clustering process...
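To make the two constraint types concrete, the sketch below checks whether assigning one example to a cluster would break a must-link or cannot-link constraint. The abstract does not say which algorithm the work uses; this check is written in the style of COP-KMeans and all names (`violates_constraints`, `assignment`, the example ids) are hypothetical.

```python
def _partner(pair, point):
    """Return the other member of `pair` if `point` is in it, else None."""
    a, b = pair
    if a == point:
        return b
    if b == point:
        return a
    return None


def violates_constraints(point, cluster, assignment, must_link, cannot_link):
    """Return True if placing `point` in `cluster` breaks any constraint.

    `assignment` maps already-placed examples to their cluster ids.
    `must_link` and `cannot_link` are iterables of (a, b) pairs.
    """
    for pair in must_link:
        other = _partner(pair, point)
        # A must-link partner already placed in a different cluster is a violation.
        if other is not None and other in assignment and assignment[other] != cluster:
            return True
    for pair in cannot_link:
        other = _partner(pair, point)
        # A cannot-link partner already placed in the same cluster is a violation.
        if other is not None and other in assignment and assignment[other] == cluster:
            return True
    return False


assignment = {"a": 0, "b": 1}
must_link = [("a", "c")]
cannot_link = [("b", "c")]
ok_in_0 = not violates_constraints("c", 0, assignment, must_link, cannot_link)
ok_in_1 = not violates_constraints("c", 1, assignment, must_link, cannot_link)
```

In a k-means-style loop, such a check would be applied before assigning each example to its nearest cluster, falling back to the next nearest cluster when it fails.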
-
SVD demo (XLSX file)
-
A template for the hubs-and-authorities algorithm, HITS (hyperlink-induced topic search).
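For orientation, here is a minimal pure-Python sketch of the HITS iteration. It is not the course template: the function names, the edge-list input format, the fixed iteration count, and the choice to rescale scores so that the largest is 1 are all assumptions made for this illustration.

```python
def _scale_to_max_one(scores):
    """Rescale scores so the largest equals 1 (left unchanged if all are zero)."""
    largest = max(scores.values())
    if largest == 0:
        return scores
    return {node: value / largest for node, value in scores.items()}


def hits(edges, num_iterations=50):
    """Run HITS on a directed graph given as (source, target) pairs.

    Returns two dicts: hub scores and authority scores.
    """
    edges = list(edges)
    nodes = sorted({node for edge in edges for node in edge})
    hubs = {node: 1.0 for node in nodes}
    auths = {node: 1.0 for node in nodes}

    for _ in range(num_iterations):
        # Authority update: sum the hub scores of the pages linking to a node.
        new_auths = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_auths[dst] += hubs[src]
        auths = _scale_to_max_one(new_auths)

        # Hub update: sum the authority scores of the pages a node links to.
        new_hubs = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hubs[src] += auths[dst]
        hubs = _scale_to_max_one(new_hubs)

    return hubs, auths


hubs, auths = hits([("A", "C"), ("B", "C")])
```

In this tiny graph, C is the only page with incoming links, so it ends up as the sole authority, while A and B are equally good hubs.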
-
In the master's thesis, which led to the publication of the article below in the reputable scientific journal Mathematics, we use modern graph machine learning approaches to speed up a dynamic algorithm for finding the maximum clique.
Kristjan Reba, Matej Guid, Kati Rozman, Dušanka Janežič, and Janez Konc. Exact maximum clique algorithm for different graph types using machine learning. Mathematics 10, no. 1 (2022): 97.
-
by William L. Hamilton, McGill University
-
Video lectures from the course CS224W: Machine Learning with Graphs at Stanford University, led and taught by Prof. Jure Leskovec.
-
Submit your Colab notebooks here every week, no later than Friday 9 am.
Your submission should be a ZIP file containing:
- The Jupyter notebook in HTML format (download the notebook file, then run "jupyter nbconvert --to html <file_name.ipynb>" at the command prompt).
- A text file with answers to the questions (the submission page will always contain a document with questions).
Each file should use the following naming convention:
colab<number>_<name>_<surname>.html
colab<number>_<name>_<surname>.txt
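The naming convention can be scripted. The sketch below builds the two file names and packs existing files into a ZIP using only the Python standard library. The helper names and the name of the ZIP archive itself are assumptions, since the page does not prescribe them; the HTML file is assumed to have been produced and renamed beforehand.

```python
import zipfile
from pathlib import Path


def colab_file_names(number, name, surname):
    """Return the (html, txt) file names required by the naming convention."""
    stem = f"colab{number}_{name}_{surname}"
    return f"{stem}.html", f"{stem}.txt"


def build_submission(number, name, surname, folder="."):
    """Pack the HTML notebook and the answers file from `folder` into a ZIP.

    The archive name is an assumption; the course page does not specify one.
    """
    folder = Path(folder)
    zip_path = folder / f"colab{number}_{name}_{surname}.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        for file_name in colab_file_names(number, name, surname):
            # Store each file at the top level of the archive.
            archive.write(folder / file_name, arcname=file_name)
    return zip_path


html_name, txt_name = colab_file_names(3, "Ana", "Novak")
```

Calling `build_submission(3, "Ana", "Novak")` in the folder that holds both files would then create the archive; it raises `FileNotFoundError` if either file is missing.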
-
Your submission (due every second Friday at 9:00 CET) should be a ZIP file containing three files:
- file <name>_<surname>.pdf: written report.
- file <name>_<surname>.zip: all the requested code. Use subfolders ("q1", "q2", ...) for particular questions. Include at least .ipynb and .html files; .py files are welcome too.
- Cover sheet (make sure you state your collaborators and the date of submission).
Late days: you may use late days on two homework assignments, but only once per assignment. A late submission must arrive no later than 9:00 CET on the first Tuesday after the regular deadline.
-
The exam will start right after you receive the printed exams.
- The exam is worth 30% of your course grade.
- The exam lasts 3 hours (180 minutes).
Announcements for the 2025 exam:
- no coding problems
- 2 "cheat sheets" are allowed (you can use the front and back)
- no calculators or other materials except pencil and cheat sheets
- scope: lecture, lecture slides, colabs, homeworks
- emphasis on lectures and lecture slides
-
See the "Final Exam Review Session" lecture in the Winter Course 2022.
-
Uploaded 7/03/22, 22:45
-
Uploaded 7/03/22, 22:46