Introduction to network analysis. Graphs. Networks.

Node position. Spectral and distance node centrality. Clustering coefficients. Link analysis algorithms.

Link importance. Betweenness and bridgeness link centrality. Embeddedness and topological overlap.

Node similarity. Local and global node similarity. Structural and regular equivalence. Block models.

Node fragments. Egonet analysis. Network motifs and graphlets. Convex subgraphs. Node orbit distributions.

Graph partitioning. Graph bisection. Spectral analysis. Hierarchical clustering. Core-periphery structure.

Network clustering. Modularity optimization. Community detection. Role discovery. Blockmodeling.

Network modeling. Erdős-Rényi. Watts-Strogatz. Price, Barabási-Albert and configuration models.

Network abstraction. Structural network comparison. Network layout algorithms. Network visualization.

Network mining. Node classification and ranking by equivalence and position. Link prediction by similarity.

Selected applications of network analysis. Fraud detection. Software engineering. Information science.

How similar are living organisms? Have humans indeed descended from Neanderthals? How did various species adapt to their living environments? Which genes are responsible for susceptibility to common diseases? Why do we need a different flu vaccine each year?

The answers to these and similar questions can be found through the study of biomedical data. The discipline that does this is called bioinformatics. Bioinformaticians develop tools to find interesting data patterns and support the understanding of biological processes. They analyze and compare protein sequences, search for genes, assemble genomes, compare species based on their genetic material, predict protein structures and find active parts of proteins, analyze gene expression, and reverse-engineer genetic networks. They can relate the genome to its phenotype and can trace evolution back to its roots to find adaptations and changes in organisms.

The course is intended for students in computer science. No prior knowledge of molecular biology is required. We will introduce essential concepts from molecular biology and genetics, as these are required to understand the associated computational tools. We will learn about sequence alignment, hidden Markov models, clustering, phylogenetic analysis, statistical testing and some machine learning. The course will be practical and will include up to eight homework assignments on the analysis of real molecular biology data. Students are expected to have a background in essential probability, statistics, and programming. We will use Python as the programming language.

Machine learning is used in industry, medicine, economics and other fields for data analysis and knowledge discovery from databases, data mining, generating knowledge bases for expert systems, learning to predict and recognize, playing games, and understanding natural language, handwriting, speech, images, etc. A basic principle of machine learning is describing (modelling) phenomena from data. The results of learning are rules, functions, relations, equations, and probability distributions. The trained models try to explain the data and can be used for decision making when observing the modelled process in the future. The goal of the course is to present the theoretical foundations and basic principles of machine learning methods, basic machine learning algorithms, and their use in practice for knowledge discovery from data and for learning classification and regression models. Students will apply the theoretical knowledge to real-world problems from science and economics.



Overview of course contents: what learning is and how learning relates to intelligence, ML basics, advanced attribute evaluation measures, advanced methods for estimating the performance of ML, advanced visualization methods, combining ML algorithms, Bayesian learning, calibration of probabilities, explanation of individual predictions, numerical ML methods, artificial neural networks (RBF, deep NN), unsupervised learning (clustering), association rules, estimating the reliability of individual predictions, text mining, matrix factorization, archetypal analysis, ML as data compression, active learning, user profiling and recommendation systems, ILP, introduction to learning theory.

The practical part consists of solving problems, taking web quizzes, and completing the seminar work. An assistant is available for consultations. The grade for the practical work is the grade of the seminar work. To pass the practical work, students must achieve at least 50% of the points in the web quizzes.

The final course grade consists of the practical work grade (50%) and the exam grade (50%). On the written exam, students need to achieve at least 50% of the points.