Section outline

  • Hierarchical clustering shows us a hierarchical grouping, yet it does not tell us that much about the structure of the data. Some points from one cluster may be close to points from another, yet the two end up in different parts of the hierarchy.

    Multidimensional scaling (MDS) is a technique that places data instances onto a two-dimensional plane (or, in principle, into a higher number of dimensions) so that the distances between points correspond to the given distances between objects as closely as possible. The optimization starts from a random position (or a not-so-random one, precomputed by another method) and then iteratively moves the points towards positions that better match the prescribed distances. It may get stuck in local optima, so we can restart the procedure from different initial positions. The resulting visualization lets us visually inspect the structure of the data, like groups and relations between points. MDS is a method of (multivariate) data projection, but it is non-parametric: the position of a point does not depend on the values of some variables, but on its relations to other points.
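
    A minimal sketch of this, using scikit-learn's MDS (the dataset and parameter choices are illustrative, not part of the lecture):

        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.manifold import MDS
        from sklearn.metrics import pairwise_distances

        X = load_iris().data
        dist = pairwise_distances(X)  # the "given distances between objects"

        # n_init restarts the optimization from several random initial
        # positions and keeps the embedding with the lowest stress --
        # the guard against local optima mentioned above
        mds = MDS(n_components=2, dissimilarity="precomputed",
                  n_init=8, random_state=0)
        coords = mds.fit_transform(dist)  # one (x, y) per data instance
        print(coords.shape, mds.stress_)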

    Principal component analysis is a method that finds a linear projection of the data into a lower-dimensional space. Just as three-dimensional data may actually lie on a plane (or within a relatively small distance above or below it), 100-dimensional data may actually lie on, say, a 13-dimensional hyperplane (with small distances from it). You will learn more about PCA in another class, so we presented it only intuitively and showed its application in Orange.
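
    To make the intuition concrete, here is a small sketch (assuming scikit-learn; the dimensions and noise level are made up): 100-dimensional data constructed to lie near a 13-dimensional hyperplane, whose effective dimensionality PCA reads off the explained variance:

        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        # 100-dimensional data lying close to a 13-dimensional hyperplane
        latent = rng.normal(size=(500, 13))
        mixing = rng.normal(size=(13, 100))
        X = latent @ mixing + 0.01 * rng.normal(size=(500, 100))

        pca = PCA().fit(X)
        # the cumulative explained variance saturates after 13 components
        print(np.cumsum(pca.explained_variance_ratio_)[:15].round(3))
        X_2d = PCA(n_components=2).fit_transform(X)  # projection for a scatter plot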

    PCA is unsupervised: it does not take the target value (if there is one) into account. In scatter plot terms: it is color blind. It may happen that some 3D data lies more or less on a plane, so PCA would keep those two dimensions and discard the third as explaining too little variance. The problem is that we may actually have a classification problem in which the class depends exactly on whether a point is (slightly) below or above that plane. This can be solved by partial least squares (PLS).
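
    A toy illustration of this color blindness (a constructed example, not from the lecture): three-dimensional data whose first two dimensions have high variance, while the class depends only on the third, low-variance one. PCA's first component ignores the class; the first PLS component points straight at it:

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(0)
        n = 1000
        # two high-variance directions span a plane; the third is tiny...
        X = np.column_stack([rng.normal(0, 5, n),
                             rng.normal(0, 5, n),
                             rng.normal(0, 0.1, n)])
        y = (X[:, 2] > 0).astype(float)  # ...but the class depends on it

        pca_dir = PCA(n_components=1).fit(X).components_[0]
        pls_dir = PLSRegression(n_components=1).fit(X, y).x_weights_[:, 0]
        print("first PCA component (class-blind):", pca_dir.round(2))
        print("first PLS component (uses class): ", pls_dir.round(2))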

    A funny little method for this is FreeViz, invented by yours truly. It performs an optimization similar to MDS's, but the projection is linear, so a point can only move if the corresponding axes are moved. See the paper or, better still, play with it in Orange. (The whole thing is rather naive, though, and partial least squares probably does the same job.)
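
    A deliberately naive numpy sketch of the idea (a reconstruction for illustration, not Orange's actual implementation; the force formulas are made up):

        import numpy as np

        def freeviz_sketch(X, y, iters=200, lr=0.05):
            """Toy FreeViz-like optimization on standardized data X with
            class labels y. Each attribute gets a 2-D anchor; a point is
            projected as the linear combination of the anchors, weighted
            by its attribute values. Same-class points attract, points of
            different classes repel, and all forces are transferred onto
            the anchors, so a point only moves when its axes move."""
            n, m = X.shape
            anchors = np.random.default_rng(0).normal(size=(m, 2))
            for _ in range(iters):
                P = X @ anchors                            # linear projection
                diff = P[:, None, :] - P[None, :, :]       # pairwise offsets
                d2 = (diff ** 2).sum(-1) + 1e-9
                same = (y[:, None] == y[None, :]).astype(float)
                w = (1 - same) / d2 - same                 # repel vs. attract
                forces = (w[:, :, None] * diff).sum(axis=1)
                anchors += lr * X.T @ forces / n           # chain rule: move the axes
                anchors /= np.abs(anchors).max()           # keep the scale bounded
            return anchors

    Projecting with X @ freeviz_sketch(X, y) and coloring the scatter plot by class then shows whether a useful linear view exists.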

    Self-organizing map (SOM) is another projection method. It's unsupervised, like PCA and MDS (and unlike FreeViz); it's based on coordinates (variables, space, vectors), like PCA and FreeViz (and unlike MDS); and it's non-linear, like MDS (and unlike PCA and FreeViz)... As the name says: it's a map. It splits a region into rectangular or hexagonal cells and assigns a "profile" (a vector of the same dimensionality as the data instances) to each cell. Then every data instance is assigned to the cell whose profile matches it best. Next, the profiles of the cells are updated to match the instances they contain; each data instance affects not only its own cell but also the neighbouring cells, with some weight. The two steps - (re)assigning instances and recomputing profiles - are repeated until convergence. The effect is that the data instances are spread across the map, usually forming clusters of instances with similar properties.
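
    A minimal numpy sketch of a (batch) self-organizing map; the grid size, the Gaussian neighbourhood and its width are illustrative choices:

        import numpy as np

        def som(X, rows=6, cols=6, iters=30, sigma=1.5):
            # one "profile" per cell, same dimensionality as the instances
            profiles = np.random.default_rng(0).normal(size=(rows * cols, X.shape[1]))
            grid = np.array([(r, c) for r in range(rows) for c in range(cols)])
            grid_d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
            neigh = np.exp(-grid_d2 / (2 * sigma ** 2))  # neighbourhood weights
            for _ in range(iters):
                # step 1: assign each instance to the best-matching profile
                d = ((X[:, None, :] - profiles[None, :, :]) ** 2).sum(-1)
                best = d.argmin(axis=1)
                # step 2: recompute profiles; an instance also pulls on the
                # cells near its winner, with weights decaying on the grid
                w = neigh[best]                          # (n_instances, n_cells)
                profiles = (w.T @ X) / (w.sum(axis=0)[:, None] + 1e-9)
            return best, profiles

    Calling som(X) on standardized data returns each instance's cell and the learned profiles; counting the instances per cell then reveals the groups on the map.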