Mastering Machine Learning Algorithms
上QQ阅读APP看书,第一时间看更新

Manifold assumption

This is the less intuitive assumption, but it can be extremely useful to reduce the complexity of many problems. First of all, we can provide a non-rigorous definition of a manifold. An n-manifold is a topological space that is globally curved, but locally homeomorphic to an n-dimensional Euclidean space. In the following diagram, there's an example of a manifold: the surface of a sphere in 3:

2D manifold obtained from a spherical surface

The small patch around P (for ε → 0) can be mapped to a flat circular surface. Therefore, the properties of a manifold are locally based on the Euclidean geometry, while, globally, they need a proper mathematical extension which is beyond the scope of this book (further information can be found in Semi-supervised learning on Riemannian manifolds, Belkin M., Niyogi P., Machine Learning 56, 2004).

The manifold assumption states that p-dimensional samples (where p >> 1) approximately lie on a q-dimensional manifold with p << q. Without excessive mathematical rigor, we can say that, for example, if we have N 1000-dimensional bounded vectors, they are enclosed into a 1000-dimensional hypercube with edge-length equal to r. The corresponding n-volume is rp = r1000, therefore, the probability of filling the entire space is very small (and decreases with p). What we observe, instead, is a high density on a lower dimensional manifold. For example, if we look at the Earth from space, we might think that its inhabitants are uniformly distributed over the whole volume. We know that this is false and, in fact, we can create maps and atlases which are represented on two-dimensional manifolds. It doesn't make sense to use three-dimensional vectors to map the position of a human being. It's easier to use a projection and work with latitude and longitude.

This assumption authorizes us to apply dimensionality reduction methods in order to avoid the Curse of Dimensionality, theorized by Bellman (in Dynamic Programming and Markov Process, Ronald A. Howard, The MIT Press). In the scope of machine learning, the main consequence of such an effect is that when the dimensionality of the samples increases, in order to achieve a high accuracy, it's necessary to use more and more samples. Moreover, Hughes observed (the phenomenon has been named after him and it's presented in the paper Hughes G. F., On the mean accuracy of statistical pattern recognizers, IEEE Transactions on Information Theory, 1968, 14/1) that the accuracy of statistical classifiers is inversely proportional to the dimensionality of the samples. This means that whenever it's possible to work on lower dimensional manifolds (in particular in semi-supervised scenarios), two advantages are achieved:

  • Less computational time and memory consumption
  • Higher classification accuracy