Dimensionality reduction
Dimensionality reduction is used to reduce the number of dimensions in a dataset. It is especially helpful when a problem becomes intractable as the number of variables grows. By the term dimensionality, we are referring to the features. One of the basic reduction techniques is feature engineering.
Several dimensionality reduction algorithms are commonly used:
- Low variance filter: This drops variables whose variance is low compared to the others, since near-constant features carry little information.
- High correlation filter: This identifies pairs of variables with high correlation, using Pearson or polychoric correlation, and keeps only one of them, selected with the Variance Inflation Factor (VIF).
- Backward feature elimination: This repeatedly computes the sum of squared errors (SSE) after eliminating each variable in turn, and removes the variable whose elimination increases the SSE the least.
- Linear Discriminant Analysis (LDA): This reduces the number of dimensions from the original n features to at most (number of classes − 1) features.
- Principal Component Analysis (PCA): This is a statistical procedure that transforms the original variables into a new set of uncorrelated variables (principal components), ordered by the amount of variance they explain.
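As a minimal sketch of two of the techniques above, the following example chains a low variance filter with PCA using scikit-learn (assumed to be available; the dataset and thresholds are illustrative, not from the text):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Make the last feature near-constant so the variance filter removes it.
X[:, 4] = 0.001 * rng.normal(size=100)

# Low variance filter: drop features whose variance falls below a threshold.
filtered = VarianceThreshold(threshold=0.01).fit_transform(X)

# PCA: project the remaining features onto their top principal components.
reduced = PCA(n_components=2).fit_transform(filtered)

print(filtered.shape)  # the low-variance column has been dropped
print(reduced.shape)   # only two principal components are retained
```

In practice, the variance threshold and the number of components are tuned to the dataset, for example by inspecting the explained variance ratio that PCA reports.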