PCA (principal component analysis)

Principal Component Analysis (PCA) is a technique that reduces the dimensionality of data while retaining the variation present in the data.

Principal Component Analysis (PCA) is a technique that reduces the dimensionality of data while retaining the variation present in the data. In essence, a new coordinate system is constructed so that data variation is strongest along the first axis, less strong along the second axis, and so on. Then, the data points are transformed into this new coordinate system. The directions of the axes are called principal components.

If the input data is a table with p columns, there could be maximum p principal components. However, it's usually the case that the data variation along the direction of some k-th principal component becomes almost negligible, which allows us to keep only the first k components. As a result, the new coordinate system has fewer axes. Hence, the transformed data table has only k columns instead of p. It is important to remember that the k output columns are not simply a subset of p input columns. Instead, each of the k output columns is a combination of all p input columns.

You can use the following functions to train and apply the PCA model:

For a complete example, see Dimension reduction using PCA.