Abstract
Principal Component Analysis (PCA) is one of the most widely used multivariate techniques for dimension reduction, and it has gained particular relevance as large datasets become increasingly common. Because it is mainly used as a descriptive or exploratory tool, it does not require any explicit a priori assumptions. However, even when the parent population is mischaracterized or unknown, sample principal components are often used to characterize the structure of the parent population, since they are frequently used to visualize multivariate datasets on a 2D graphical display or to infer the first two latent dimensions. In this context, although the main goal might not be inferential, sample principal components may fail to provide a valid solution, because they may vary considerably depending on the extracted sample. The stability of the PCA solution is studied here considering normal and non-normal parent populations and three covariance structure scenarios. In addition, the effects of the covariance parameter, the dimension, and the sample size are investigated via Monte Carlo simulations. This study aims to understand how stability varies with population and sample features, to characterize the conditions under which PCA results are expected to be stable, and to study a sample criterion for PCA stability.
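To make the idea of sample-to-sample instability concrete, the following is a minimal Monte Carlo sketch, not the authors' simulation design. It assumes a normal parent population with an equicorrelation covariance structure (unit variances, common correlation `rho`), for which the first population eigenvector is proportional to the vector of ones. Stability is summarized here by the mean absolute cosine between the first sample eigenvector and that population eigenvector, which is an illustrative choice rather than the criterion studied in the article. All function names are hypothetical, and only the Python standard library is used.

```python
import math
import random


def sample_equicorrelated(n, p, rho, rng):
    """Draw n observations of p unit-variance normal variables with common
    correlation rho (0 <= rho < 1), using a single shared factor."""
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor that induces the correlation
        rows.append([a * shared + b * rng.gauss(0.0, 1.0) for _ in range(p)])
    return rows


def covariance(rows):
    """Unbiased sample covariance matrix as a list of lists."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - means[j] for j in range(p)]
        for i in range(p):
            for j in range(i, p):
                cov[i][j] += d[i] * d[j]
    for i in range(p):
        for j in range(i, p):
            cov[i][j] /= n - 1
            cov[j][i] = cov[i][j]
    return cov


def first_eigenvector(mat, rng, iters=2000, tol=1e-12):
    """Approximate leading eigenvector of a symmetric positive semi-definite
    matrix by power iteration from a random start."""
    p = len(mat)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        if max(abs(w[i] - v[i]) for i in range(p)) < tol:
            return w
        v = w
    return v  # may be only approximate when the top two eigenvalues are close


def mean_abs_cosine(n, p, rho, reps, seed=0):
    """Average |cos| between the first sample eigenvector and the first
    population eigenvector over `reps` simulated samples."""
    rng = random.Random(seed)
    target = [1.0 / math.sqrt(p)] * p  # population eigenvector under equicorrelation
    total = 0.0
    for _ in range(reps):
        cov = covariance(sample_equicorrelated(n, p, rho, rng))
        v = first_eigenvector(cov, rng)
        total += abs(sum(x * t for x, t in zip(v, target)))
    return total / reps


# Compare a weakly and a strongly correlated population at two sample sizes.
for rho in (0.1, 0.8):
    for n in (30, 300):
        print(f"rho={rho}, n={n}: {mean_abs_cosine(n, 5, rho, reps=100):.3f}")
```

Values close to 1 indicate that the first sample component points in nearly the same direction as the population component across samples. Under these assumptions, lower values are to be expected when the correlation is weak or the sample is small. Non-normal populations and other covariance structures would need a different sampling function.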
Original language | English |
---|---|
Pages (from-to) | 1060-1076 |
Number of pages | 18 |
Journal | Journal of Statistical Computation and Simulation |
Volume | 93 |
Issue number | 7 |
Early online date | 7 Oct 2022 |
Publication status | Published - 2023 |
Keywords
- Principal components
- eigenvectors
- nonnormality
- simulation
- stability