Introduction: The Simplified Acute Physiology Score 3 (SAPS 3) was the first critical care prognostic model developed from worldwide data. We aimed to systematically review studies that assessed the prognostic performance of the SAPS 3 general and customized models for predicting hospital mortality in adult patients admitted to the ICU. Methods: MEDLINE, LILACS, SciELO and Google Scholar were searched to identify studies that assessed the calibration and discrimination of the general and customized SAPS 3 equations. We also evaluated the correlation between study size (number of included patients) and the value of the Hosmer-Lemeshow (H-L) statistic for the SAPS 3 models. Results: A total of 28 studies were included. Of these, 11 studies (42.8\%) did not find statistically significant miscalibration for the SAPS 3 general equation. The number of included patients correlated positively with the H-L statistic, that is, larger studies were more likely to show statistically significant miscalibration of the model (r = 0.747, P < 0.001). Customized equations for major geographic regions did not show statistically significant departures from perfect calibration in 9 of 19 studies. Five studies (17.9\%) developed a regional customization, and in all of them the new model was not statistically different from perfect calibration in their populations. Discrimination was at least very good in 24 studies (85.7\%). Conclusions: Statistically significant departure from perfect calibration for the SAPS 3 general equation was common in validation studies and was associated with larger study size, as should be expected, since the H-L statistics (both C and H) are strongly dependent on sample size. This finding was also present when the customized equations for major geographic regions were evaluated. Local customizations, on the other hand, improved SAPS 3 calibration. Discrimination was almost always very good or excellent, which bodes well for local customization when a precise local estimate is needed.
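The dependence of the H-L statistic on sample size noted in the Conclusions can be illustrated with a small simulation. The sketch below is not taken from any of the reviewed studies: the function names (`hosmer_lemeshow_c`, `simulate`), the risk distribution, and the fixed miscalibration of 0.2 on the logit scale are all assumptions chosen for illustration. It applies the same mildly miscalibrated model to cohorts of increasing size and computes the H-L C statistic over ten equal-sized risk groups.

```python
import math
import random


def hosmer_lemeshow_c(probs, outcomes, groups=10):
    """H-L C statistic: patients are sorted by predicted risk and split
    into equal-sized groups (deciles by default)."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected deaths in the group
        observed = sum(y for _, y in chunk)   # observed deaths in the group
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return stat


def simulate(n, shift=0.2, seed=0):
    """Hypothetical cohort: the true log-odds of death exceed the model's
    predicted log-odds by `shift` (an assumed, fixed miscalibration)."""
    rng = random.Random(seed)
    probs, outcomes = [], []
    for _ in range(n):
        logit = rng.gauss(-1.5, 1.2)          # assumed risk distribution
        predicted = 1.0 / (1.0 + math.exp(-logit))
        true_p = 1.0 / (1.0 + math.exp(-(logit + shift)))
        probs.append(predicted)
        outcomes.append(1 if rng.random() < true_p else 0)
    return hosmer_lemeshow_c(probs, outcomes)


# Same degree of miscalibration, increasing cohort size.
stats = {n: simulate(n) for n in (500, 5000, 50000)}
for n, value in stats.items():
    print(f"n = {n:>6}: H-L C = {value:.1f}")
```

Because the miscalibration is held constant, any growth in the statistic across cohort sizes reflects sample size alone. In an external validation the statistic is usually compared against a chi-square distribution with as many degrees of freedom as there are groups, so a large enough cohort will cross the significance threshold even when the departure from perfect calibration is small.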