Description
This dataset is the result of a study on the quality of official datasets available for COVID-19. We used comparative statistical analysis to evaluate the accuracy of data collection by one national organisation (Chinese Center for Disease Control and Prevention, CCDC) and two international organisations (World Health Organization; European Centre for Disease Prevention and Control), based on the magnitude of systematic measurement errors. The data were collected using text mining techniques and by reviewing reports, metadata, and reference data. The combined dataset includes complete spatial data such as country area, standard country codes (M49 codes), Alpha-2 codes, Alpha-3 codes, latitude, and longitude, along with additional attributes such as population. The data for China are presented in more detail in a separate sheet, extracted from the reports attached to the main page of the CCDC website. The dataset also benefits from major corrections to the referenced datasets and official reports: adjusting report dates (which lagged by one or two days), removing four negative values, detecting unreasonable changes to historical data in new reports (revealed by comparing the daily reports), and correcting systematic measurement errors (which grew as the number of infected countries increased). An aggregated root mean square error was used to identify the main problematic parts of the datasets, alongside the comparative statistical analysis used to evaluate the errors. The result is a combined dataset with reduced systematic measurement errors and some new attributes, such as daily mortality and fatality rates, in addition to the usual attributes of SARS-CoV-2 and coronavirus disease. This dataset could be considered a comprehensive and reliable source of COVID-19 data for further studies.
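As an illustration of the root mean square error comparison mentioned above, the following is a minimal sketch, not the procedure used in the study. It assumes the error is computed between the counts two organisations report for the same days and that countries are then ranked by that error. The function name `rmse`, the `reports` structure, and all numbers are hypothetical.

```python
import math


def rmse(series_a, series_b):
    """Root mean square error between two equal-length series of counts."""
    if len(series_a) != len(series_b) or not series_a:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(series_a, series_b)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical cumulative case counts for the same days from two sources.
# These values are invented for illustration and are not from the dataset.
reports = {
    "Country A": ([10, 20, 35, 50], [10, 22, 35, 46]),
    "Country B": ([5, 8, 13, 21], [5, 8, 13, 21]),
}

# RMSE per country, largest disagreement between sources first.
ranked = sorted(
    ((name, rmse(a, b)) for name, (a, b) in reports.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, error in ranked:
    print(f"{name}: {error:.3f}")
```

Countries at the top of such a ranking are the ones where the two sources disagree most, which is one plausible way to locate the problematic parts of the source datasets.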
Date made available | 13 May 2020 |
---|---|
Publisher | Mendeley Data |
Date of data production | 5 Apr 2020 |