Inference for Multivariate Regression Model Based on Synthetic Data Generated Using Plug-in Sampling

Ricardo Moura, Martin Klein, John Zylstra, Carlos A. Coelho, Bimal Sinha

Research output: Contribution to journal › Article › Peer-review

2 Citations (Scopus)

Abstract

In this article, the authors derive likelihood-based exact inference for singly and multiply imputed synthetic data in the context of a multivariate regression model. The synthetic data are generated via the Plug-in Sampling method: the unknown parameters of the model are set equal to the observed values of their point estimators computed from the original data, and synthetic data are then drawn from this estimated version of the model. Simulation studies are carried out to confirm the theoretical results. The authors provide exact test procedures which, when multiple synthetic datasets are permissible, are compared with the asymptotic results of Reiter. An application using public use data from the 2000 U.S. Current Population Survey is discussed. Furthermore, the proposed methodology is evaluated in scenarios where some of the conditions used in its derivation do not hold, namely for nonnormal and discretely distributed random variables; in these cases the inferential procedures developed still perform very well.
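As an illustration of the Plug-in Sampling step described in the abstract, here is a minimal Python/NumPy sketch. It is not the authors' code and does not implement their exact test procedures. It assumes the model Y = XB + E with the rows of E i.i.d. N_p(0, Σ), that only the response matrix Y is synthesized while the covariate matrix X is kept, and that the plug-in values are the maximum likelihood estimators of B and Σ. The function name `plug_in_synthesize` and the simulated data are hypothetical.

```python
import numpy as np


def plug_in_synthesize(X, Y, m=1, rng=None):
    """Draw m synthetic response matrices by plug-in sampling (illustrative sketch).

    Assumed model: Y = X B + E, rows of E i.i.d. N_p(0, Sigma), X kept fixed.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    p = Y.shape[1]
    # Maximum likelihood estimates computed from the original data
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)        # k x p coefficient matrix
    resid = Y - X @ B_hat
    Sigma_hat = resid.T @ resid / n                  # p x p MLE (divisor n, an assumption)
    # Plug the observed estimates into the model and draw synthetic responses
    synth = []
    for _ in range(m):
        E = rng.multivariate_normal(np.zeros(p), Sigma_hat, size=n)
        synth.append(X @ B_hat + E)
    return B_hat, Sigma_hat, synth


# Hypothetical original data: n = 200 units, k = 3 covariates (with intercept), p = 2 responses
rng = np.random.default_rng(0)
n, k, p = 200, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
B_true = np.array([[1.0, -0.5], [2.0, 0.0], [0.0, 1.5]])
Y = X @ B_true + rng.multivariate_normal(np.zeros(p), [[1.0, 0.3], [0.3, 2.0]], size=n)

# Release m = 5 synthetic datasets
B_hat, Sigma_hat, synth = plug_in_synthesize(X, Y, m=5, rng=1)

# Analyst's side: least-squares estimate of B from each synthetic dataset, averaged over the m datasets
B_tilde = np.mean([np.linalg.solve(X.T @ X, X.T @ Ys) for Ys in synth], axis=0)
```

Because each synthetic dataset is drawn around X B̂, the averaged synthetic-data estimate B̃ is centred on B̂ and hence, unconditionally, on the true B. The exact finite-sample tests in the article account for the extra variability that this second sampling stage introduces.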

Original language: English
Pages (from-to): 720-733
Number of pages: 14
Journal: Journal of the American Statistical Association
Volume: 116
Issue number: 534
Publication status: Published - 2021

Keywords

  • Data confidentiality
  • Finite sample analysis
  • Maximum likelihood estimators
  • Multivariate regression
  • Partially synthetic data
  • Pivotal quantities
  • Plug-in sampling
  • Statistical disclosure control
