Institutional Repository [SANDBOX]
Technical University of Crete
EN  |  EL

Search

Browse

My Space

Differentially private data synthesis using Variational Autoencoders

Margaritis Georgios

Full record


URI: http://purl.tuc.gr/dl/dias/9031C912-DCF1-408B-983B-EF725CEFBB34
Year 2021
Type of Item Diploma Work
License
Details
Bibliographic Citation Georgios Margaritis, "Differentially private data synthesis using Variational Autoencoders", Diploma Work, School of Electrical and Computer Engineering, Technical University of Crete, Chania, Greece, 2021 https://doi.org/10.26233/heallink.tuc.89575
Appears in Collections

Summary

Following major privacy breaches around the world, individuals and organizations are becoming increasingly reluctant in giving away their personal data. This heightened awareness for privacy is hindering the creation of rich, centralized datasets, and results in data owners keeping their datasets private. However, if different parties are unwilling to share their data with one another, then the models they will be able to build on their own will be of inferior quality, due to the lack of data. Hence, in this thesis, we attempt to combine Variational Autoencoders, Federated Learning and Differential Privacy to solve this problem. These tools can enable a group of individuals or organizations to collaboratively create a rich synthetic dataset, without revealing their private data to one another, and without compromising their privacy. Then, they can all use the synthetic dataset to supplement their private datasets, they can use it to perform hyperparameter tuning on their models, or they can even release it publicly and share it with any other party. In any case, they will be mathematically assured that their privacy won’t be adversely affected, no matter what they choose to do with the synthetic dataset, or who they choose to share it with. Those privacy guarantees, which stem from the mathematical properties of Differential Privacy, are crucial when dealing with owners of sensitive data such as hospitals and healthcare organizations. In such cases, the volume of data a single hospital has may be rather limited, potentially leading to very poor diagnostic models. Hence, a privacy-aware synthetic dataset created by multiple hospitals, could pave the way for much better diagnostic models, while preserving the privacy of hospitals and their patients.

Available Files

Services

Statistics