Synthetic data generation for primary care data

Exploring how synthetic data can be used to share valuable primary care information for AI modelling, incorporating geolocation and temporal drift.

The project will explore our synthetic data generator, developed in collaboration with the Medicines and Healthcare products Regulatory Agency (MHRA) so that valuable primary care information can be shared without risking patient privacy. It will examine how far temporal information (how primary care data changes over time) and spatial information (the impact of regional differences) can be incorporated into synthetic primary care data generation.

Geolocation data will be incorporated into Bayesian Networks (BNs) for modelling primary care data in the UK. Latent variables in these models will be explored using inference and visualisation techniques to understand their importance and semantics. Temporal drift in models of primary care data will be measured using concept drift metrics.
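As a rough illustration of these two ideas, the sketch below samples synthetic patients from a toy Bayesian network in which region is a root node, then compares one variable across two time windows. It is not the project's generator: the network structure, the probabilities, the `smoker_shift` parameter and the function names are all invented for illustration, and Jensen-Shannon divergence on a single marginal is used only as a simple stand-in, since the project description does not name the drift metrics it uses.

```python
import math
import random
from collections import Counter

# Hypothetical toy network: region -> smoker -> copd.
# Every probability below is invented for illustration only.
P_REGION = {"north": 0.4, "south": 0.6}
P_SMOKER_GIVEN_REGION = {"north": 0.30, "south": 0.15}
P_COPD_GIVEN_SMOKER = {True: 0.20, False: 0.03}


def sample_patient(rng, smoker_shift=0.0):
    """Ancestral sampling: draw each node only after its parents.

    smoker_shift is a hypothetical knob that raises smoking prevalence,
    used here to simulate a change between time windows.
    """
    region = rng.choices(list(P_REGION), weights=list(P_REGION.values()))[0]
    smoker = rng.random() < P_SMOKER_GIVEN_REGION[region] + smoker_shift
    copd = rng.random() < P_COPD_GIVEN_SMOKER[smoker]
    return {"region": region, "smoker": smoker, "copd": copd}


def distribution(records, key):
    """Empirical distribution of one variable over a list of records."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    support = set(p) | set(q)
    m = {v: 0.5 * (p.get(v, 0.0) + q.get(v, 0.0)) for v in support}

    def kl(a):
        return sum(a[v] * math.log2(a[v] / m[v]) for v in a if a[v] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


rng = random.Random(0)
early = [sample_patient(rng) for _ in range(5000)]
late = [sample_patient(rng, smoker_shift=0.10) for _ in range(5000)]

drift = js_divergence(distribution(early, "smoker"), distribution(late, "smoker"))
print(f"JS divergence for 'smoker' between windows: {drift:.4f}")
```

Because region is an explicit node, the same comparison could be run per region by filtering the records first, which is one simple way to look at spatial and temporal differences together.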

Publications

de Benedetti, J., Oues, N., Wang, Z., Myles, P., Tucker, A. (2020). Practical Lessons from Generating Synthetic Healthcare Data with Bayesian Networks. In: Koprinska, I. et al. (eds) ECML PKDD 2020 Workshops. ECML PKDD 2020. Communications in Computer and Information Science, vol 1323. Springer, Cham.

Tucker, A., Wang, Z., Rotalinti, Y. et al. Generating high-fidelity synthetic patient data for assessing machine learning healthcare software. npj Digit. Med. 3, 147 (2020).

Wang, Z., Myles, P., Tucker, A. Generating and evaluating cross-sectional synthetic electronic healthcare data: Preserving data utility and patient privacy. Computational Intelligence. 2021; 1–33.

Meet the Principal Investigator(s) for the project

Dr Allan Tucker - Allan Tucker is Reader in the Department of Computer Science, where he heads the Intelligent Data Analysis Group, consisting of 17 academic staff, 15 PhD students and 4 post-docs. He has been researching Artificial Intelligence and Data Analytics for 21 years and has published 120 peer-reviewed journal and conference papers on data modelling and analysis. His research work includes long-term projects with Moorfields Eye Hospital, where he has been developing pseudo-time models of eye disease (EPSRC - £320k), and with DEFRA on modelling fish population dynamics using state space and Bayesian techniques (NERC - £80k). Currently, he has projects with Google, the University of Pavia in Italy, the Royal Free Hospital, UCL, the Zoological Society of London and the Royal Botanic Gardens, Kew. He was academic lead on an Innovate UK Regulators’ Pioneer Fund project (£740k) with the Medicines and Healthcare products Regulatory Agency on benchmarking AI apps for the NHS, and on another on detecting significant changes in Adaptive AI Models of Healthcare (£195k). He is currently academic lead on two Pioneer Fund projects, on Explainability of AI (£168k) and In-Silico Trials (£750k). He serves regularly on the programme committees of the top AI conferences (including IJCAI, AAAI, and ECML) and is on the editorial board of the Journal of Biomedical Informatics. He hosted a special track on "Explainable AI" at the IEEE conference on Computer Based Medical Systems in 2019 and was general chair for AI in Medicine 2021. He has been widely consulted by the NHS on the ethical and practical implications of AI in health and medical research, and by numerous government thinktanks and academia on the use of machine learning for modelling fisheries data.

Related Research Group(s)

Intelligent Data Analysis - Concerned with effective analysis of data involving artificial intelligence, dynamic systems, image and signal processing, optimisation, pattern recognition, statistics and visualisation.

Partnering with confidence

Organisations interested in our research can partner with us with confidence backed by an external and independent benchmark: The Knowledge Exchange Framework.

Project last modified 21/11/2023
