Generate Synthetic Data in R for a Hypothetical Alzheimer’s Disease Trial

Ron Handels, Linus Jönsson, Lars Lau Raket, the Alzheimer’s Disease Neuroimaging Initiative
doi: https://doi.org/10.1101/2024.02.05.24302140
Ron Handels
1 Maastricht University; Alzheimer Centre Limburg; Faculty of Health, Medicine and Life Sciences; School for Mental Health and Neuroscience; Department of Psychiatry and Neuropsychology; 6200 MD, Maastricht, The Netherlands
2 Division of Neurogeriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, BioClinicum J9:20, Akademiska stråket 171 64 Solna, Sweden
  • For correspondence: ron.handels{at}maastrichtuniversity.nl
Linus Jönsson
2 Division of Neurogeriatrics, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, BioClinicum J9:20, Akademiska stråket 171 64 Solna, Sweden
Lars Lau Raket
3 Clinical Memory Research Unit, Department of Clinical Sciences, Lund University, Lund, Sweden

Abstract

INTRODUCTION Representative data from recent Alzheimer’s Disease (AD) trials are difficult to obtain. We aimed to generate a synthetic version of an original real-world observational dataset, subsequently apply a plausible AD treatment effect, and make our method available as open source.

METHODS Synthetic data were generated in the following steps:
(1) Obtain real-world data from the ADNI study on demographic (age, sex, education), clinical (cognition: MMSE and ADAS; function: FAQ; composite cognition/function: CDR, ADCOMS) and biological (genetics: APOE4; cerebrospinal fluid: ABeta, Tau; imaging: PET-SUVR-centiloid) outcomes at baseline and at 6-, 12- and/or 18-month follow-up (35 variables), with missing data multiply imputed to obtain 10 sets of 537 individuals.
(2) Estimate the (theoretical) minimum and maximum (all continuous variables) and proportions (all categorical variables).
(3) Rescale to the 0-1 range (continuous).
(4) Estimate beta distribution shape parameters (method of moments; continuous).
(5) Transform to cumulative probabilities using the beta cumulative distribution function (with the estimated shape parameters; continuous) or the cumulative proportions (categorical).
(6) Transform to a normal distribution.
(7) Estimate the variance-covariance matrix.
(8) Generate random correlated normal data using the Cholesky decomposition of the variance-covariance matrix.
(9) Transform to cumulative probabilities.
(10) Transform to the beta distribution (using shape parameters; continuous).
(11) Rescale to the original range.
(12) Keep half as the control arm and half as the intervention arm, and estimate change from baseline.
(13) Multiply the intervention arm's change from baseline by a self-defined hypothetical relative treatment effect.
We assumed that correlations on the normalized scale were similar to correlations on the original scale. R code is available on GitHub: https://github.com/ronhandels/synthetic-correlated-data.
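
The authors' implementation is in R and is available in the repository linked above. As an illustration only, the following is a minimal Python sketch (standard library only) of steps 3, 4, 8, 9 and 13. The function names, the use of the population variance in the method of moments, and the reading of the relative treatment effect as a multiplier on the change from baseline are assumptions made here, not details taken from the authors' code. The beta quantile transform of step 10 is left out because the Python standard library has no beta quantile function.

```python
import math
import random
from statistics import NormalDist, fmean, pvariance


def rescale01(values, lo, hi):
    """Step 3: map values from the range [lo, hi] onto [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def beta_shapes_mom(unit_values):
    """Step 4: method-of-moments estimates of the two beta shape parameters.

    Requires variance < mean * (1 - mean); otherwise the estimates are invalid.
    """
    m = fmean(unit_values)
    v = pvariance(unit_values)  # assumption: population variance
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (symmetric positive definite)."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L


def correlated_normals(cov, n, rng):
    """Step 8: n zero-mean multivariate normal draws with covariance cov."""
    L = cholesky(cov)
    dim = len(cov)
    draws = []
    for _ in range(n):
        z = [rng.gauss(0, 1) for _ in range(dim)]  # independent standard normals
        draws.append([sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(dim)])
    return draws


def to_probabilities(draws, cov):
    """Step 9: normal CDF of each coordinate, using its own standard deviation."""
    dists = [NormalDist(0, math.sqrt(cov[i][i])) for i in range(len(cov))]
    return [[dists[i].cdf(x) for i, x in enumerate(row)] for row in draws]


def apply_treatment(baseline, follow_up, relative_effect):
    """Step 13: multiply the change from baseline by a hypothetical multiplier."""
    return baseline + (follow_up - baseline) * relative_effect
```

For example, `apply_treatment(20.0, 24.0, 0.5)` halves a 4-point change and returns 22.0. In R, the omitted step 10 would typically use `qbeta` with the shape parameters estimated in step 4.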

RESULTS The distributions and means over time of the synthetic data were highly similar to those of the original data (visually assessed). The median absolute difference in pairwise correlations between the original and synthetic data was 0.02 (95th percentile = 0.11, maximum = 0.18).
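
A summary of this kind (median, 95th percentile and maximum of the absolute differences in pairwise correlations) could be computed along the following lines. This is a hedged Python sketch rather than the authors' R code. The use of Pearson correlation and of linear-interpolation percentiles are assumptions, since the abstract does not state which correlation coefficient or percentile definition was used, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percentile(values, p):
    """Percentile with linear interpolation between order statistics (p in [0, 1])."""
    s = sorted(values)
    h = (len(s) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def correlation_gap(original, synthetic):
    """Summarise |r_original - r_synthetic| over all variable pairs.

    Both arguments map a variable name to its list of values; the two
    datasets must share the same variable names.
    """
    diffs = [
        abs(pearson(original[a], original[b]) - pearson(synthetic[a], synthetic[b]))
        for a, b in combinations(sorted(original), 2)
    ]
    return {
        "median": percentile(diffs, 0.5),
        "p95": percentile(diffs, 0.95),
        "max": max(diffs),
    }
```

Comparing a dataset with itself gives a gap of zero for all three statistics, which is a convenient sanity check before comparing original and synthetic data.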

CONCLUSION We judged our method sufficiently valid to generate plausible, correlated synthetic results for a hypothetical trial.

Competing Interest Statement

LLR is an employee of Eli Lilly and Company. Outside this study, RH received research grants from JPND, ZonMW, IMI, and H2020 (paid to institution) and, in the past 3 years, consulting fees from Lilly Nederland (2023), iMTA (2023), and Biogen (2021) (paid to institution); RH is a member of IPECAD and of the ISPOR special interest group on open-source models (unpaid).

Clinical Protocols

https://github.com/ronhandels/synthetic-correlated-data

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

We requested the data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) for the following specific aim: "describe the natural progression over a short-term period by mimicking/emulating data typically obtained from AD drug treatment randomized trials" and method as described in our manuscript (in short: select data from the ADNI, fit a variance-covariance matrix, and use that matrix to generate synthetic data that mimic/emulate the original data). We received the following reply from ADNI: "Your request for access to the Alzheimer's Disease Neuroimaging Initiative (ADNI) Data has been approved."

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • * Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf

Data Availability

Part of the data are available online at https://github.com/ronhandels/synthetic-correlated-data. All data produced in the present study are available upon reasonable request to the authors.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted February 06, 2024.

Subject Area

  • Epidemiology