Abstract
Objectives: Patient-directed SMART on FHIR lets registries acquire longitudinal electronic health record data, but the resulting payload requires substantial engineering before use. We present Registry Forge, an open-source pipeline that converts this payload into research-ready outputs. Materials and Methods: Registry Forge decodes and parses mixed C-CDA, HTML, RTF, PDF, and FHIR inputs, joins records to a canonical patient identifier, and emits a browser-viewable dashboard, an OMOP CDM v5.4 data set, GA4GH Phenopackets v2, a code inventory, and regex extractions of disease-specific narrative content. Results: Applied to the ALS Research Collaborative Study (94 participants, 56 US health systems), Registry Forge processed 22,686 source files and 1,791 FHIR Bundles (109,599 resources); only 15.0% of files were full C-CDA. Discussion: The pipeline generalizes to any registry acquiring data through patient-directed SMART on FHIR. Conclusion: Registry Forge closes the acquisition-to-analysis gap with no server infrastructure and is openly available.
Competing Interest Statement
The authors have declared no competing interest.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Institutional Review Board of Advarra gave ethical approval for this work. The ALS Research Collaborative Study operates under continuous Institutional Review Board approval from Advarra (IORG0000635) and is conducted under a single, ongoing protocol. The secondary data analysis activities described in this manuscript are within the approved protocol. No individual-level data are reproduced in this article; all figures depicting patient-level information are drawn from a fully synthetic demonstration cohort that contains no protected health information.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Source code for the complete pipeline, the browser dashboard, and a fully synthetic single-patient demonstration cohort are publicly available at https://alstdi.github.io/RegistryForgeALS/ (repository: https://github.com/alstdi/RegistryForgeALS) under the MIT License. The exact release used for this manuscript is archived with a citable DOI (https://doi.org/10.71944/2P5C-NG50) through ALS TDI's DataCite membership. The clinical data described in this manuscript cannot be openly shared due to the terms of the ALS Research Collaborative Study informed consent. A de-identified version of the ALS Research Collaborative Study data is available to qualified researchers through the ARC Data Commons (https://www.als.net/arc/data-commons/).





