medRxiv

Leveraging sequences missing from the human genome to diagnose cancer

Ilias Georgakopoulos-Soares, Ofer Yizhar Barnea, Ioannis Mouratidis, Candace S.Y. Chan, Rachael Bradley, Mayank Mahajan, Jasmine Sims, Dianne Laboy Cintron, Ryder Easterlin, Julia S. Kim, Emmalyn Chen, Geovanni Pineda, Guillermo E. Parada, John S. Witte, Christopher A. Maher, Felix Feng, Ioannis Vathiotis, Nikolaos Syrigos, Emmanouil Panagiotou, Andriani Charpidou, Konstantinos Syrigos, Jocelyn Chapman, Mark Kvale, Martin Hemberg, Nadav Ahituv
doi: https://doi.org/10.1101/2021.08.15.21261805
Ilias Georgakopoulos-Soares
1Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
2Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
3Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
Ofer Yizhar Barnea
1Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
2Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
Ioannis Mouratidis
4Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
Candace S.Y. Chan
1Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
2Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
Rachael Bradley
1Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
2Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
Mayank Mahajan
5Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women’s Hospital, Boston, USA
Jasmine Sims
1Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
2Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
Dianne Laboy Cintron
1Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
2Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
Ryder Easterlin
1Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
2Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
Julia S. Kim
1Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
2Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
Emmalyn Chen
6Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, USA
Geovanni Pineda
7Division of Gynecologic Oncology, University of California San Francisco, San Francisco, California, USA
Guillermo E. Parada
8Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
John S. Witte
6Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, California, USA
9Department of Epidemiology and Population Health, Stanford University, Stanford, California, USA
Christopher A. Maher
10Division of Oncology, Department of Medicine, Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri
11Siteman Cancer Center, Washington University School of Medicine, St. Louis, Missouri
12Department of Biomedical Engineering, Washington University School of Medicine, St. Louis, Missouri
Felix Feng
13Division of Hematology/Oncology, Department of Medicine, University of California San Francisco, San Francisco, California, USA
14Helen Diller Comprehensive Cancer Center, University of California San Francisco, San Francisco, California, USA
15Department of Radiation Oncology, University of California San Francisco, San Francisco, California, USA
16Department of Urology, University of California San Francisco, San Francisco, California, USA
Ioannis Vathiotis
17Third Department of Internal Medicine, Sotiria Hospital, National and Kapodistrian University of Athens, School of Medicine, Athens, Greece
18Department of Pathology, Yale School of Medicine, New Haven, CT, USA
Nikolaos Syrigos
17Third Department of Internal Medicine, Sotiria Hospital, National and Kapodistrian University of Athens, School of Medicine, Athens, Greece
19Breast Oncology, Dana-Farber Brigham Cancer Center, Boston, MA, USA
Emmanouil Panagiotou
17Third Department of Internal Medicine, Sotiria Hospital, National and Kapodistrian University of Athens, School of Medicine, Athens, Greece
Andriani Charpidou
17Third Department of Internal Medicine, Sotiria Hospital, National and Kapodistrian University of Athens, School of Medicine, Athens, Greece
Konstantinos Syrigos
17Third Department of Internal Medicine, Sotiria Hospital, National and Kapodistrian University of Athens, School of Medicine, Athens, Greece
Jocelyn Chapman
14Helen Diller Comprehensive Cancer Center, University of California San Francisco, San Francisco, California, USA
16Department of Urology, University of California San Francisco, San Francisco, California, USA
Mark Kvale
2Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
Martin Hemberg
5Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women’s Hospital, Boston, USA
20Wellcome Sanger Institute, Hinxton, UK
  • For correspondence: mhemberg{at}bwh.harvard.edu nadav.ahituv{at}ucsf.edu
Nadav Ahituv
1Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA
2Institute for Human Genetics, University of California San Francisco, San Francisco, California, USA
  • For correspondence: mhemberg{at}bwh.harvard.edu nadav.ahituv{at}ucsf.edu

Abstract

Cancer diagnosis using cell-free DNA (cfDNA) has the potential to improve treatment and survival but has several technical limitations. Here, we show that tumor-associated mutations create neomers, DNA sequences 13-17 nucleotides in length that are predominantly absent from the genomes of healthy individuals, and that these neomers can accurately detect cancer, including early stages, and distinguish subtypes and features. Using a neomer-based classifier, we show that we can distinguish twenty-one different tumor types with higher accuracy than state-of-the-art methods. Refinement of this classifier using a handcrafted set of k-mers identified additional cancer features with greater precision. Generation and analysis of 451 cfDNA whole-genome sequences demonstrate that neomers can precisely detect lung and ovarian cancer with an area under the curve (AUC) of 0.93 and 0.89, respectively. In particular, for early stages, we show that neomers can detect lung cancer with an AUC of 0.94 and ovarian cancer, which lacks an early detection test, with an AUC of 0.93. Finally, testing over 9,000 sequences with either promoter or massively parallel reporter assays, we show that neomers can identify cancer-associated mutations that alter regulatory activity. Combined, our results identify a novel, sensitive, specific and simple diagnostic tool that can also identify novel cancer-associated mutations in gene regulatory elements.
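
To make the idea of a mutation "creating" a neomer concrete, the following is a minimal illustrative sketch, not the authors' pipeline. It assumes a single reference string stands in for the genomes of healthy individuals, considers only single-base substitutions, and ignores reverse complements; the function names `kmers` and `candidate_neomers` are hypothetical. The toy example uses k = 3 for readability, whereas the abstract describes lengths of 13-17 nucleotides.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_neomers(reference: str, pos: int, alt: str, k: int) -> Set[str]:
    """Return k-mers that overlap a substitution at 0-based index `pos`
    and do not occur anywhere in `reference`.

    Assumption: `reference` is a stand-in for the background of healthy
    genomes; a real analysis would use a much larger background set.
    """
    if reference[pos] == alt:
        raise ValueError("alt base equals the reference base")
    # Apply the substitution to obtain the mutated sequence.
    mutated = reference[:pos] + alt + reference[pos + 1:]
    # Only k-mers within k - 1 bases of the mutated position can change.
    start = max(0, pos - k + 1)
    end = min(len(mutated), pos + k)
    window = mutated[start:end]
    # Keep the k-mers from the window that are absent from the reference.
    return kmers(window, k) - kmers(reference, k)


if __name__ == "__main__":
    toy_reference = "ACGTACGTAC"
    # Substitute the A at index 4 with G and list the newly created 3-mers.
    print(sorted(candidate_neomers(toy_reference, pos=4, alt="G", k=3)))
```

Restricting the search to the window around the mutated base keeps the per-mutation cost proportional to k rather than to the length of the sequence; the reference k-mer set would normally be precomputed once rather than rebuilt on every call as it is here.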

Competing Interest Statement

I.G.S., O.Y.B., I.M., M.H. and N.A. are co-founders of Neomer Diagnostics and have filed patent applications covering embodiments and concepts disclosed in the manuscript.

Funding Statement

This work was supported in part by the Benioff Initiative for Prostate Cancer Research, the UCSF Catalyst award, the UCSF Innovations Ventures Philanthropy Fund and National Human Genome Research Institute grant number UM1HG011966 (N.A.). M.H. was supported by core funding from the Wellcome Trust and core funding from the Evergrande Center.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

All data for this study are deidentified and publicly available. For the TCGA data, publicly available mutation calls were obtained from https://portal.gdc.cancer.gov/. Permission to use the data from https://ega-archive.org/studies/EGAS00001003206 was obtained from the Data Access Committee after contacting Dr Ellen Heitzer, whose study was approved by the Ethics Committee of the Medical University of Graz (approval numbers 21-228 ex 09/10 [prostate cancer] and 29-272 ex 16/17 [high-resolution analysis of plasma DNA]). That study was conducted according to the Declaration of Helsinki, and written informed consent was obtained from all patients and healthy probands.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • We have added the following to the revision: 1) Generation and analysis of 451 cell-free DNA whole-genome sequences using a neomer classifier. 2) Massively parallel reporter assays for over 9,000 sequences showing that neomers affect regulatory activity.

Data Availability

For the TCGA data, mutation calls are publicly available at https://portal.gdc.cancer.gov/. Permission to use the data from https://ega-archive.org/studies/EGAS00001003206 was obtained from the DAC after contacting Dr Ellen Heitzer. WGS data were submitted to dbGaP. MPRA sequencing data were deposited in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA917083.

https://portal.gdc.cancer.gov/

https://ega-archive.org/studies/EGAS00001003206

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted January 17, 2023.

Subject Area

  • Oncology