Abstract
Semicontinuous data, characterized by an excess of zeros followed by a non-negative, right-skewed distribution, are frequently observed in biomedical research. Different statistical models have been proposed to investigate the association of covariates with such outcomes. Motivated by the search for genetic factors associated with Neutrophil Extracellular Traps (NETs), a semicontinuous biomarker involved in thrombosis, we investigated the impact of the model chosen for a semicontinuous trait in the context of a Genome Wide Association Study (GWAS). We compared three models that jointly model zero and positive values while allowing the estimation of a single association parameter of covariates with the global mean: Tobit, Negative Binomial and Compound Poisson-Gamma. We assessed the fit of these models in a sample of 657 participants of the FARIVE study with measured NETs plasma levels. For each of the three models, we performed a GWAS on NETs in FARIVE participants and compared the results. A simulation study was also conducted to evaluate the control of the type I error. The Compound Poisson-Gamma and Negative Binomial models fitted the NETs data observed in FARIVE better than the Tobit model. However, the Negative Binomial model suffered from an inflation of its type I error, attributable to extreme positive NETs values and low-frequency variants. Conversely, the Compound Poisson-Gamma model was robust to both phenomena. Using the latter model, a GWAS identified a genome-wide significant locus on chr21q21.3. The lead variant was rs57502213, a two-nucleotide deletion located ~40 kb upstream of the non-coding RNA (miR155HG) hosting miR-155, which was recently shown to play a role in NETs formation. This work indicates that the modeling strategy for a semicontinuous outcome is crucial in the GWAS framework. The choice of model should take into account the nature of the process generating zero values and the presence of extreme values. Our work also suggests that the Compound Poisson-Gamma model, although still rarely used, can be a robust modeling strategy for GWAS analysis of a semicontinuous trait.
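To illustrate what a semicontinuous outcome under a Compound Poisson-Gamma model looks like, the following minimal sketch simulates one. It is not the analysis code used in this study: the function name and the parameter values (lam, shape, scale) are arbitrary assumptions chosen only for illustration. Each observation is the sum of a Poisson-distributed number of Gamma-distributed terms, so a Poisson count of zero yields an exact zero and any positive count yields a right-skewed positive value.

```python
import numpy as np

def simulate_compound_poisson_gamma(n, lam, shape, scale, seed=0):
    """Draw n values from a Compound Poisson-Gamma distribution.

    Hypothetical helper for illustration only; parameter names are assumptions.
    """
    rng = np.random.default_rng(seed)
    # Number of Gamma terms contributing to each observation
    counts = rng.poisson(lam, size=n)
    y = np.zeros(n)
    positive = counts > 0
    # The sum of k independent Gamma(shape, scale) draws is Gamma(k * shape, scale)
    y[positive] = rng.gamma(shape * counts[positive], scale)
    return y

# Illustrative parameter values (assumptions, not estimates from FARIVE)
lam, shape, scale = 1.2, 2.0, 0.5
y = simulate_compound_poisson_gamma(n=100_000, lam=lam, shape=shape, scale=scale)

zero_fraction = float(np.mean(y == 0))         # empirical share of exact zeros
sample_mean = float(y.mean())                  # empirical global mean
expected_zero_fraction = float(np.exp(-lam))   # P(Y = 0) = exp(-lam)
expected_mean = lam * shape * scale            # E[Y] = lam * shape * scale
```

In this sketch a single mean, lam * shape * scale, governs both the zero mass and the positive part, which is the property that allows a single association parameter with the global mean to be estimated.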
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
GM and D-AT are supported by the EPIDEMIOM-VT Senior Chair from the University of Bordeaux initiative of excellence IdEX. The FARIVE study was supported by grants from the Fondation pour la Recherche Medicale, the Programme Hospitalier de Recherche Clinique (PHRC 20 002; PHRC2009 RENOVA-TV), the Fondation de France, and the Leducq Foundation. FARIVE genetic data were funded by the GENMED Laboratory of Excellence on Medical Genomics [ANR-10-LABX-0013], a research program managed by the National Research Agency (ANR) as part of the French Investment for the Future.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The FARIVE study was approved by the Comite consultatif de protection des personnes dans la recherche biomedicale (Project 2002-034).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.