Abstract
Integrating the cellular resolution of single-cell RNA sequencing (scRNA-seq) with the phenotypic depth of population-scale biobanks is essential for elucidating the cellular basis of complex diseases. However, this integration is often hindered by the limited sample sizes of scRNA-seq cohorts and the lack of cell-type resolution in massive biobank datasets.
We present CausalCellInfer, a scalable computational framework designed to bring cellular resolution to bulk and genotype-imputed transcriptomes. CausalCellInfer utilizes an invariant causal prediction-inspired procedure (scI-GCM) to identify environment-stable marker genes, employs a parsimonious deep neural network for robust cell-fraction deconvolution, and leverages regularized matrix completion to reconstruct cell-type-specific (CTS) expression profiles. This architecture is specifically optimized for biobank-scale data, where technical heterogeneity and limited gene overlap are prevalent.
In evaluations on simulated data, pseudo-bulk mixtures, and real PBMC datasets, CausalCellInfer demonstrated superior accuracy and computational efficiency compared to existing methods. Applied to ∼500,000 UK Biobank participants, the framework enabled cell-resolved analyses for 29 traits, recovering known pathological shifts, such as reduced pancreatic β-cell proportions in type 2 diabetes, and uncovering novel biological signals, including disrupted interactions between excitatory neurons and oligodendrocytes in depression. Furthermore, inferred CTS differential expression patterns showed significant concordance with independent single-cell studies and were enriched for OpenTargets disease genes. Overall, CausalCellInfer bridges the gap between single-cell insights and population-scale genomics, providing a powerful tool for the systematic discovery of disease mechanisms at cellular resolution.
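As a rough illustration of what cell-fraction deconvolution means, the sketch below recovers mixing proportions from a synthetic bulk profile. It is not the CausalCellInfer method: the abstract describes a deep neural network, whereas this example substitutes classical non-negative least squares solved by multiplicative updates. The function name estimate_fractions, the signature matrix, and all numbers are hypothetical and simulated.

```python
import numpy as np

def estimate_fractions(signature, bulk, n_iter=2000, eps=1e-12):
    """Estimate non-negative cell-type fractions f with signature @ f ~ bulk.

    signature: (genes, cell_types) non-negative marker-gene expression matrix
    bulk:      (genes,) non-negative bulk expression vector
    NOTE: a classical stand-in (multiplicative-update NNLS), not the
    neural network described in the abstract.
    """
    n_types = signature.shape[1]
    f = np.full(n_types, 1.0 / n_types)   # start from equal fractions
    sty = signature.T @ bulk              # S^T b, fixed across iterations
    sts = signature.T @ signature         # S^T S, fixed across iterations
    for _ in range(n_iter):
        # The multiplicative update keeps every entry of f non-negative.
        f *= sty / (sts @ f + eps)
    return f / f.sum()                    # rescale so fractions sum to one

# Simulated example: 50 hypothetical marker genes, 3 hypothetical cell types.
rng = np.random.default_rng(0)
signature = rng.gamma(shape=2.0, scale=1.0, size=(50, 3))
true_fractions = np.array([0.6, 0.3, 0.1])
bulk = signature @ true_fractions         # noise-free synthetic bulk sample
estimated = estimate_fractions(signature, bulk)
print(np.round(estimated, 3))
```

In this noise-free setting the estimate should closely match the simulated fractions. Real bulk data add technical noise and incomplete gene overlap, which is the setting the framework is stated to target.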
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported in part by the Lo Kwee Seong Biomedical Research Fund from The Chinese University of Hong Kong and the KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research of Common Diseases, Kunming Institute of Zoology and The Chinese University of Hong Kong, China.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
North West - Haydock Research Ethics Committee gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
I have updated the introduction, methods, and results sections. Some content in the methods section has been moved to the supplementary text. I have deleted one supplementary table.
Data availability
UK Biobank data are available to researchers who formally apply for access; they are not publicly available because of privacy concerns. Reference scRNA-seq datasets are publicly available at the following sites:
frontal cortex: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE144136
adipose tissues: https://singlecell.broadinstitute.org/single_cell/study/SCP1376/a-single-cell-atlas-of-human-and-mouse-white-adipose-tissue?cluster=Human%20WAT&spatialGroups=--&annotation=fat_type--group--study&subsample=100000#study-summary
pancreas: https://hpap.pmacs.upenn.edu/





