PT - JOURNAL ARTICLE
AU - Saptarshi Bej
AU - Jit Sarkar
AU - Saikat Biswas
AU - Pabitra Mitra
AU - Partha Chakrabarti
AU - Olaf Wolkenhauer
TI - Prevalence of Non-obese Type 2 Diabetes in economically disadvantaged Indian rural populations
AID - 10.1101/2020.09.21.20198598
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.09.21.20198598
4099 - http://medrxiv.org/content/early/2020/10/18/2020.09.21.20198598.short
4100 - http://medrxiv.org/content/early/2020/10/18/2020.09.21.20198598.full
AB - Background: Studies on Type 2 Diabetes Mellitus (T2DM) have revealed heterogeneous sub-populations in terms of underlying pathologies. However, the identification of subpopulations in epidemiological datasets remains unexplored. Here we focus on the detection of T2DM clusters in epidemiological data, specifically analysing the National Family Health Survey-4 (NFHS-4) dataset, which contains a wide spectrum of features, including medical history, dietary and addiction habits, and socio-economic and lifestyle patterns, for 10,125 T2DM patients. Methods: Epidemiological data pose challenges for analysis because of the diverse feature types they contain. In this case, the conventional application of the state-of-the-art dimension reduction tool UMAP was found to be ineffective for the NFHS-4 dataset, which contains continuous, ordinal and nominal feature types. Continuous features, although fewer in number, had an overpowering effect on the distribution of clusters. We implemented a distributed clustering workflow that combines different similarity measure settings of UMAP to cluster continuous, ordinal and nominal features separately. We integrated the reduced dimensions from each feature-type-distributed clustering to obtain an interpretable and unbiased clustering of the data. Findings: From a methodological perspective, we show that for diverse data types, which are frequent in epidemiological datasets, feature-type-distributed clustering using UMAP is effective, as opposed to the conventional use of the UMAP algorithm. 
The application of a UMAP-based clustering workflow to this type of dataset is novel in itself. Our analysis reveals four significant clusters, two of which comprise mainly non-obese T2DM patients. These non-obese clusters have a lower mean age and consist mostly of rural residents. Surprisingly, in one of the obese clusters, 90% of the T2DM patients followed a non-vegetarian diet, though they did not show an increased intake of plant-based protein-rich foods. Interpretation: Our findings demonstrate the presence of heterogeneity among T2DM patients with regard to socio-demography and dietary patterns. From our analysis, we conclude that the existence of significant non-obese T2DM subpopulations, characterized by younger age and economic disadvantage, raises the need for different screening criteria for T2DM among rural Indian residents. Funding: This work was in part supported by funds from Bioinformatics Infrastructure (de.NBI) and Establishment of Systems Medicine Consortium in Germany e:Med, as well as the German Federal Ministry for Education and Research (BMBF) programs (FKZ 01ZX1709C). The work has also been funded and supported by the Indian Council of Medical Research (ICMR) (No.3/1/3/JRF-2017/HRD-LS/56429/54).
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was in part supported by funds from Bioinformatics Infrastructure (de.NBI) and Establishment of Systems Medicine Consortium in Germany e:Med, as well as the German Federal Ministry for Education and Research (BMBF) programs (FKZ 01ZX1709C). 
JS received a research fellowship from the Indian Council of Medical Research (ICMR) (No.3/1/3/JRF-2017/HRD-LS/56429/54).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Not applicable
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
We support the idea of transparency and reproducibility of research. Therefore, all data relevant to this work are made publicly available in a GitHub repository: https://github.com/Saptarshi-Bej/Type-2-Diabetes-Mellitus-T2DM-/blob/master/Preprocessed_DM_xx.zip