PT - JOURNAL ARTICLE
AU - Serge Dolgikh
TI - Analysis and Augmentation of Small Datasets with Unsupervised Machine Learning
AID - 10.1101/2021.04.21.21254796
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.04.21.21254796
4099 - http://medrxiv.org/content/early/2021/04/25/2021.04.21.21254796.short
4100 - http://medrxiv.org/content/early/2021/04/25/2021.04.21.21254796.full
AB - Analysis of small datasets presents a number of essential challenges, not least because characteristic patterns in the data are insufficiently sampled, which makes confident conclusions about the unknown distribution elusive and results in lower statistical confidence and higher error. In this work, a novel approach to augmentation of small datasets is proposed, based on an ensemble of unsupervised generative self-learning neural network models. Applying generative learning with an ensemble of individual models made it possible to identify stable clusters of data points in the latent representations of the observable data. Several augmentation techniques based on the identified latent cluster structure were applied to produce new data points and enhance the dataset.
The proposed method can be used with small and extremely small datasets to identify characteristic patterns, augment data and, in some cases, improve classification accuracy in scenarios with a strong deficit of labels. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This research received no specific funding. Author Declarations: Only data from open and publicly available sources that required no registration and/or authorization was used. Data used in the study is available upon request. https://www.medrxiv.org/content/10.1101/2020.05.17.20104661v2 https://www.google.com/covid19-map/ https://www.worldometers.info/