PT - JOURNAL ARTICLE
AU - Malia Morrison
AU - Crista E. Johnson-Agbakwu
AU - Celeste Bailey
AU - Li Liu
TI - Classify Refugee Status Using Common Features in EMR
AID - 10.1101/2021.08.17.21262048
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.08.17.21262048
4099 - http://medrxiv.org/content/early/2021/08/24/2021.08.17.21262048.short
4100 - http://medrxiv.org/content/early/2021/08/24/2021.08.17.21262048.full
AB - Objective: Automated and accurate identification of refugees in healthcare databases is a critical first step toward investigating the healthcare needs of this vulnerable population and reducing health disparities. This study developed a machine-learning method, named the refugee identification system (RIS), that uses features commonly collected in healthcare databases to classify refugees and non-refugees. Materials and Methods: We compiled a curated data set of 103 refugees and 930 non-refugees in Arizona. For each person in the curated data set, we collected age, primary language, and home address. We supplemented the individual-level data with state-level refugee resettlement statistics and world language statistics, then performed feature engineering to convert primary language and home address into quantitative features. Finally, we built a random forest model to classify refugee status. Results: Evaluated on holdout testing data, RIS achieved a high classification accuracy of 0.97, specificity of 0.98, sensitivity of 0.88, positive predictive value of 0.83, and negative predictive value of 0.99. The area under the receiver operating characteristic curve was 0.96. Discussion and Conclusion: RIS is an automated, accurate, generalizable, and scalable method that can be used to identify refugees in healthcare databases.
It enables large-scale investigation of refugee healthcare needs and efforts to reduce health disparities.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: No external funding.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This study was approved by the IRB at Valleywise Health.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
Data are available upon request.
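The Results report five threshold-based metrics plus the ROC AUC. The minimal Python sketch below shows how each is defined from a binary confusion matrix, treating "refugee" as the positive class. It does not reproduce the RIS random forest or its feature engineering. The names `ConfusionCounts`, `classification_metrics`, and `roc_auc` are hypothetical, and the counts and scores in the demo are made-up illustrative values, not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary confusion-matrix counts; positive = refugee."""
    tp: int  # refugees classified as refugees
    fp: int  # non-refugees classified as refugees
    tn: int  # non-refugees classified as non-refugees
    fn: int  # refugees classified as non-refugees


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute accuracy, sensitivity, specificity, PPV, and NPV from the counts."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # recall on the refugee class
        "specificity": c.tn / (c.tn + c.fp),  # recall on the non-refugee class
        "ppv": c.tp / (c.tp + c.fp),          # precision on the refugee class
        "npv": c.tn / (c.tn + c.fn),          # precision on the non-refugee class
    }


def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as one half. This pairwise form is O(n_pos * n_neg), fine for a sketch.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Illustrative counts only (NOT from the study): 100 positives, 900 negatives.
    counts = ConfusionCounts(tp=80, fp=40, tn=860, fn=20)
    for name, value in classification_metrics(counts).items():
        print(f"{name}: {value:.3f}")
    # Illustrative classifier scores only (NOT from the study).
    print(f"auc: {roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]):.3f}")
```

With these illustrative counts, sensitivity is 0.80 but PPV is only about 0.67. When negatives greatly outnumber positives, as with 930 non-refugees versus 103 refugees in the curated set, even a small false-positive rate produces enough false positives to pull PPV below sensitivity. That is one reason to report PPV and NPV alongside accuracy.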