Abstract
Objectives Physical food outlets are increasingly offering delivery through Online Food Delivery Service (OFDS) platforms, but the scale of this expansion remains unclear because manually matching outlets to online platforms is labour-intensive. Understanding the share of outlets offering delivery is important, as it affects food availability and may therefore influence dietary behaviours. This paper demonstrates how a machine learning model can efficiently match physical to online outlets. We also analysed how the proportion of physical outlets listed online and the proportion of online-only outlets vary by area-level deprivation.
Methods The physical locations of outlets selling food in Great Britain were obtained from a centrally held register of food hygiene data, while online outlet data were collected by web scraping an OFDS platform. We calculated string distances based on outlet names and postcodes, which were then used to train a Random Forest model to match outlets from the two lists. Area-level deprivation was assessed using the Index of Multiple Deprivation.
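As an illustration of the string-distance step, the following is a minimal sketch only, not the authors' code. It assumes Levenshtein edit distance as the distance measure (the abstract does not say which measure was used), and the function names `levenshtein`, `normalise`, and `pair_features`, the record layout, and the example outlet are all hypothetical. Feature rows of this kind, labelled as match or non-match, are what a classifier such as a Random Forest would be trained on.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def normalise(text: str) -> str:
    """Lower-case and drop spaces and punctuation before comparing."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def pair_features(physical: dict, online: dict) -> list:
    """Features for one candidate (physical, online) pair.

    Assumed layout: each record has 'name' and 'postcode' keys.
    """
    name_p, name_o = normalise(physical["name"]), normalise(online["name"])
    pc_p, pc_o = normalise(physical["postcode"]), normalise(online["postcode"])
    name_dist = levenshtein(name_p, name_o)
    longest = max(len(name_p), len(name_o), 1)
    return [
        float(name_dist),            # raw name distance
        name_dist / longest,         # name distance scaled to 0..1
        float(levenshtein(pc_p, pc_o)),  # postcode distance
    ]


# Made-up example records, for illustration only.
register_row = {"name": "Tony's Pizza", "postcode": "ZZ1 1ZZ"}
platform_row = {"name": "Tonys Pizza", "postcode": "zz11zz"}
features = pair_features(register_row, platform_row)
```

After normalisation the two made-up records are identical, so every feature is zero; a pair of unrelated outlets would yield larger distances, which is the signal the classifier learns from.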
Results The Random Forest classifier achieved an F1 score of 90%, a recall of 98%, and a precision of 83%. Overall, the median percentage of physical outlets also listed online was 14% (IQR 0–23), and the median percentage of online-only outlets was also 14% (IQR 0–27). The proportions of physical outlets listed online and of online-only outlets were highest in more deprived areas. For example, compared to the least deprived areas, the most deprived areas were associated with a 6% greater proportion of physical food outlets listed online (95% CI 5%–6%) and a 3% greater proportion of online-only outlets (95% CI 1%–4%).
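The reported F1 score can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does this arithmetic; the function name `f1_from` is hypothetical, and because the inputs are the rounded figures from the abstract, the result is only approximate.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the abstract.
f1 = f1_from(precision=0.83, recall=0.98)
print(f"{f1:.2f}")  # prints 0.90
```

The high recall and lower precision indicate that the model misses few true matches but flags some pairs that are not matches.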
Conclusion This study demonstrates the potential of machine learning techniques to efficiently match physical and online food outlets. This automated approach can provide insights into the relationship between physical and online food availability. Researchers and policymakers can use this method to better understand inequalities in food outlet availability and monitor the expansion of online delivery services.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
JCH and JA were supported by the Medical Research Council [Unit Programme number MC_UU_00006/7]. The funders played no role in the design of the study, the collection, analysis, and interpretation of data, or the writing of the manuscript.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Food Standards Agency data are publicly available. Code to web scrape data from Online Food Delivery Service platforms can be found online.