Abstract
Statistical network analysis plays a critical role in managing the coronavirus disease (COVID-19) infodemic, for example by addressing community detection and rumor source detection problems in social networks. Because the data underlying infodemiology are fundamentally huge graphs and statistical in nature, designing graph algorithms and speeding them up poses computational challenges. A framework that leverages cloud computing is key to designing scalable data analytics for infodemic control. This paper proposes the MEGA framework, a novel joint hierarchical clustering and parallel computing technique that can be used to process a variety of computational tasks on large graphs. Its unique feature is the use of statistical machine learning to exploit the inherent statistics of the data to accelerate computation. The MEGA framework consists of pruning first, followed by hierarchical clustering based on geodesic distance and then parallel computing, so it lends itself readily to parallel computing software, e.g., MapReduce or Hadoop. In particular, we illustrate how the MEGA framework solves two representative graph problems for infodemic control, namely network motif counting for community detection and network centrality computation for rumor source detection. Interesting special cases of optimal tuning in the MEGA framework are identified based on geodesic distance characterization and random graph model analysis. Finally, we evaluate its performance using a cloud software implementation and real-world graph datasets to demonstrate its computational efficiency over the existing state of the art.
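As a rough illustration of the three stages named in the abstract (pruning, clustering by geodesic distance, then parallel computation), the following Python sketch counts triangles, the simplest network motif. It is not the authors' implementation. All function names, the seed-based clustering rule, the "owner" rule that assigns each triangle to its smallest vertex, and the use of a thread pool as a stand-in for MapReduce or Hadoop are assumptions made for illustration only.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def prune(adj):
    """Stage 1 (assumed rule): repeatedly drop vertices of degree < 2,
    since such vertices cannot lie on any triangle."""
    adj = {u: set(vs) for u, vs in adj.items()}
    queue = deque(u for u, vs in adj.items() if len(vs) < 2)
    while queue:
        u = queue.popleft()
        if u not in adj:
            continue  # already removed
        for v in adj.pop(u):
            adj[v].discard(u)
            if len(adj[v]) < 2:
                queue.append(v)
    return adj


def cluster_by_geodesic(adj, seeds):
    """Stage 2 (assumed rule): multi-source BFS that assigns each vertex
    to the seed nearest in hop count; unreachable vertices form singletons."""
    label = {s: s for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in label:
                label[v] = label[u]
                queue.append(v)
    clusters = {}
    for u in adj:
        clusters.setdefault(label.get(u, u), set()).add(u)
    return list(clusters.values())


def triangles_owned(adj, cluster):
    """Count triangles u < v < w whose smallest vertex u is in this cluster,
    so every triangle is counted by exactly one cluster."""
    count = 0
    for u in cluster:
        for v in adj[u]:
            if v > u:
                count += sum(1 for w in adj[v] if w > v and w in adj[u])
    return count


def count_triangles(adj, seeds, workers=4):
    """Stage 3: process clusters independently (a thread pool stands in
    for a MapReduce-style backend) and sum the partial counts."""
    core = prune(adj)
    seeds = [s for s in seeds if s in core]
    clusters = cluster_by_geodesic(core, seeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda c: triangles_owned(core, c), clusters))


# Two triangles joined by a path, plus one pendant vertex (7).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4),
         (4, 5), (5, 6), (4, 6), (6, 7)]
print(count_triangles(build_adj(edges), seeds=[0, 5]))
```

Assigning each triangle to the cluster that holds its smallest vertex is one simple way to keep the per-cluster tasks independent while still counting motifs that span cluster boundaries; other decompositions are possible.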
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
No external funding was received.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
N.A.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
cnhang3-c{at}my.cityu.edu.hk, peiduoyu{at}cycu.edu.tw, linling2-c{at}my.cityu.edu.hk, cheewtan{at}cityu.edu.hk
The datasets were changed to social networks so that the experiments relate more closely to the infodemic.
Data Availability
We use datasets provided by the Stanford Large Network Dataset Collection (SNAP), the 10th DIMACS Implementation Challenge, and KONECT for numerical experiments.