Multiscale statistical physics of the Human-SARS-CoV-2 interactome

Protein-protein interaction (PPI) networks have been used to investigate the influence of SARS-CoV-2 viral proteins on the function of human cells, laying out a deeper understanding of COVID-19 and providing ground for drug repurposing strategies. However, our knowledge of the (dis)similarities between SARS-CoV-2 and other viral agents is still very limited. Here we compare the novel coronavirus PPI network against 45 known viruses, from the perspective of statistical physics. Our results show that classic analysis such as percolation is not sensitive to the distinguishing features of viruses, whereas the analysis of biochemical spreading patterns allows us to meaningfully categorize the viruses and quantitatively compare their impact on human proteins. Remarkably, when Gibbsian-like density matrices are used to represent each system's state, the corresponding macroscopic statistical properties measured by the spectral entropy reveal the existence of clusters of viruses at multiple scales. Overall, our results indicate that SARS-CoV-2 exhibits similarities to viruses like SARS-CoV and Influenza A at small scales, while at larger scales it exhibits more similarities to viruses such as HIV1 and HTLV1.

The COVID-19 pandemic, with global impact on multiple crucial aspects of human life, is still a public health threat in most areas of the world. Despite the ongoing investigations aiming to find a viable cure, our knowledge of the nature of the disease is still limited, especially regarding the similarities and differences it has with other viral infections. On the one hand, SARS-CoV-2 shows high genetic similarity to SARS-CoV [1].
With the rise of network medicine [6-11], methods developed for complex network analysis have been widely adopted to efficiently investigate the interdependence among genes, proteins, biological processes, diseases and drugs [12]. Similarly, they have been used for characterizing the interactions between viral and human proteins in the case of SARS-CoV-2 [13-15], providing insights into the structure and function of the virus [16] and identifying drug repurposing strategies [17,18].
However, a comprehensive comparison of SARS-CoV-2 against other viruses, from the perspective of network science, is still missing.
Here, we use statistical physics to analyze 45 viruses, including SARS-CoV-2. We consider the virus-human protein-protein interactions (PPI) as an interdependent system with two parts: the human PPI network and the viral proteins that target it. In fact, due to the large size of the human PPI network, its structural properties barely change after being merged with viral components. Consequently, we show that percolation analysis of such interdependent systems provides no information about the distinguishing features of viruses. Instead, we model the propagation of perturbations from viral nodes through the whole system, using bio-chemical and regulatory dynamics, to obtain the spreading patterns and compare the average impact of viruses on human proteins. Finally, we exploit Gibbsian-like density matrices, recently introduced to map network states, to quantify the impact of viruses on the macroscopic functions of the human PPI network, such as the von Neumann entropy. The inverse temperature β is used as a resolution parameter to perform a multiscale analysis. We use the above information to cluster the viruses, and our findings indicate that SARS-CoV-2 groups with a number of pathogens associated with respiratory infections, including SARS-CoV, Influenza A and Human Adenovirus (HAdV), at the smallest scales, which are more influenced by local topological features. Interestingly, at larger scales, it exhibits more similarity with viruses from distant families such as HIV1 and Human T-cell Leukemia Virus type 1 (HTLV1).
Our results shed light on the unexplored aspects of SARS-CoV-2, from the perspective of statistical physics of complex networks, and the presented framework opens the doors for further theoretical developments aiming to characterize structure and dynamics of virus-host interactions, as well as grounds for further experimental investigation and potentially novel clinical treatments.

Results
Here, we use data regarding the viral proteins and their interactions with human proteins for 45 viruses (see Methods and Fig. 1). To obtain the virus-human interactomes, we link the data to the BIOSTR Human PPI network (19, ...).
Percolation of the interactomes. Arguably, the simplest conceptual framework to assess how and why a networked system loses its functionality is the process of percolation [19]. Here, the structure of interconnected systems is modeled by a network G with N nodes, which can be fully represented by an adjacency matrix A ($A_{ij} = 1$ if nodes i and j are connected, and 0 otherwise). Nodes are then progressively removed and the size of the largest connected component that remains is tracked [20]. This point of view assumes that, as a first approximation, there is an intrinsic relation between connectivity and functionality: when node removal occurs, the more capable of remaining assembled a system is, the better it will perform its tasks. Hence, we have a quantitative way to assess the robustness of the system. If one wants to single out the role played by a certain property of the system, instead of selecting the nodes randomly, they can be sequentially removed following that criterion. For instance, if we want to find out the relevance of the most connected elements for the functionality, we can remove a fraction of the nodes with largest degree [21,22]. Technically, the criterion can be any metric that allows us to rank nodes, although in practical terms topologically-oriented protocols, such as degree or betweenness, are the most frequently used due to their accessibility. Therefore percolation is, to all intents and purposes, a topological analysis, since its input and output are based on structural information.
In the past, percolation has proved useful to shed light on several aspects of protein-related networks, such as the identification of functional clusters [23] and protein complexes [24], the verification of the quality of functional annotations [25] or the critical properties as a function of mutation and duplication rates [26], to name but a few. Following this research line, we perform the percolation analysis on all the PPI networks to understand whether this technique brings any information that allows us to differentiate among viruses. The considered protocols are the random selection of nodes, the targeting of nodes by degree (i.e., the number of connections they have) and their removal by betweenness centrality (i.e., a measure of the likelihood of a node to be in the information flow exchanged through the system by means of shortest paths). We apply these attack strategies and compute the resulting (normalized) size of the largest connected component S in the network, which serves as a proxy for the remaining functional part, as commented above.
This way, when S is close to unity the function of the network has been scarcely impacted by the intervention, while when S is close to 0 the network can no longer be operative. The results are shown in Fig. 3. Surprisingly, for each attacking protocol, we observe that the curves of the size of the largest connected component neatly collapse onto a common curve. In other words, percolation analysis completely fails at finding virus-specific discriminators. Viruses do respond differently depending on the ranking used, but this is somewhat expected due to the correlation between the metrics employed and the position of the nodes in the network.
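The attack protocol described above can be illustrated with a minimal sketch. It assumes a static ranking by initial degree and normalizes S by the original number of nodes; neither choice is confirmed by the text, and the helper names (`largest_component_fraction`, `attack_curve`, `degree_order`) and the toy star graph are hypothetical, not the interactome data.

```python
from collections import deque


def largest_component_fraction(adj, removed, n_total):
    """Largest connected component among surviving nodes, divided by n_total."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best / n_total


def attack_curve(adj, order):
    """S after removing the first 0, 1, 2, ... nodes of `order`."""
    n = len(adj)
    removed = set()
    curve = [largest_component_fraction(adj, removed, n)]
    for node in order:
        removed.add(node)
        curve.append(largest_component_fraction(adj, removed, n))
    return curve


def degree_order(adj):
    """Static ranking by initial degree (an assumption), ties broken by label."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))


# Toy star graph: hub 0 connected to leaves 1..5 (illustrative only).
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
curve = attack_curve(star, degree_order(star))
print(curve)
```

Swapping `degree_order` for a random permutation or a betweenness ranking gives the other two protocols; comparing the resulting curves across interactomes is what the collapse in Fig. 3 refers to.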
We can shed some light on the similar virus-wise response to percolation by looking at the topological structure of the interactomes. Despite being viruses of diverse nature and causing such different symptomatology, their overall structure shows a high level of similarity when it comes to the protein-protein interaction. Indeed, for every pair of viruses we find the fraction of nodes $f_N$ and the fraction of links $f_L$ that simultaneously participate in both. Averaging over all pairs, we obtain $f_N = 0.9996 \pm 0.0002$ and $f_L = 0.9998 \pm 0.0007$. This means that the interactomes are structurally very similar, and so are the dismantling rankings. If purely topological analysis is not able to differentiate between viruses, then we need more elaborate, non-standard techniques to tackle this problem. In the next sections we will employ these alternative approaches.
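A minimal sketch of the pairwise overlap computation follows. The text does not state how the fractions are normalized, so the sketch assumes intersection over union; the function name `overlap_fractions` and the toy edge lists are hypothetical.

```python
def overlap_fractions(edges_a, edges_b):
    """Return (f_N, f_L): shared nodes and shared links of two interactomes.

    Normalization by the union is an assumption, not taken from the source.
    """
    links_a = {frozenset(e) for e in edges_a}  # undirected links
    links_b = {frozenset(e) for e in edges_b}
    nodes_a = set().union(*links_a)
    nodes_b = set().union(*links_b)
    f_n = len(nodes_a & nodes_b) / len(nodes_a | nodes_b)
    f_l = len(links_a & links_b) / len(links_a | links_b)
    return f_n, f_l


# Toy example: a shared "human" backbone plus one viral link each.
human = [("h1", "h2"), ("h2", "h3"), ("h3", "h4")]
interactome_a = human + [("vA", "h1")]
interactome_b = human + [("vB", "h4")]
f_n, f_l = overlap_fractions(interactome_a, interactome_b)
print(f_n, f_l)
```

In the toy example the backbone is tiny, so the overlap is modest; with a human network several orders of magnitude larger than the viral part, both fractions approach one, which is the situation reported above.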
Analysis of perturbation propagation. PPI networks represent the large-scale set of interacting proteins. In the context of regulatory networks, edges encode dependencies for activation/inhibition with transcription factors. PPI edges can also represent the propensity for pairwise binding and the formation of complexes. The analytical treatment of these processes is described via Bio-Chemical dynamics [27,28] and Regulatory dynamics [29]. In Bio-Chemical (Bio-Chem) dynamics, these interactions are proportional to the product of the concentrations of reactants, thus resulting in a second-order interaction, forming dimers. The protein concentration $x_i$ (i = 1, 2, ..., N) also depends on its degradation rate $B_i$ and on the amount of protein synthesized at a rate $F_i$.
The resulting law of mass action, with interaction term $A_{ij} x_i x_j$, summarizes the formation of complexes and the degradation/synthesis processes that occur in a PPI. Regulatory dynamics can instead be characterized by an interaction with neighbors described by a Hill function that saturates at unity.
In the context of the study of signal propagation, recent works have introduced the network Global Correlation Function [30,31], $G_{ij} = \left| \frac{dx_i/x_i}{dx_j/x_j} \right|$. Ultimately, the idea is that a constant perturbation brings the system to a new steady state $x_i \to x_i + dx_i$, and $dx_i/x_i$ quantifies the magnitude of the response of node i to the perturbation in j. This also allows the definition of measures such as the Impact [31] of a node, $I_i = \sum_j A_{ij} G^T_{ij}$, describing the response of i's neighbors to its perturbation. Interestingly, it was found that these measures can be described by power laws of the degree ($I_i \approx k_i^{\varphi}$), via universal exponents that depend on the ODEs underlying the dynamics, allowing one to effectively describe the interplay between topology and dynamics. In our case, $\varphi = 0$ for both processes, therefore the perturbation from i has the same impact on neighbors, regardless of its degree. We exploit the definition of $G_{ij}$ to define the vector $G^v$ of perturbations of concentrations induced by the interaction with the virus v, whose k-th entry collects the response of node k to the viral perturbation [31].
The steps we follow to assess the impact of the viral nodes on the human interactome via the microscopic dynamics are described next. We first obtain the equilibrium states of the human interactome by numerical integration of the equations. Then, for each virus, we compute the system response to perturbations starting in every node $i \in V$, which is eventually encoded in $G^v$. Finally, we repeat these steps for both the Bio-Chem and M-M models. The amount of correlation generated is a measure of the impact of the virus on the interactome equilibrium state.
We estimate it as the 1-norm of the correlation vectors, $\|G^v\|_1 = \sum_i |G^v_i|$, which we refer to as Cumulative Correlation. The results are presented in Fig. 4.
By allowing for multiple sources of perturbation, the biggest responses in magnitude will come from direct neighbors of these sources, making them the dominant contributors to $\|G^v\|_1$.
With $I_i$ not being dependent on the source degree, these results support the idea that, with these specific forms of dynamical processes on top of the interactome, the overall impact of a perturbation generated by a virus is proportional to the number of human proteins it interacts with.
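The perturbation procedure can be sketched on a toy network. Several ingredients below are assumptions rather than the authors' implementation: the Bio-Chem equation is taken in the mass-action form dx_i/dt = F - B x_i - sum_j A_ij x_i x_j with uniform F = B = 1, the integrator is a plain Euler scheme, all sources are clamped simultaneously at a relative perturbation `alpha`, and the response vector includes the sources themselves. The function names and the five-node path graph are hypothetical.

```python
def steady_state(adj, clamp=None, f=1.0, b=1.0, dt=0.01, steps=20000):
    """Euler-integrate the assumed mass-action dynamics to a fixed point.

    `clamp` maps a node to a concentration held constant (permanent perturbation).
    """
    clamp = clamp or {}
    x = {i: clamp.get(i, 1.0) for i in adj}
    for _ in range(steps):
        new = {}
        for i in adj:
            if i in clamp:
                new[i] = clamp[i]
            else:
                binding = sum(x[j] for j in adj[i])  # sum_j A_ij x_j
                new[i] = x[i] + dt * (f - b * x[i] - x[i] * binding)
        x = new
    return x


def response_vector(adj, sources, alpha=0.1):
    """Relative response |dx_i / x_i| of every node, in units of alpha."""
    x0 = steady_state(adj)
    clamp = {s: x0[s] * (1.0 + alpha) for s in sources}
    x1 = steady_state(adj, clamp)
    return {i: abs((x1[i] - x0[i]) / x0[i]) / alpha for i in adj}


def cumulative_correlation(g):
    """1-norm of the response vector."""
    return sum(abs(v) for v in g.values())


# Toy path graph 0-1-2-3-4 with node 0 as the only perturbed source.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
g = response_vector(path, sources=[0])
total = cumulative_correlation(g)
print(g, total)
```

On this toy graph the response decays with distance from the source, so the direct neighbor dominates the 1-norm, in line with the observation that the cumulative correlation is driven by the number of directly targeted proteins.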
Results shown in Fig. 5 highlight that propagation patterns strongly depend on the sources (i.e., the affected nodes V), and strong similarities will generally be found within the same family and for viruses that share common impacted proteins in the interactome. Conversely, families and viruses with small (or null) overlap in the sources exhibit low similarity and are not sharply distinguishable. To cope with this, we adopt a rather macroscopic view of the interactomes in the next section.
Analysis of spectral information. We have shown that the structural properties of the human PPI network do not significantly change after being targeted by viruses. Percolation analysis seems ineffective in distinguishing the specific characteristics of virus-host interactomes while, in contrast, the propagation of biochemical signals from viral components into the human PPI network has been shown to be successful in assessing the viruses in terms of their average impact on human proteins. Remarkably, the propagation patterns can be used to hierarchically cluster the viruses, although some of the resulting clusters are highly dependent on the choice of threshold (Fig. 5).
In this section, we describe each interactome by a Gibbsian-like density matrix, $\rho(\beta, G) = e^{-\beta L}/Z(\beta, G)$, which is defined in terms of the propagator of a diffusion process on top of the network, $e^{-\beta L}$ with L the network Laplacian, normalized by the partition function $Z(\beta, G) = \mathrm{Tr}\, e^{-\beta L}$, which has an elegant physical meaning in terms of dynamical trapping for diffusive flows [38]. Consequently, the counterpart of the Massieu function (also known as free entropy) in statistical physics can be defined for networks as $\Phi(\beta, G) = \log Z(\beta, G)$. Note that a low value of the Massieu function indicates high information flow between the nodes.
The von Neumann entropy, $S(\beta, G) = -\mathrm{Tr}\left[\rho(\beta, G) \log \rho(\beta, G)\right]$, can be directly derived from the Massieu function and encodes the information content of graph G. Finally, the difference between the von Neumann entropy and the Massieu function follows $S(\beta, G) - \Phi(\beta, G) = \beta U(\beta, G)$ (7), where $U(\beta, G)$ is the counterpart of internal energy in statistical physics. In the following, we use the above quantities to compare different virus-host interactomes. In fact, as the number of viral nodes is much smaller than the number of human proteins, we model each virus-human interdependent system $G'$ as a perturbation of the large human PPI network $G$ (see Fig. 6).
After considering the viral perturbations, the von Neumann entropy, Massieu function and the energy of the human PPI network change slightly. The magnitude of such perturbations can be calculated as explained in Fig. 6, for von Neumann entropy and Massieu function, while the perturbation in internal energy follows their difference βδU(β, G) = δS(β, G) − δΦ(β, G), according to Eq. 7. The parameter β encodes the propagation time in diffusion dynamics, or equivalently an inverse temperature from a thermodynamic perspective, and is used as a resolution parameter tuned to characterize macroscopic perturbations due to node-node interactions at different scales, from short to long range 40 .
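The relation between these quantities can be checked numerically with a minimal sketch. It assumes that L is the combinatorial graph Laplacian and works directly on its eigenvalues, so no matrix exponential is needed; the function names are hypothetical and the cycle graph is chosen only because its Laplacian spectrum is known in closed form. For a real interactome the eigenvalues would have to come from a numerical linear algebra library.

```python
import math


def spectral_quantities(eigenvalues, beta):
    """Return (Z, Massieu function, von Neumann entropy, internal energy)."""
    weights = [math.exp(-beta * lam) for lam in eigenvalues]
    z = sum(weights)                       # Z = Tr exp(-beta L)
    probs = [w / z for w in weights]       # eigenvalues of the density matrix
    massieu = math.log(z)                  # Phi = log Z
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    energy = sum(lam * p for lam, p in zip(eigenvalues, probs))  # U = Tr[L rho]
    return z, massieu, entropy, energy


def cycle_laplacian_spectrum(n):
    """Closed-form Laplacian eigenvalues of a cycle graph with n nodes."""
    return [2.0 - 2.0 * math.cos(2.0 * math.pi * k / n) for k in range(n)]


spectrum = cycle_laplacian_spectrum(8)
for beta in (0.1, 1.0, 10.0):
    z, phi, s, u = spectral_quantities(spectrum, beta)
    # The printed residual S - Phi - beta * U should vanish up to rounding.
    print(beta, s, phi, u, s - phi - beta * u)
```

Evaluating the same function on the spectra of the perturbed and unperturbed networks and subtracting the results gives the perturbations δS, δΦ and δU at a chosen β.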
Based on the perturbation values and using the k-means algorithm, a widely adopted clustering technique, we group the viruses together (see Fig. 6, Tab. 1 and Tab. 2). At small scales, SARS-CoV-2 appears in a cluster with a number of other viruses causing respiratory illness, including SARS-CoV, Influenza A and HAdV. However, at larger scales, it exhibits more similarity with HIV1, HTLV1 and HPV type 16.
Table 1: Summary of clustering results at small scales (β ≈ 1 from Fig. 6). Remarkably, at this scale, SARS-CoV-2 groups with a number of respiratory pathogens including SARS-CoV, Influenza A and HAdV.
Table 2: Summary of clustering results at larger scales (see Fig. 6). Here, SARS-CoV-2 shows higher similarity to HIV1, HTLV1 and HPV type 16.
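The clustering step can be illustrated with a plain Lloyd iteration. This is a sketch, not the implementation used for the tables: the initialization is a deterministic, evenly spaced pick (an assumption made for reproducibility), the feature pairs stand in for (δS, δΦ) values, and the numbers are synthetic rather than results from the study.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with evenly spaced initial centroids (an assumption)."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: mean of the members of each cluster.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members)
                                     for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Synthetic (delta_S, delta_Phi) pairs for six hypothetical viruses.
features = [(0.10, 0.10), (0.12, 0.09), (0.11, 0.13),
            (0.90, 0.80), (0.95, 0.85), (0.88, 0.90)]
labels, centroids = kmeans(features, k=2)
print(labels)
```

Repeating the clustering at different values of β is what produces scale-dependent groupings of the kind summarized in the two tables.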

Discussion
Comparing COVID-19 against other viral infections is still a challenge. In fact, various approaches can be adopted to characterize and categorize the complex nature of viruses and their impact on human cells.
In this study, we used an approach based on statistical physics to analyze virus-human

Methods
Overview of the data set. It is worth noting that a different procedure had to be used to build the SARS-CoV-2 virus-host interactions. In fact, since SARS-CoV-2 is too novel, we could not find its PPI in the STRING repository and we have considered, instead, the targets experimentally observed in Gordon et al. [13], consisting of 332 human proteins. The remainder of the procedure used to build the virus-host PPI is the same as before. See Fig. 1 for summary information about each virus.
a key enzyme involved in the process of prostaglandin biosynthesis; IFIH1 (Interferon Induced with Helicase C domain 1, NCBI Gene ID: 64135), encoding MDA5, an intracellular sensor of viral RNA responsible for triggering the innate immune response. It is fundamental for activating the pro-inflammatory response that includes interferons; for this reason, it is targeted by several virus families, which are able to hinder the innate immune response by evading its specific interferon response.
Contributions. AG, OA and SB performed numerical experiments and data analysis. MDD conceived and designed the study. All authors wrote the manuscript.