Main

RNA viruses have become an important area of study for epidemiologists and evolutionary biologists alike1,2,3. Much of this research is centred on two main themes: understanding the mechanisms of RNA virus evolution, often through experimental analyses, and reconstructing the epidemiological history of a given virus, namely its origin and spread through populations and the forces that promote its emergence. Here, we review the progress and current state of both these research topics.

Mechanisms of viral evolution

Processes of evolutionary change. Central to population genetics is understanding how the five main forces of evolutionary change (mutation, recombination, natural selection, GENETIC DRIFT and migration) interact to shape the genetic structure of populations. These same forces are also central to understanding RNA virus evolution, although their relative strengths differ from those observed for DNA-based organisms.

For RNA viruses, most attention has been directed towards mutation, selection and genetic drift. We can understand their importance and interaction by considering four basic properties of RNA virus populations. First, RNA viruses often have very large population sizes, such that the number of viral particles in a given organism might be as high as 10^12. Second, such immense population sizes, which are several orders of magnitude larger than those observed for cellular organisms, are a product of explosive replication. For example, a single infectious particle can produce an average of 100,000 viral copies in 10 hours. As natural selection is most efficient with large populations, it is no surprise that experiments using RNA viruses have shown that selection is of fundamental importance in controlling their evolutionary dynamics, such that new mutants with increased FITNESS (as measured by their selection coefficient, s) continually appear and out-compete older, inferior alleles4. Third, owing to the lack of proofreading activity in their polymerase proteins, RNA viruses exhibit the highest mutation rates of any group of organisms, approximately one mutation per genome, per replication5,6. Finally, the genome sizes of RNA viruses are typically small, ranging from only 3 kb to 30 kb, with a median size of 9 kb. These last two properties are intimately related because high mutation rates are theoretically expected to limit genome size. In particular, a per-site mutation rate that exceeds a notional ERROR THRESHOLD (set at approximately the reciprocal of the genome size) generates so many deleterious mutations in each replication cycle that even the fittest viral genomes are unable to reproduce, and population size decreases to extinction7,8. However, RNA viruses that exist close to (but below) the error threshold are also able to produce many beneficial mutations in a short time, thereby enhancing adaptability, provided that their populations are sufficiently large.
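The relationship between mutation rate, genome size and the error threshold can be illustrated with a short calculation. The sketch below assumes the simplest single-peak form of the quasispecies model: one master sequence with a relative replication advantage sigma, all mutants equally fit, and no back mutation. The function names and the choice sigma = e are illustrative assumptions, not values taken from the studies cited here.

```python
import math

def copy_fidelity(mu_per_site: float, genome_length: int) -> float:
    """Probability that a whole genome is copied without a single error."""
    return (1.0 - mu_per_site) ** genome_length

def master_frequency(sigma: float, mu_per_site: float, genome_length: int) -> float:
    """Equilibrium frequency of the master sequence in the single-peak model.

    x = (sigma * Q - 1) / (sigma - 1), floored at 0 once the threshold is passed.
    """
    q = copy_fidelity(mu_per_site, genome_length)
    return max(0.0, (sigma * q - 1.0) / (sigma - 1.0))

def error_threshold(sigma: float, genome_length: int) -> float:
    """Per-site mutation rate at which sigma * Q = 1 and the master is lost."""
    return 1.0 - sigma ** (-1.0 / genome_length)

if __name__ == "__main__":
    genome_length = 9_000   # median genome size quoted in the text (9 kb)
    sigma = math.e          # assumed advantage; puts the threshold near 1/L
    print("threshold per site:", error_threshold(sigma, genome_length))
    for per_genome_rate in (0.2, 0.9, 1.1):
        mu = per_genome_rate / genome_length
        print(per_genome_rate, master_frequency(sigma, mu, genome_length))
```

Under these assumptions the threshold lies very close to the reciprocal of the genome length, and the master sequence disappears once the per-genome mutation rate exceeds about one. Larger values of sigma would move the threshold upwards in proportion to ln(sigma).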

In the simple situation outlined above, RNA viruses should evolve in a highly deterministic manner, with the process of natural selection working efficiently on a vast array of mutational variants. Although it is true that RNA virus populations are often highly diverse, this is not sufficient to explain the entirety of RNA virus evolution. In particular, deterministic approaches assume that population sizes are universally large, such that the fate of a given mutation can be predicted if its frequency and fitness are known. Although the population sizes of RNA viruses are often very large, factors such as variation in replication potential among variants, differences in generation time among infected cells and POPULATION BOTTLENECKS, most notably during transmission between hosts, might lead to an EFFECTIVE POPULATION SIZE (denoted Ne) that is much smaller than the actual number of infected cells. Theory predicts that in populations where Ne is small (such that the compound parameter Ne × s < 1), genetic drift has an important role in determining the frequency and fate of mutations9.
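The role of the compound parameter (effective population size multiplied by the selection coefficient) can be seen by computing the probability that a new advantageous mutant reaches fixation. The following is a minimal sketch that uses Kimura's diffusion approximation for a haploid population; the function name and the parameter values in the usage lines are hypothetical and chosen only for illustration.

```python
import math

def fixation_probability(s: float, n_e: float, p0: float) -> float:
    """Diffusion approximation to the fixation probability (haploid model).

    s   : selection coefficient of the mutant
    n_e : effective population size
    p0  : initial frequency of the mutant
    Intended for moderate values of n_e * s; very large negative products
    would overflow the exponential.
    """
    if abs(s) < 1e-12:
        return p0  # neutral limit: fixation probability equals frequency
    return math.expm1(-2.0 * n_e * s * p0) / math.expm1(-2.0 * n_e * s)

if __name__ == "__main__":
    s = 0.001
    for n_e in (100, 100_000):
        p0 = 1.0 / n_e  # a single new mutant
        u = fixation_probability(s, n_e, p0)
        print(f"Ne*s = {n_e * s:g}: P(fix) = {u:.6f}, neutral = {p0:.6f}")
```

With these illustrative values, when the product is 0.1 the mutant fixes with a probability close to that of a neutral allele, so drift dominates. When the product is 100 the fixation probability approaches 2s, far above the neutral expectation, so selection dominates.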

Recombination might also have an important role in RNA virus evolution. Although most studies indicate that recombination rates in many RNA viruses are often lower than those in other organisms10, there are notable exceptions. Perhaps the most dramatic is HIV, in which the genomic recombination rate exceeds the genomic mutation rate11. Frequent recombination seems advantageous because it can create high fitness genotypes more rapidly than by mutation alone. Moreover, recombination might also purge deleterious mutations from virus populations, thereby preventing a dramatic decrease in fitness (see below). However, simulation studies have indicated that frequent recombination is more likely to reduce fitness when mutation rates are close to the error threshold12. Finally, recombination rates in RNA viruses might not be set by natural selection at all. Rather, they could simply be a passive function of the replication machinery or ecological circumstances of the virus in question. For example, recombination rates seem to be particularly low in negative-sense RNA viruses13, which might be a result of the RNA-packaging mechanism. Understanding the causes of variation in recombination rate among RNA viruses is a key area for future study.

A final factor to consider in RNA virus evolution is migration. Migration (also referred to as gene flow) must not only be understood at a macroscopic level (that is, among hosts within a population, among populations or between host species), but also within a single infected individual. From the site of inoculation, viruses can be transported to several tissues, generating intra-host spatial variation14. However, the effect of a non-uniform population distribution on the spread, fitness and variability of virus populations has been much less studied than other evolutionary factors, although in some experiments a positive correlation between migration rate and the average fitness of the population has been observed15.

The quasispecies as a model of RNA virus evolution. The remarkable mutational power of RNA viruses has meant that their evolution has often been considered to be different to DNA-based organisms1. Key to this is the concept of the quasispecies, which was first developed by Eigen and Schuster16 to understand the dynamics of primitive evolutionary systems. RNA viruses are of particular importance in this respect as they might represent biological entities that evolve according to the rules of quasispecies theory. The basis of quasispecies theory is the notion that the target of natural selection is not simply the fastest growing replicator, but rather a broad spectrum of mutants that are produced by erroneous copying of the fittest (or master) sequence16,17. Natural selection acts on the entire quasispecies because mutation rates close to the error threshold mean that individual viral genomes are linked by a mutational coupling — all the possible mutational links between viral genomes are established — so that the whole population evolves as a single unit. One particularly important implication of this special form of group selection is that the fastest replicating RNA viral genomes could be out-competed by those with lower replication rates if the latter have a high probability of being generated by mutation from closely related variants.

An important question is whether the quasispecies model is an accurate description of RNA virus evolution. Experimental evidence for the quasispecies was first reported for the bacteriophage Qβ18. Subsequent experiments with mammalian vesicular stomatitis virus (VSV) provided one of the most important supporting observations for the quasispecies: a high-fitness viral variant was suppressed by one of lower fitness19. However, this can also be explained by genetic drift; the probability that any variant achieves fixation in a population is partially dependent on its initial frequency, so most rare, albeit advantageous, variants are lost by drift in small populations. Indeed, a generic problem of quasispecies theory is that genetic drift is expected to be extremely restricted17, which might not be the case for viruses in Nature20,21. More recently, in vitro studies of the evolutionary dynamics of bacteriophage φ6 provided evidence for one aspect of quasispecies theory, namely that viral genomes differ in their mutational spectra and that this affects fitness22. However, because these experiments used small populations and RUGGED FITNESS LANDSCAPES, and because little is known about fitness landscapes in Nature, the generality of these results is uncertain.

Although the quasispecies has a firm theoretical foundation, and there is some evidence for it in laboratory populations, whether it applies to RNA viruses in nature is less clear. For example, simple observations of high levels of genetic variation in RNA viruses are not sufficient to prove the existence of quasispecies, although this is often the only evidence presented, nor is the existence of an error threshold, which can easily be explained using evolutionary models8. Rather, to demonstrate that RNA viruses form quasispecies it is necessary to show that natural selection acts on viral populations as a unit. Testing this prediction in Nature will be one of the most important future areas of study for those investigating the mechanics of viral evolution.

Experimental evolution of RNA viruses

Experimental evolution constitutes a powerful tool for simulating natural evolution23 and is frequently used to test the basic principles of evolutionary theory24. Box 1 provides a schematic overview of experiments of this type. The strength of the experimental approach is that the phenotypic and molecular changes of RNA viral populations can be monitored in real time. More importantly, under an appropriate experimental setting, it is also possible to directly estimate changes in fitness, one of the main goals of modern population genetics. A panoply of experiments have highlighted both the mechanisms of RNA virus evolution and what RNA viruses can tell us about evolution in general.

What have experiments told us about viral evolution? Experimental studies have made a substantial contribution towards understanding the processes that govern RNA virus evolution25. The main findings of these experiments can be grouped into three general types. First, the molecular basis of adaptive evolution in viruses, including the occurrence and frequency of CONVERGENT EVOLUTION26,27, viral attenuation28 and compensatory mutations29. Second, the role of population bottlenecks and the accumulation of deleterious mutations, and how they affect fitness30,31,32,33. Finally, the importance of CLONAL INTERFERENCE34 and COMPLEMENTATION35 in determining rates of viral adaptation. In brief, these studies make eight conclusions36. First, there is extensive convergent and parallel evolution (both in genotype and in phenotype) across lineages replicating in the same host, perhaps reflecting the fact that relatively few sites are free to vary when genome sizes are small. Second, advantageous mutations that are fixed early on when viruses are challenged with new environments confer the largest fitness benefit. Third, phenotypic evolution tends to mirror the evolution of fitness increments, with large changes occurring early in new environments. Fourth, rates of nucleotide substitution remain approximately constant through time. Fifth, overall genetic diversity remains low during the phase of maximum fitness increase and rises once fitness becomes asymptotic. Sixth, evolutionary changes that increase fitness in one host often, although not always, reduce fitness in an alternative host. Seventh, population bottlenecks and spatial heterogeneity lead to an increase in unique nucleotide substitutions. Finally, severe reductions in population size can lead to the accumulation of deleterious mutations and consequent fitness losses.

What have experiments told us about evolution in general? Many fundamental questions in evolutionary biology have been addressed using RNA viruses as model experimental systems. One of the first questions addressed concerned a severe consequence of deleterious mutation accumulation known as MULLER'S RATCHET37. Studies have explored how clonally evolving RNA viruses prevent the excessive build-up of deleterious mutations in populations that are experiencing strong bottlenecks or small effective sizes38. One popular suggestion is that sexual reproduction has evolved in RNA viruses, because it allows them to escape Muller's ratchet when effective population sizes are small39,40,41. In this model, the accumulation of deleterious mutations is expected to be less of a problem for RNA viruses that either recombine or undergo reassortment in the case of viruses with segmented genomes. However, recombination rates are often so low in RNA viruses that it is difficult to hypothesize that they have a direct fitness benefit. In the face of low recombination rates, RNA viruses could escape Muller's ratchet because their long-term effective population sizes might not be small enough to allow deleterious mutations to accumulate, or perhaps compensatory mutations are sufficiently frequent to counteract fitness losses42.
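The logic of Muller's ratchet can be sketched with a small simulation. The code below is a deliberately simplified illustration, not a reconstruction of any experiment cited here: it assumes an asexual Wright-Fisher population, at most one new deleterious mutation per replication, multiplicative fitness, and no back mutation, compensatory mutation or recombination. All names and parameter values are hypothetical.

```python
import random

def mullers_ratchet(pop_size, mutation_rate, s, generations, seed=0):
    """Track the least-loaded class in an asexual population.

    Each individual is represented only by its count of deleterious
    mutations; an individual carrying k mutations has fitness (1 - s) ** k.
    Returns the smallest mutation count present in each generation.
    """
    rng = random.Random(seed)
    loads = [0] * pop_size
    history = []
    for _ in range(generations):
        # Parents are sampled in proportion to fitness (selection plus drift).
        weights = [(1.0 - s) ** k for k in loads]
        parents = rng.choices(loads, weights=weights, k=pop_size)
        # Each offspring gains one new mutation with probability mutation_rate.
        loads = [k + (1 if rng.random() < mutation_rate else 0) for k in parents]
        history.append(min(loads))
    return history

if __name__ == "__main__":
    for n in (10, 1000):
        trace = mullers_ratchet(pop_size=n, mutation_rate=0.5, s=0.02, generations=200)
        print(f"N = {n}: least-loaded class after 200 generations = {trace[-1]}")
```

Because nothing in this model can remove a mutation, the least-loaded class can only stay the same or increase; each increase is one click of the ratchet. Small populations typically lose their best class more often than large ones, which is the effect that bottleneck experiments are designed to expose.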

Experimental evolution with RNA viruses has also been crucial for studying the dynamics of natural selection. This has been the case, for instance, in studies of COMPETITIVE EXCLUSION43, the RED QUEEN HYPOTHESIS43, convergent evolution26,27,44 and the rhythm of adaptive change45. We expect RNA viruses to continue to have an important role in this area for many years to come. Finally, one promising area involves using RNA viruses to test theories for the evolution of cooperation. For example, Turner and Chao conducted a series of experiments with bacteriophage φ6 in which they demonstrated that RNA viruses could evolve under PRISONER'S DILEMMA conditions46, and also escape from it47.
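For readers unfamiliar with the game-theoretic term, a prisoner's dilemma is defined by an ordering of four payoffs, which in a viral setting would be the fitnesses of cooperator and defector genotypes in co-infected cells. The sketch below simply checks that ordering; the payoff values in the usage lines are hypothetical and are not the fitness estimates reported for φ6.

```python
def is_prisoners_dilemma(reward, sucker, temptation, punishment):
    """Return True if the payoffs satisfy temptation > reward > punishment > sucker.

    reward     : payoff to a cooperator paired with a cooperator
    sucker     : payoff to a cooperator paired with a defector
    temptation : payoff to a defector paired with a cooperator
    punishment : payoff to a defector paired with a defector
    """
    return temptation > reward > punishment > sucker

if __name__ == "__main__":
    # Hypothetical relative fitnesses, for illustration only.
    print(is_prisoners_dilemma(reward=1.0, sucker=0.5, temptation=1.5, punishment=0.8))
    print(is_prisoners_dilemma(reward=1.0, sucker=0.9, temptation=1.5, punishment=0.8))
```

When the ordering holds, defection is favoured whatever the partner does, even though mutual cooperation would yield a higher payoff than mutual defection.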

Despite the power of experiments, there are still difficulties in estimating some of the important parameters in evolutionary biology, such as the rate of deleterious mutation and fitness values. In recent years, a number of in silico approaches have been used to answer these questions, most notably with computer-generated genomes ('digital organisms') that are designed to behave as living systems. Digital organisms have the ability to create a copy of their own genome, but are subject to copying errors, so that populations of programs evolve in, and adapt to, their environments48. Although digital organisms are not as sophisticated as viruses, they are useful study tools because experiments can be easily controlled and repeated. Digital organisms have been useful in studying predicted adaptive evolution over short and long timescales49,50, the role of epistasis in evolution51 and testing key aspects of quasispecies theory, particularly whether fast replicators can be less fit than slower replicators at high-mutation rates52.

Molecular epidemiology of RNA viruses

Molecular epidemiology was first introduced to the study of infectious disease in the early 1970s53. Since this time, the analysis of gene sequence variation has become a standard practice for virologists with an interest in epidemiology, especially with the advent of new, high-throughput sequencing technologies. From the typing of viral populations to the study of the origins of a new virus, viral gene sequence variation has been used to answer a wide variety of questions, increasing both the quantity and quality of the epidemiological data available. In this section, we discuss how particular aspects of RNA virus evolution affect the reconstruction of their epidemiological history. In this context, it is important to note that the observed epidemiological patterns of viruses result from their evolution at two different levels: within individual hosts54 (and vectors55) and among hosts at the population level. RNA viruses differ greatly in their patterns and processes of intra- and inter-host evolution, as well as in the duration of the infection caused and the type of immune response that is induced. Such factors must be considered when discussing their epidemiological dynamics in a comparative setting3.

Inferring epidemiological history. Many aspects of the epidemiological history of viruses can be graphically summarized in a phylogenetic tree. The timescale of these trees can vary from a few weeks to many centuries, and depends on the rate of accumulation of variation in the sequences under study and the timescale of sampling. Although graphically very similar, there are important differences between phylogenetic trees and gene genealogies56. The former are used to analyse the evolutionary history of distinct viral species or genes, usually by sampling one representative of each unit under study. By contrast, genealogies depict the history of genetic polymorphisms segregating in contemporaneous populations. Gene genealogies have been used extensively in the study of RNA viruses because their rapid evolution means that sequence variation increases over very short periods of time. Indeed, RNA viruses constitute the most important class of measurably evolving populations57, with evolution even occurring during the infection of a single individual58. More accurate methods of evolutionary inference have already been designed for these rapidly evolving populations59,60,61. In particular, whereas phylogenetic trees generally depict strictly bifurcating patterns of relationships, gene genealogies can take into account recombination, which will lead to inter-connected networks of lineages. Moreover, using a statistical approach called coalescent theory62,63,64, it is possible to infer demographic processes from genetic polymorphism data, most notably, rates of viral population growth and decline. Coalescent methods can operate under several population genetic models and use gene genealogies as key analytical tools. Coalescent theory therefore provides a crucial conceptual link between phylogenetics and population genetics.
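The connection that coalescent theory draws between genealogies and population size can be illustrated with its simplest result. The sketch below assumes a haploid population of constant effective size in which any pair of lineages coalesces at rate 1/Ne per generation; it does not model the growth or decline that the methods cited above can estimate. Function names and the example values are hypothetical.

```python
from typing import List

def expected_coalescent_times(n_samples: int, n_e: float) -> List[float]:
    """Expected time (in generations) spent with k lineages, for k = n down to 2.

    With k lineages there are k * (k - 1) / 2 pairs, each coalescing at
    rate 1 / n_e, so the expected waiting time is 2 * n_e / (k * (k - 1)).
    """
    return [2.0 * n_e / (k * (k - 1)) for k in range(n_samples, 1, -1)]

def expected_tmrca(n_samples: int, n_e: float) -> float:
    """Expected time to the most recent common ancestor of the sample."""
    return sum(expected_coalescent_times(n_samples, n_e))

if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, expected_tmrca(n, n_e=1000))
```

Under these assumptions the expected age of the common ancestor is 2Ne(1 − 1/n) generations, so it scales directly with effective population size and depends only weakly on the number of sequences sampled. This is why the depth and shape of a genealogy carry information about demographic history.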

Both phylogenies and gene genealogies are relevant for the epidemiological analysis of RNA viruses and are useful for investigating the origin of new viruses65,66,67 or identifying the source of an outbreak68,69,70,71. Used in conjunction with population genetic theory, they are also essential for the identification of positive72,73, or purifying74, natural selection at nucleotide sites, the presence and extent of recombination13,75,76 and for dating important points in the history of epidemics67,77,78. Genealogical analysis is especially relevant for reconstructing the recent epidemiological history of viral populations66,79,80,81, such as in forensic studies (Box 2, Fig. 1), and therefore has important implications for determining public health policies.

Figure 1: Two alternative epidemiological scenarios translate into different phylogenetic tree topologies, the statistical support for which can be compared directly.

The tree in panel a depicts a common and close origin for samples 1–3 (node A), which is separate from the control samples 4–7 (node B). Node A might correspond to a single outbreak or a suspected transmission among these patients, whereas node B includes samples suspected, but not related to, the outbreak (4–7) and unrelated population controls (8,9). Panel b represents the alternative proposal for sample 1, which is now separated from the former cluster and instead groups with the control samples. Similar proposals can be separately formulated for each of the samples 1–3.

Rates of evolutionary change in RNA viruses. Although it is fair to assume that frequent mutation means that long-term rates of nucleotide substitution are usually high in RNA viruses, in reality these rates might vary widely, both within and among genes in the same species and among viral species82. Indeed, present data indicate that viral substitution rates are much more variable than their underlying mutation rates5,6, which is most likely a reflection of important differences in replication dynamics. For example, the nucleotide substitution rate in human T-lymphotropic virus type II (HTLV-II) varies from 1 × 10^−4 substitutions per site per year in epidemics with high rates of transmission and where replication is rapid, such as those in injecting drug users, to 1 × 10^−7 substitutions per site per year in endemic situations, where viruses are maintained within hosts through the clonal expansion of infected cells rather than by active replication83. However, in many RNA viruses, substitution rates of 1 × 10^−3 to 1 × 10^−4 substitutions per site per year are observed82. The variation in substitution rates across viral genomes has benefits, because it allows different epidemiological questions to be addressed, relating to different temporal scales. So, rapidly evolving (hypervariable) gene regions are informative for studying viral evolution within individuals, or for identifying the source of a particular disease outbreak. More conserved regions are better suited for in-depth phylogenetic inference, from analysing viral genotypes at the species to family levels.

In a number of cases it has been proposed that molecular evolution within specific RNA virus genes proceeds at a constant rate. Such MOLECULAR CLOCKS have been proposed for human influenza A virus84, although the constancy of the evolutionary rate does not hold in many other cases82,85,86,87. Non-clocklike evolution can result from a number of evolutionary forces, such as changes in host species, changes in structural and functional constraints88,89, and the occurrence of positive selection. Although most modern methods of phylogenetic analysis incorporate such rate variation — so that it is unlikely to cause significant error in the reconstruction of tree topologies — it can have an important impact on the analysis of divergence times.
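The practical consequences of these rates can be worked through with simple arithmetic. The sketch below assumes a strict molecular clock, which, as noted above, often does not hold; the alignment length, time span and divergence value are hypothetical, whereas the rates are those quoted in the preceding paragraph.

```python
def expected_substitutions(rate_per_site_year: float, n_sites: int, years: float) -> float:
    """Expected number of substitutions along one lineage under a strict clock."""
    return rate_per_site_year * n_sites * years

def years_since_divergence(pairwise_divergence: float, rate_per_site_year: float) -> float:
    """Time since two lineages split, given their per-site divergence.

    Both lineages accumulate changes, so divergence grows at twice the rate.
    """
    return pairwise_divergence / (2.0 * rate_per_site_year)

if __name__ == "__main__":
    # Hypothetical 1,000-site alignment followed for 10 years.
    for rate in (1e-4, 1e-7):
        print(rate, expected_substitutions(rate, n_sites=1000, years=10))
    # Two sequences differing at 1% of sites, evolving at 1e-3 per site per year.
    print(years_since_divergence(0.01, 1e-3))
```

At the faster HTLV-II rate, about one substitution is expected in such an alignment over a decade, whereas at the slower rate almost none is expected, which is why slowly evolving sequences are uninformative over short epidemiological timescales. Any departure from clocklike evolution translates directly into error in dates estimated this way.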

The emergence of RNA viruses

The past 25 years have seen the emergence of several RNA viruses, which are either new to medical science or have increased in prevalence to the extent that they are now a major concern for public health. Agents that fall into this category include HIV, hepatitis C virus and, most recently, severe acute respiratory syndrome coronavirus (SARS-CoV). Given the continuing threat that is posed by viral diseases, it is essential that we determine the factors underlying viral emergence.

The evolutionary genetics of viral emergence. Hosts acquire RNA viruses by two different mechanisms. First, owing to host–virus co-speciation, host populations might have carried a specific RNA virus for their entire evolutionary history. Although co-speciation has been proposed in some RNA viruses90, the process seems to be rare. This is most likely a result of the short infectious periods of many RNA viruses, so that they have limited opportunity for the sustained transmission that is probably needed for co-speciation. By contrast, many DNA viruses establish persistent infections and are therefore expected to be able to follow long-term patterns of host speciation91.

A more common method by which RNA viruses could enter new host species is through lateral transfer from different reservoir species. Both ecology and genetics seem to be important in determining whether a virus is able to successfully cross species boundaries. In many cases, ecological factors are the most important. Although such factors are diverse, and have been reviewed in detail elsewhere92,93, they usually reflect changes in either the proximity or density of the host and/or reservoir species, which increase the likelihood that humans are exposed to new pathogens and that sustained transmission networks will be established.

Far less is known about the possible genetic factors that might affect the ability of viruses to cross species boundaries. Although RNA viruses are the group of pathogens that seem most able to cross species boundaries94, perhaps because high mutation rates provide them with an increased capacity to adapt to new hosts95, not all RNA viruses are equally equipped in this respect. For example, in many cases, RNA viruses (such as rabies virus infection in humans) establish 'dead-end' infections in specific hosts, without subsequent transmission, which reflects imperfect adaptation. This indicates that there are constraints that inhibit viral adaptation to new hosts, perhaps owing to the fitness trade-offs that seem commonplace in viruses that need to infect different hosts or cell types25,96. Therefore, infecting different hosts is likely to represent a major adaptive challenge for RNA viruses, despite their mutation rate. Examples are animal vector-borne viruses, which are less subject to adaptive evolution than their non-vector-borne counterparts, presumably owing to the difficulties that are associated with simultaneous replication in hosts as divergent as invertebrates and vertebrates97. If extended over longer periods of time, this will lead to a phylogenetic rule of cross-species transmission, such that the greater the evolutionary distance between hosts, the lower the probability of viral transfer among them98.

A fundamental aspect of the mechanistic basis of viral emergence is the relationship between virus and host cell receptors99. Unless a virus has sequences that are able to recognize the cellular receptors of a potential host species, successful cross-species transmission will not be possible. Therefore, jumping species boundaries might only be a problem for an RNA virus if it has to adapt to different cellular receptors, although this still does not guarantee that sustained transmission will be established in the new host. An informative example is provided by influenza A virus. Birds are the main species reservoir, and avian influenza A viruses are usually unable to jump directly into humans because they lack the necessary mutations in the haemagglutinin (HA) gene to infect human cells100. Even when avian influenza A viruses do infect humans, human-to-human transmission might not be established. More generally, the relationship between the virus and the host cell receptor predicts an association between the number of cells a virus infects and its host range, thereby explaining whether a virus is a host 'specialist' or 'generalist'. Determining whether such a relationship exists should be a key goal in understanding the genetic basis of viral emergence.

Case studies in viral emergence. The complex interplay between ecology and genetics in viral emergence can be seen in HIV. An important ecological factor in HIV emergence involves the bushmeat trade in west Africa. Not only have a wide range of related simian immunodeficiency viruses (SIVs) been isolated from animal carcasses101, but the bushmeat trade has increased owing to encroachment by humans on the ranges occupied by non-human primates. The SIV that is found in chimpanzees (SIVcpz) is the closest relative, and therefore the most likely ancestor, of human HIV-1 (Ref. 102), whereas SIVsm from sooty mangabeys seems to be the reservoir population for HIV-2 (Ref. 103). For both HIV-1 and HIV-2, there have been multiple transfers of virus from their reservoirs into humans, with these viruses most likely establishing themselves in humans during the last century67,77. Also of importance was the movement of individuals infected early in the epidemic from small, isolated rural populations to cities in Africa103, which enabled incipient epidemic strains to reach a large number of susceptible hosts. Yet, genetics is also likely to have been important in the emergence of HIV. In particular, phylogenetic studies indicate that SIVs are most easily transmissible among related primate hosts104, implying that not all possible instances of cross-species viral transmission that could occur do occur, and that adaptive constraints might exist.

A more recent and highly publicized example of viral emergence is provided by the SARS-CoV, the agent of a severe form of pneumonia that has killed more than 700 people worldwide since its appearance in China in November 2002. It is unclear whether the epidemic of 2002–2003 was the first appearance of SARS, or whether the virus had sporadically entered human populations previously, but without detrimental consequences. The animal reservoir for SARS-CoV is also a subject for debate. Phylogenetic analysis reveals that SARS-CoV is equidistant between coronavirus groups 1 and 2, which are usually isolated from mammalian species, and coronavirus group 3, which is currently confined to birds65,105,106 (Fig. 2). Moreover, the sequence divergence between these three groups and SARS-CoV is so large that SARS-CoV has clearly experienced a long period of independent evolution. Studies of animals sold at Chinese markets have detected antibodies in a number of mammalian species107. Most notably, viruses obtained from the Himalayan palm civet (a member of the Viverridae) are closely related to human strains of SARS-CoV. Whether this species represents the main reservoir for SARS-CoV is still unclear. Finally, there have also been suggestions that SARS-CoV is a recombinant of mammalian and avian coronaviruses and that this genetic event might have triggered viral emergence108. However, because the sequences involved are so divergent, the phylogenetic incongruence in trees of SARS-CoV seems more likely to be due to variation in the molecular clock than inter-coronavirus recombination.

Figure 2: The phylogenetic relationships of SARS coronavirus (SARS-CoV) inferred using sequences of the spike glycoprotein.

a | Phylogenetic relationship of SARS-CoV to the known coronaviruses. Owing to the highly divergent nature of these viruses, the analysis was conducted using an alignment of 12 amino acid sequences that are 1,270 residues in length. The tree was inferred using the maximum likelihood (ML) method available in TREE-PUZZLE135. Numbers next to some branches represent quartet puzzling support values, which give an indication of the reliability of that branch. SARS-CoV appears as a distinct lineage. b | Magnified phylogeny of representative SARS-CoV strains isolated from humans and the Himalayan palm civet (Paguma larvata), a putative reservoir species. The tree was constructed using the same region as in part a but using nucleotide sequences (16 sequences, 3,765 bp). The tree was inferred using the ML method available in PAUP*136. Maximum-likelihood bootstrap values are shown for the main branches. Both trees are mid-point rooted and all horizontal branches are drawn to a scale of the number of substitutions per site (note the difference in scale between the two trees). All parameter settings used in the phylogenetic analysis are available from the authors on request. 
The following sequences were analysed (abbreviated viral names and GenBank accession numbers are given in parentheses). Group 1 coronaviruses: feline infectious peritonitis virus (FIPV; CAA29535); Group 2 coronaviruses: bovine coronavirus (BCoV; AF220295), human coronavirus OC43 (HCoV-OC43; S44241), murine hepatitis virus (MHV; AF029248, AF201929, AF208066, CAA28484), rat sialodacryoadenitis coronavirus (SDAV; AAF97738), porcine haemagglutinating encephalomyelitis virus (PHEV; AF481863); Group 3 coronaviruses: infectious bronchitis virus (IBV; AJ311317); SARS coronaviruses: Himalayan palm civet SARS-CoV, strains SZ1 (AY304489), SZ3 (AY304486), SZ13 (AY304487) and SZ16 (AY304488), and human SARS-CoV, strains Sin2677 (AY283795), BJ01 (AY278488), CUHK-AG01 (AY345986), GD01 (AY278489), GZ02 (AY390556), GZ50 (AY304495), HSZ-Bc (AY394994), PUMC02 (AY357075), Taiwan TC1 (AY338174), TW7 (AY502930), Urbani (AY278741) and ZS-C (AY395003).

Predicting viral emergence. The 'holy grail' for studies of emerging diseases is to predict which infectious agents are likely to infect human populations in the future. Although we are a long way from making accurate predictions, evolutionary genetics does allow some basic rules to be established, and phylogenetic methods have been used to successfully predict the future population survival of strains of influenza A virus109. First, the larger the population size of the reservoir species, the more viruses it can harbour, including those with shorter durations of infection and increased virulence110. Consequently, animal species that live at high densities, such as some bats, rodents and birds, are most likely to be reservoirs, particularly those animal populations that already live in close proximity to humans. Less intuitively, if there is a relationship between the breadth of cell tropism and the number of species infected, most attention should be given to those viruses that infect several cell types. More importantly, a comprehensive survey of RNA virus diversity should be undertaken in appropriate animal species. This can be done through the use of degenerate PCR primers that have been designed for several RNA virus families, followed by studies to determine whether the viruses will grow in human cells. Similar approaches have already uncovered a plethora of new virus families from marine environments111.
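A degenerate primer is a pool of related oligonucleotides written as one sequence with IUPAC ambiguity codes, which lets a single PCR reaction target divergent members of a virus family. As a minimal sketch, the code below expands such a primer into its constituent sequences and counts them; the example primer is made up for illustration and is not taken from any published primer set.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count

def expand(primer: str) -> list:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]

if __name__ == "__main__":
    primer = "GAYACNTGG"  # hypothetical example primer
    print(degeneracy(primer))
    print(expand(primer))
```

Degeneracy grows multiplicatively with each ambiguous position, so primer design involves a trade-off between covering more viral diversity and diluting the concentration of any single matching sequence.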

RNA virus evolution in the long term

One aim of studies of RNA virus evolution is to use our understanding of evolutionary processes in the short term, which has often been acquired from experiments, to predict what evolution will do in the long term. Although evolutionary biologists are rightly nervous about predicting future change, the rapid pace of RNA virus evolution means that these predictions can be tested quickly. Of most immediate interest are patterns of drug resistance and viral virulence.

The evolution of drug resistance. Understanding drug resistance is one area in which population biology has a direct impact on public health112,113. In the case of RNA viruses, most interest has focused on the potential of drugs to control HIV infection. Despite the optimism that initially surrounded the deployment of highly active antiretroviral therapy (HAART), which involves combinations of drugs114, antiviral therapy is unlikely to provide a cure for HIV. There are several reasons for this, not least of which is that, although viruses cannot be detected in some patients receiving HAART, viral replication is ongoing, albeit at greatly reduced levels115. Early studies predicting the length of time it would take for resistance to arise under multiple drug therapy also underestimated the importance of recombination in HIV, which we now know is extensive116. Frequent recombination could allow drug resistance to be acquired more rapidly than through mutation alone.

There are many factors that influence the evolution of drug resistance, and important results have been obtained, for example regarding the probability of resistance mutations arising before and during treatment, and the optimal time for the onset of drug treatment117. One important question, which also relates to the mechanics of viral evolution in general, is whether drug-resistance mutations have a fitness cost compared with wild-type alleles in the absence of the drug. If there is a fitness cost, we would hypothesize that resistance mutations would not reach high frequencies in populations, despite their benefit to the virus in hosts. There is evidence that, in the absence of the drug, HIV strains harbouring drug-resistance mutations are less fit than wild-type HIV strains118. Unfortunately, in other cases, drug-resistant HIV mutants seem to have greater infectivity, and even replication capacity, than wild-type viruses119. Not surprisingly, these mutations are increasingly sampled from drug-naive patients120.

Even if drug-resistance mutations are universally advantageous, their long-term success depends on more than their individual fitness. An important mediating factor is the strength of genetic drift at the population level. If drift is strong, which will be the case if effective population sizes are small, the frequencies that mutations eventually attain in populations have a large stochastic component. For RNA viruses, effective population size reflects the mode of transmission. This can be shown by comparing HIV with influenza A virus3. In the case of influenza A virus, which is transmitted by the respiratory route, the host population is large and extensively mixed. Consequently, advantageous mutations, most notably those that confer antigenic escape, can be fixed in the virus population in a regular manner121. By contrast, population-level selection seems to be considerably weaker in the sexually transmitted HIV, although some evidence for long-term cytotoxic T lymphocyte (CTL)-mediated selection has been found122. This contrasts with intra-host HIV evolution, in which immune-driven natural selection is the dominant evolutionary process72,123,124. The reduced impact of selection at the population level is most likely to be caused by extensive variation in rates of partner exchange, which in turn reduces the effective population size125, and by the large bottleneck at transmission126. Therefore, for HIV, intra-host and inter-host evolution seem to be largely decoupled. Although fewer studies have compared intra- and inter-host evolution in acute RNA virus infections, a recent analysis of dengue virus indicated that most amino acid changes that arise within hosts are deleterious in the long term74.
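
The way in which effective population size mediates the fate of an advantageous mutation can be illustrated with a minimal sketch. It assumes Kimura's diffusion approximation for a single new mutant in an idealized haploid population, which is a standard population-genetic result and not an analysis taken from the studies cited above. The function name, the selection coefficient and the population sizes are illustrative assumptions only.

```python
import math


def fixation_probability(s: float, n_e: float) -> float:
    """Approximate fixation probability of one new mutant copy.

    Assumes Kimura's diffusion approximation for a haploid population of
    effective size n_e, with the mutant starting at frequency 1/n_e and
    carrying selection coefficient s (s = 0 is neutral).
    """
    if n_e < 1:
        raise ValueError("effective population size must be at least 1")
    if s == 0.0:
        # Neutral case: fixation probability equals the initial frequency.
        return 1.0 / n_e
    exponent = -2.0 * n_e * s
    if exponent > 700.0:
        # Strongly deleterious mutant in a large population: the probability
        # is effectively zero, and this guard avoids a float overflow.
        return 0.0
    # expm1(x) = exp(x) - 1; the two minus signs cancel in the ratio,
    # giving (1 - exp(-2s)) / (1 - exp(-2 * n_e * s)).
    return math.expm1(-2.0 * s) / math.expm1(exponent)


if __name__ == "__main__":
    s = 0.001  # illustrative (assumed) selective advantage
    for n_e in (1e2, 1e4, 1e6):
        p_fix = fixation_probability(s, n_e)
        neutral = 1.0 / n_e
        # The ratio shows how far selection lifts the mutant above drift alone.
        print(f"Ne={n_e:.0e}  P_fix={p_fix:.6f}  relative to neutral={p_fix / neutral:.2f}")
```

Under these assumptions, a mutation with s = 0.001 in a population of 100 fixes with a probability only slightly above the neutral expectation of 1/Ne, so its fate is governed largely by chance, whereas in a population of one million the same mutation is favoured far above the neutral expectation.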

The evolution of viral virulence. The evolution of virulence has long been of interest to population biologists. A common view is that if virulence is a selected trait at all, then it is often involved in a trade-off with transmissibility: the balance of these factors that maximizes the BASIC REPRODUCTIVE RATE (R0) of the pathogen is favoured by natural selection127, although this has recently been questioned128. For RNA viruses it is therefore important to determine whether virulence is optimized and, if so, how it is linked to transmissibility. Complexities arise because virulence is also likely to vary according to the transmission mode129, and according to whether there is a long period of intra-host evolution, including superinfection by other strains, which increases competition among strains within hosts and therefore virulence130. In short, predictions about the long-term evolution of virulence in RNA viruses need to be made on a case-specific basis.
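
The trade-off argument can be made concrete with a small numerical sketch. It assumes one commonly used textbook form, R0 = beta(alpha) / (mu + alpha), in which alpha is virulence (disease-induced host mortality), mu is the background rate at which infections end, and transmissibility beta rises with virulence but saturates. The saturating function, the parameter values and the function names are all assumptions chosen for illustration and are not taken from the studies cited above.

```python
import math


def r0(alpha: float, mu: float = 0.02, c: float = 5.0, k: float = 0.5) -> float:
    """Basic reproductive rate under an assumed virulence trade-off.

    Transmissibility is assumed to saturate with virulence,
    beta(alpha) = c * alpha / (alpha + k), and the mean duration of
    infection is assumed to be 1 / (mu + alpha).
    """
    beta = c * alpha / (alpha + k)
    return beta / (mu + alpha)


def optimal_virulence(mu: float = 0.02, c: float = 5.0, k: float = 0.5,
                      alpha_max: float = 5.0, steps: int = 100_000) -> float:
    """Grid search for the virulence that maximizes R0 on (0, alpha_max]."""
    grid = (i * alpha_max / steps for i in range(1, steps + 1))
    return max(grid, key=lambda a: r0(a, mu, c, k))


if __name__ == "__main__":
    mu, k = 0.02, 0.5
    best = optimal_virulence(mu=mu, k=k)
    # For this particular functional form, setting dR0/dalpha = 0 gives
    # alpha* = sqrt(k * mu), which the grid search should reproduce.
    print(f"grid optimum: {best:.4f}  analytic optimum: {math.sqrt(k * mu):.4f}")
```

In this sketch R0 is maximized at an intermediate level of virulence, because higher virulence increases transmission per unit time but shortens the infection. A different assumed shape for beta(alpha) can move the optimum or remove it altogether, which is one reason why such predictions need to be made case by case.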

However, some aspects of the evolution of virulence reflect those that are associated with drug resistance. For example, if particular mutations confer virulence, then whether they become fixed in populations also depends on the strength of genetic drift, even if they are advantageous. Consequently, in small populations the optimal level of virulence might, by chance, never be attained. Similarly, if the evolutionary process differs greatly within and among hosts, a selectively favoured level of virulence within hosts might be disadvantageous among hosts. The intra-host evolution of HIV tends to result in the production of high-virulence viral strains that preferentially use the CXCR4 chemokine receptor, infect cells faster and cause AIDS to develop more rapidly131. However, these strains seem to be transmitted less often, indicating that they are selectively disadvantageous in new hosts132. Understanding the interplay between virulence and transmissibility is clearly central to understanding the evolution of virulence in RNA viruses.

Conclusions

Establishing the rules of RNA virus evolution is important: not only will this provide information that is essential for understanding the basic mechanisms of evolutionary change, but it will also assist in the design of strategies for the control, treatment and eradication of RNA viruses, and perhaps for predicting their emergence. Although it is clear that RNA viruses are unique in the rapidity with which they mutate, their evolution cannot be described in full without a consideration of all the processes of evolutionary change. A particular challenge for the future is to determine whether viral evolution in nature is similar to that established in vitro. The beauty of RNA viruses is that the link between experimental and natural systems can be made simply: few other organisms are as well suited for studying evolutionary processes.