Detecting natural selection in RNA virus populations using sequence summary statistics

Samir Bhatt; Aris Katzourakis; Oliver G Pybus

doi:10.1016/j.meegid.2009.06.001


Infect Genet Evol. 2010 Apr;10(3):421-30. doi: 10.1016/j.meegid.2009.06.001. Epub 2009 Jun 11.

Authors

Samir Bhatt¹, Aris Katzourakis, Oliver G Pybus

Affiliation

¹ Department of Zoology, University of Oxford, United Kingdom.

PMID: 19524068

Abstract

At present, most analyses that aim to detect the action of natural selection upon viral gene sequences use phylogenetic estimates of the ratio of replacement (non-synonymous) to silent (synonymous) substitution rates. Such methods, however, are computationally impractical for large data sets comprising hundreds of complete viral genomes, which are becoming increasingly common due to advances in genome sequencing technology. Here we investigate the statistical performance of computationally efficient tests based on sequence summary statistics, and explore their applicability to RNA virus data sets in two ways. First, we perform extensive simulations to measure the type I error rate of two well-known summary statistic methods, Tajima's D and the McDonald-Kreitman test, under a range of virus-like mutational and demographic scenarios. Second, we apply these methods to a compilation of approximately 100 RNA virus alignments that represent natural RNA virus populations. In addition, we develop a new implementation of the McDonald-Kreitman test and show that it greatly improves the test's statistical reliability on typical viral data sets. Our results suggest that variants of the McDonald-Kreitman test could prove useful in the analysis of very large sets of highly diverse viral genetic data.
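To illustrate why these summary statistics are cheap to compute, the following is a minimal sketch of the standard textbook forms of Tajima's D and the McDonald-Kreitman (MK) test. It is not the authors' new MK implementation. It assumes aligned, equal-length sequences without gaps, and it assumes that polymorphisms and fixed differences have already been classified as synonymous or non-synonymous upstream (that step needs a codon table and an outgroup and is omitted). The function names `tajimas_d`, `fisher_exact_two_sided` and `mk_test` are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Standard Tajima's D for aligned, equal-length sequences (no gap handling)."""
    n = len(seqs)
    if n < 4:
        # for n < 4 the variance terms e1 and e2 are zero, so D is undefined
        raise ValueError("need at least 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    # S: number of segregating sites (a site with >2 states is still counted once)
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if s == 0:
        return float("nan")  # no variation: D is undefined
    # pi: mean number of pairwise differences over all sequence pairs
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # D compares two estimators of theta: pi and Watterson's S / a1
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = math.comb(r1 + r2, c1)

    def prob(x):
        # hypergeometric probability of a table with top-left cell x, margins fixed
        return math.comb(r1, x) * math.comb(r2, c1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # sum over all tables no more probable than the observed one
    extreme = (p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, sum(extreme))


def mk_test(pn, ps, dn, ds):
    """Standard MK test from pre-classified site counts.

    pn, ps: non-synonymous / synonymous polymorphisms within the population
    dn, ds: non-synonymous / synonymous fixed differences between populations
    Returns (neutrality_index, p_value); NI is None when a denominator is zero.
    """
    ni = (pn * ds) / (ps * dn) if ps * dn > 0 else None
    p = fisher_exact_two_sided(dn, ds, pn, ps)
    return ni, p


if __name__ == "__main__":
    print(tajimas_d(["AAAA", "AAAA", "AAAT", "AATT"]))
    # hypothetical counts, for illustration only
    print(mk_test(pn=10, ps=20, dn=5, ds=5))
```

Under neutrality the neutrality index NI = (Pn/Ps)/(Dn/Ds) is expected to be near 1. NI < 1 points towards adaptive fixation of replacement changes, and NI > 1 towards an excess of non-synonymous polymorphism, for example slightly deleterious variants. Both statistics need only counting over the alignment rather than a phylogeny, which is the computational advantage the abstract refers to.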

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Evolution, Molecular
Genetic Variation
Genome, Viral*
Phylogeny
RNA Viruses / genetics*
RNA, Viral / analysis
RNA, Viral / genetics
Selection, Genetic*
Sequence Alignment
Sequence Analysis, RNA
Species Specificity
Statistics as Topic*

Substances

RNA, Viral