Abstract
Predicting the molecular complexity of a genomic sequencing library is a critical but difficult problem in modern sequencing applications. Methods to determine how deeply to sequence to achieve complete coverage or to predict the benefits of additional sequencing are lacking. We introduce an empirical Bayesian method to accurately characterize the molecular complexity of a DNA sample for almost any sequencing application on the basis of limited preliminary sequencing.
Acknowledgements
We thank S. Tavaré, M. Waterman, P. Calabrese, G. Hannon, and members of the Hannon lab and the Smith lab for their help, advice and input. This work was supported by US National Institutes of Health National Human Genome Research Institute grants (R01 HG005238 and P50 HG002790).
Author information
Contributions
T.D. and A.D.S. designed the method, implemented the software, performed the analysis and wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Note, Supplementary Figures 1–2, Supplementary Tables 2–3 (PDF 4640 kb)
Supplementary Table 1
Properties of data sets used in evaluating estimates of library complexity. (XLSX 50 kb)
Supplementary Software
Preseq source code and manual. (ZIP 165 kb)
About this article
Cite this article
Daley, T., Smith, A. Predicting the molecular complexity of sequencing libraries. Nat Methods 10, 325–327 (2013). https://doi.org/10.1038/nmeth.2375