PCR duplicate proportion estimation and consequences for DNA copy number calculations
Abstract
The volume of DNA in a sequencing experiment is often amplified by PCR, so the same original DNA fragment may be sequenced more than once, producing a ‘PCR duplicate’. These can be indistinguishable from multiple reads arising from identical but independent molecules, which can lead to over-estimation of the PCR duplicate proportion. The PCR duplicate proportion, and other measures derived from it, are important statistics for quality assurance, experimental design, and the interpretation of sequencing experiments. Here we provide a full likelihood basis for a combinatorial approach using heterozygous SNPs, as implemented in our R package, and demonstrate its efficacy. We also discuss the association with DNA copy number and demonstrate its impact on inferring mitochondrial DNA copy number, a question that has recently featured in several high-profile cancer studies. This is explored through a simulation study.
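To make the heterozygous-SNP idea concrete, the following is a minimal Python sketch of a much simpler pairwise moment estimator. It is not the full likelihood or the R package described above, and the function names and simulation are hypothetical. It assumes that each candidate duplicate pair consists of two reads with identical fragment coordinates covering one balanced heterozygous SNP, that PCR copies of one molecule always carry the same allele, that independent molecules carry either allele with probability 1/2, and that there are no sequencing errors. Under these assumptions a pair is allele-discordant with probability (1 − d)/2, where d is the proportion of pairs that are true PCR duplicates, so d can be estimated as 1 − 2 × (observed discordant fraction).

```python
import random
from typing import List, Sequence, Tuple


def estimate_duplicate_proportion(pairs: Sequence[Tuple[str, str]]) -> float:
    # Each entry holds the alleles seen at one heterozygous SNP for two reads
    # that share identical fragment coordinates (a candidate duplicate pair).
    if not pairs:
        raise ValueError("need at least one candidate duplicate pair")
    discordant = sum(1 for a, b in pairs if a != b)
    # P(discordant) = (1 - d) / 2  =>  d = 1 - 2 * P(discordant)
    d_hat = 1.0 - 2.0 * discordant / len(pairs)
    # Sampling noise can push the raw estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, d_hat))


def simulate_pairs(n_pairs: int, true_d: float,
                   rng: random.Random) -> List[Tuple[str, str]]:
    # Hypothetical simulation: each candidate pair is a PCR duplicate with
    # probability true_d (both reads copy one molecule, so alleles match);
    # otherwise the two reads come from independent molecules whose alleles
    # are drawn independently, each 'A' or 'B' with probability 1/2.
    pairs = []
    for _ in range(n_pairs):
        if rng.random() < true_d:
            allele = rng.choice("AB")
            pairs.append((allele, allele))
        else:
            pairs.append((rng.choice("AB"), rng.choice("AB")))
    return pairs


if __name__ == "__main__":
    rng = random.Random(1)
    pairs = simulate_pairs(100_000, true_d=0.3, rng=rng)
    print(round(estimate_duplicate_proportion(pairs), 3))  # close to 0.3
```

Restricting the sketch to pairs keeps the arithmetic transparent; sets of three or more reads sharing coordinates, allelic imbalance, and sequencing error are exactly the complications a full likelihood treatment would need to handle.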
Citation
Lynch, A., Smith, M., Eldridge, M. & Tavaré, S. (2022). PCR duplicate proportion estimation and consequences for DNA copy number calculations. In R. Bispo, L. Henriques-Rodrigues, R. Alpizar-Jara & M. de Carvalho (eds), Recent developments in statistics and data science: SPE2021, Évora, Portugal, October 13–16. Springer proceedings in mathematics & statistics, vol. 398, Springer, Cham, pp. 259–279. XXV Congress of the Portuguese Statistical Society, Évora, Portugal, 13/10/21. https://doi.org/10.1007/978-3-031-12766-3_18
Publication
Recent developments in statistics and data science
ISSN
2194-1009
Type
Conference item
Items in the St Andrews Research Repository are protected by copyright, with all rights reserved, unless otherwise indicated.