A benchmark study on error-correction by read-pairing and tag-clustering in amplicon-based deep sequencing

Zhang, Tian Hao; Wu, Nicholas C.; Sun, Ren

Links for fulltext

Publisher Website: 10.1186/s12864-016-2388-9
Scopus: eid_2-s2.0-84957868538
PMID: 26868371
WOS: WOS:000370015400001

Appears in Collections:
- Biomedical Sciences: Journal/Magazine Articles
- President's Office: Journal/Magazine Articles

Article: A benchmark study on error-correction by read-pairing and tag-clustering in amplicon-based deep sequencing

Title	A benchmark study on error-correction by read-pairing and tag-clustering in amplicon-based deep sequencing
Authors	Zhang, Tian Hao; Wu, Nicholas C.; Sun, Ren
Keywords	Read-pairing; Error-correction; Error rate; Deep sequencing; Amplicon sequencing; Tag-clustering
Issue Date	2016
Citation	BMC Genomics, 2016, v. 17, n. 1, article no. 108. DOI: http://dx.doi.org/10.1186/s12864-016-2388-9
Abstract	© 2016 Zhang et al. Background: The high error rate of next-generation sequencing (NGS) restricts some of its applications, such as monitoring virus mutations and detecting rare mutations in tumors. Two library preparation strategies are commonly employed to improve sequencing accuracy by correcting sequencing errors: the read-pairing method and the tag-clustering method (i.e. primer ID or UID). Here, we constructed a homogeneous library from a single clone and compared the variant calling accuracy of these error-correction methods. Results: We comprehensively described the strengths and pitfalls of these methods. We found that both the read-pairing and tag-clustering methods significantly decreased the sequencing error rate. While the read-pairing method was more effective than the tag-clustering method at correcting insertion and deletion errors, it was less effective at correcting substitution errors. In addition, we observed that when read quality was poor, the tag-clustering method led to a large loss of coverage. We also tested the effect of applying quality score filtering to the error-correction methods and demonstrated that it yielded a minor, yet statistically significant, improvement for the methods tested in this study. Conclusion: Our study provides a benchmark for researchers to select suitable error-correction methods based on the goal of the experiment by balancing the trade-off between sequencing cost (i.e. sequencing coverage requirement) and detection sensitivity.
Persistent Identifier	http://hdl.handle.net/10722/285940
PubMed Central ID	PMC4751728
ISI Accession Number ID	WOS:000370015400001
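
The abstract names the two error-correction strategies without spelling out their mechanics. As a rough illustration only, the sketch below shows one common way each idea can be implemented; it is not the authors' pipeline. The function names (`pair_consensus`, `tag_consensus`), the choice to mask disagreeing bases with `N`, the majority-vote rule, and the `min_cluster_size` threshold are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pair_consensus(forward, reverse):
    """Read-pairing sketch (assumed rule): keep a base only where both
    mates of a fully overlapping pair agree; mask disagreements with 'N'.
    Pairs whose lengths differ are dropped (returns None)."""
    mate = reverse_complement(reverse)
    if len(forward) != len(mate):
        return None
    return "".join(f if f == m else "N" for f, m in zip(forward, mate))


def tag_consensus(tagged_reads, min_cluster_size=3):
    """Tag-clustering sketch (assumed rule): group reads by their tag,
    discard clusters smaller than min_cluster_size, and call a per-position
    majority-vote consensus for each remaining cluster."""
    clusters = defaultdict(list)
    for tag, seq in tagged_reads:
        clusters[tag].append(seq)

    consensus = {}
    for tag, seqs in clusters.items():
        if len(seqs) < min_cluster_size:
            continue  # too few reads to vote; this cluster is lost
        # Vote only among reads of the most common length in the cluster.
        length = Counter(len(s) for s in seqs).most_common(1)[0][0]
        same_length = [s for s in seqs if len(s) == length]
        consensus[tag] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*same_length)
        )
    return consensus


if __name__ == "__main__":
    # Hypothetical toy reads, not data from the study.
    print(pair_consensus("ACGTA", reverse_complement("ACCTA")))
    print(tag_consensus([
        ("AAA", "ACGT"), ("AAA", "ACGT"), ("AAA", "ACTT"), ("CCC", "ACGT"),
    ]))
```

In this toy setup, any tag seen fewer than `min_cluster_size` times contributes no consensus read, so the number of usable sequences can fall well below the number of raw reads. That is one way to picture the cost-versus-sensitivity trade-off the abstract mentions, though the study's actual thresholds and filtering rules are not given here.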

DC Field	Value	Language
dc.contributor.author	Zhang, Tian Hao	-
dc.contributor.author	Wu, Nicholas C.	-
dc.contributor.author	Sun, Ren	-
dc.date.accessioned	2020-08-18T04:57:02Z	-
dc.date.available	2020-08-18T04:57:02Z	-
dc.date.issued	2016	-
dc.identifier.citation	BMC Genomics, 2016, v. 17, n. 1, article no. 108	-
dc.identifier.uri	http://hdl.handle.net/10722/285940	-
dc.description.abstract	© 2016 Zhang et al. Background: The high error rate of next-generation sequencing (NGS) restricts some of its applications, such as monitoring virus mutations and detecting rare mutations in tumors. Two library preparation strategies are commonly employed to improve sequencing accuracy by correcting sequencing errors: the read-pairing method and the tag-clustering method (i.e. primer ID or UID). Here, we constructed a homogeneous library from a single clone and compared the variant calling accuracy of these error-correction methods. Results: We comprehensively described the strengths and pitfalls of these methods. We found that both the read-pairing and tag-clustering methods significantly decreased the sequencing error rate. While the read-pairing method was more effective than the tag-clustering method at correcting insertion and deletion errors, it was less effective at correcting substitution errors. In addition, we observed that when read quality was poor, the tag-clustering method led to a large loss of coverage. We also tested the effect of applying quality score filtering to the error-correction methods and demonstrated that it yielded a minor, yet statistically significant, improvement for the methods tested in this study. Conclusion: Our study provides a benchmark for researchers to select suitable error-correction methods based on the goal of the experiment by balancing the trade-off between sequencing cost (i.e. sequencing coverage requirement) and detection sensitivity.	-
dc.language	eng	-
dc.relation.ispartof	BMC Genomics	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject	Read-pairing	-
dc.subject	Error-correction	-
dc.subject	Error rate	-
dc.subject	Deep sequencing	-
dc.subject	Amplicon sequencing	-
dc.subject	Tag-clustering	-
dc.title	A benchmark study on error-correction by read-pairing and tag-clustering in amplicon-based deep sequencing	-
dc.type	Article	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.1186/s12864-016-2388-9	-
dc.identifier.pmid	26868371	-
dc.identifier.pmcid	PMC4751728	-
dc.identifier.scopus	eid_2-s2.0-84957868538	-
dc.identifier.volume	17	-
dc.identifier.issue	1	-
dc.identifier.spage	article no. 108	-
dc.identifier.epage	article no. 108	-
dc.identifier.eissn	1471-2164	-
dc.identifier.isi	WOS:000370015400001	-
dc.identifier.issnl	1471-2164	-
