Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets

Li, MX; Yeung, JMY; Cherny, SS; Sham, PC

Publisher Website: 10.1007/s00439-011-1118-2
Scopus: eid_2-s2.0-84862260334
WOS: WOS:000302816700010

Appears in Collections:
- Springer Open Choice
- Psychiatry: Journal/Magazine Articles

Title

Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets

Authors

Li, MX; Yeung, JMY; Cherny, SS; Sham, PC

Keywords

Biomedicine
Human Genetics
Molecular Medicine
Internal Medicine
Metabolic Diseases

Issue Date

2012

Publisher

Springer Verlag. The Journal's web site is located at http://link.springer.de/link/service/journals/00439/index.htm

Citation

Human Genetics, 2012, v. 131 n. 5, p. 747-756

DOI: http://dx.doi.org/10.1007/s00439-011-1118-2

Abstract

Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of a huge number of SNPs needs to be taken into account in the interpretation of statistical significance in such genome-wide studies, but this is complicated by the non-independence of SNPs because of linkage disequilibrium (LD). Several previous groups have proposed the use of the effective number of independent markers (Me) for the adjustment of multiple testing, but current methods of calculation for Me are limited in accuracy or computational speed. Here, we report a more robust and fast method to calculate Me. Applying this efficient method [implemented in a free software tool named Genetic type 1 error calculator (GEC)], we systematically examined the Me, and the corresponding p-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for HapMap Project and 1000 Genomes Project datasets which are widely used in genotype imputation as reference panels. Our results suggested the use of a p-value threshold of ∼10⁻⁷ as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent p-value thresholds ∼5 × 10⁻⁸ for current or merged commercial genotyping arrays, ∼10⁻⁸ for all common SNPs in the 1000 Genomes Project dataset and ∼5 × 10⁻⁸ for the common SNPs only within genes. © The Author(s) 2011.
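
As a rough illustration of how an effective number of independent tests relates to a significance threshold, the sketch below assumes a Bonferroni-style correction (threshold = type 1 error rate / Me). The abstract does not state which correction the authors apply, so this is an assumption, and the function names are hypothetical. The effective test counts it prints are back-calculated from the approximate thresholds quoted in the abstract; they are not values reported by the paper or produced by GEC.

```python
def bonferroni_threshold(alpha: float, effective_tests: float) -> float:
    """Per-test p-value threshold for a given family-wise error rate.

    Assumes a Bonferroni-style correction over `effective_tests`
    independent tests (an assumption, not the paper's stated procedure).
    """
    if effective_tests <= 0:
        raise ValueError("effective_tests must be positive")
    return alpha / effective_tests


def implied_effective_tests(alpha: float, threshold: float) -> float:
    """Invert the relation: the number of independent tests that a
    given per-test threshold corresponds to at error rate `alpha`."""
    if not 0 < threshold <= alpha:
        raise ValueError("threshold must lie in (0, alpha]")
    return alpha / threshold


if __name__ == "__main__":
    ALPHA = 0.05  # genome-wide type 1 error rate used in the abstract

    # Approximate thresholds quoted in the abstract.
    quoted = [
        ("early commercial arrays", 1e-7),
        ("current or merged commercial arrays", 5e-8),
        ("all common SNPs, 1000 Genomes Project", 1e-8),
        ("common SNPs within genes only", 5e-8),
    ]
    for label, threshold in quoted:
        tests = implied_effective_tests(ALPHA, threshold)
        print(f"{label}: p < {threshold:.0e} implies roughly {tests:,.0f} effective tests")
```

Under this assumption, a more stringent threshold simply corresponds to a larger effective number of tests, which is why denser arrays and imputation reference panels call for smaller p-value cutoffs.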

Persistent Identifier

http://hdl.handle.net/10722/147134

ISSN

0340-6717

2021 Impact Factor: 5.881

2020 SCImago Journal Rankings: 2.351

ISI Accession Number ID

WOS:000302816700010

Funding Agency	Grant Number
HKU	7672/06 M; 201007176166
European Community	HEALTH-F2-2010-241909
University of Hong Kong Strategic Research Theme on Genomics

Funding Information:

We thank the HapMap Project for the LD data and the 1000 Genomes Project for the genotype data used in this project. This work was funded by HKU 7672/06 M, the Small Project Funding HKU 201007176166, the European Community's Seventh Framework Programme under grant agreement No. HEALTH-F2-2010-241909 and The University of Hong Kong Strategic Research Theme on Genomics.

Grants

Development of a bioinformatics tool to optimize the experimental design of targeted next-generation sequencing studies

DC Field	Value	Language
dc.contributor.author	Li, MX	en_HK
dc.contributor.author	Yeung, JMY	en_HK
dc.contributor.author	Cherny, SS	en_HK
dc.contributor.author	Sham, PC	en_HK
dc.date.accessioned	2012-05-28T08:20:27Z	-
dc.date.available	2012-05-28T08:20:27Z	-
dc.date.issued	2012	en_HK
dc.identifier.citation	Human Genetics, 2012, v. 131 n. 5, p. 747-756	en_HK
dc.identifier.issn	0340-6717	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/147134	-
dc.description.abstract	Current genome-wide association studies (GWAS) use commercial genotyping microarrays that can assay over a million single nucleotide polymorphisms (SNPs). The number of SNPs is further boosted by advanced statistical genotype-imputation algorithms and large SNP databases for reference human populations. The testing of a huge number of SNPs needs to be taken into account in the interpretation of statistical significance in such genome-wide studies, but this is complicated by the non-independence of SNPs because of linkage disequilibrium (LD). Several previous groups have proposed the use of the effective number of independent markers (Me) for the adjustment of multiple testing, but current methods of calculation for Me are limited in accuracy or computational speed. Here, we report a more robust and fast method to calculate Me. Applying this efficient method [implemented in a free software tool named Genetic type 1 error calculator (GEC)], we systematically examined the Me, and the corresponding p-value thresholds required to control the genome-wide type 1 error rate at 0.05, for 13 Illumina or Affymetrix genotyping arrays, as well as for HapMap Project and 1000 Genomes Project datasets which are widely used in genotype imputation as reference panels. Our results suggested the use of a p-value threshold of ∼10⁻⁷ as the criterion for genome-wide significance for early commercial genotyping arrays, but slightly more stringent p-value thresholds ∼5 × 10⁻⁸ for current or merged commercial genotyping arrays, ∼10⁻⁸ for all common SNPs in the 1000 Genomes Project dataset and ∼5 × 10⁻⁸ for the common SNPs only within genes. © The Author(s) 2011.	en_HK
dc.language	eng	en_US
dc.publisher	Springer Verlag. The Journal's web site is located at http://link.springer.de/link/service/journals/00439/index.htm	en_HK
dc.relation.ispartof	Human Genetics	en_HK
dc.rights	The Author(s)	en_US
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	en_US
dc.subject	Biomedicine	en_US
dc.subject	Human Genetics	en_US
dc.subject	Molecular Medicine	en_US
dc.subject	Internal Medicine	en_US
dc.subject	Metabolic Diseases	en_US
dc.title	Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets	en_HK
dc.type	Article	en_HK
dc.identifier.openurl	http://www.springerlink.com/link-out/?id=2104&code=XU848J7775R2R755&MUD=MP	en_US
dc.identifier.email	Cherny, SS: cherny@hku.hk	en_HK
dc.identifier.email	Sham, PC: pcsham@hku.hk	en_HK
dc.identifier.authority	Cherny, SS=rp00232	en_HK
dc.identifier.authority	Sham, PC=rp00459	en_HK
dc.description.nature	published_or_final_version	en_US
dc.identifier.doi	10.1007/s00439-011-1118-2	en_HK
dc.identifier.scopus	eid_2-s2.0-84862260334	en_HK
dc.identifier.hkuros	202751	-
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-84862260334&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	131	en_HK
dc.identifier.issue	5	en_HK
dc.identifier.spage	747	en_HK
dc.identifier.epage	756	en_HK
dc.identifier.eissn	1432-1203	en_US
dc.identifier.isi	WOS:000302816700010	-
dc.publisher.place	Germany	en_HK
dc.description.other	Springer Open Choice, 28 May 2012	en_US
dc.relation.project	Development of a bioinformatics tool to optimize the experimental design of targeted next-generation sequencing studies	-
dc.identifier.scopusauthorid	Li, MX=35205389900	en_HK
dc.identifier.scopusauthorid	Yeung, JMY=36818580500	en_HK
dc.identifier.scopusauthorid	Cherny, SS=7004670001	en_HK
dc.identifier.scopusauthorid	Sham, PC=34573429300	en_HK
dc.identifier.citeulike	10118284	-
dc.identifier.issnl	0340-6717	-
