FastPval: A fast and memory efficient program to calculate very low P-values from empirical distribution

Li, MJ; Sham, PC; Wang, J

Publisher Website: 10.1093/bioinformatics/btq540
Scopus: eid_2-s2.0-78149251209
PMID: 20861029
WOS: WOS:000283919800014

Bookmarks:
- CiteULike: 8
Citations:
- Scopus: 0
- Web of Science: 0
- PubMed Central: 0
Appears in Collections:
- Psychiatry: Journal/Magazine Articles
- Biochemistry: Journal/Magazine Articles

Title

FastPval: A fast and memory efficient program to calculate very low P-values from empirical distribution

Authors

Li, MJ; Sham, PC; Wang, J

Issue Date

2010

Publisher

Oxford University Press. The Journal's web site is located at http://bioinformatics.oxfordjournals.org/

Citation

Bioinformatics, 2010, v. 26 n. 22, p. 2897-2899

DOI: http://dx.doi.org/10.1093/bioinformatics/btq540

Abstract

Motivation: Resampling methods, such as permutation and bootstrap, have been widely used to generate an empirical distribution for assessing the statistical significance of a measurement. However, obtaining a very low P-value requires a very large number of resamples, at which point computing speed, memory and storage consumption become bottlenecks and the calculation sometimes becomes impossible, even on a computer cluster.

Results: We have developed a multiple-stage P-value calculating program called FastPval that can efficiently calculate very low (as low as 10⁻⁹) P-values from a large number of resampled measurements. With only two input files and a few parameter settings from the user, the program can compute P-values from an empirical distribution very efficiently, even on a personal computer. When tested on the order of 10⁹ resampled data points, our method uses only 52.94% of the time used by the conventional method, implemented with standard quicksort and binary search algorithms, and consumes only 0.11% of the memory and storage. Furthermore, our method can be applied to extra-large datasets on which the conventional method fails. The accuracy of the method was tested on data generated from Normal, Poisson and Gumbel distributions and was found to be no different from that of the exact ranking approach. © The Author(s) 2010. Published by Oxford University Press.
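The conventional baseline named in the abstract (sort the resampled values, then binary-search each observed value) can be sketched as follows. This is only an illustration of that baseline, not FastPval's multiple-stage algorithm, whose details the abstract does not give. The function name `empirical_p_value` is hypothetical, the upper-tail convention (fraction of resampled values greater than or equal to the observation) is an assumption, and Python's built-in sort stands in for quicksort.

```python
import bisect
import random


def empirical_p_value(sorted_null, observed):
    """Upper-tail empirical P-value: fraction of resampled values >= observed.

    Assumes `sorted_null` is already sorted in ascending order.
    """
    n = len(sorted_null)
    # bisect_left gives the index of the first element >= observed,
    # so everything from that index onward counts toward the tail.
    first_ge = bisect.bisect_left(sorted_null, observed)
    return (n - first_ge) / n


# Illustrative data only: a standard-normal null distribution.
random.seed(0)
null = sorted(random.gauss(0.0, 1.0) for _ in range(100_000))
p = empirical_p_value(null, 3.0)
print(p)
```

With N resampled values, the smallest non-zero P-value this approach can report is 1/N, which is why resolving P-values near 10⁻⁹ needs on the order of 10⁹ resamples, and why holding and sorting them all becomes the bottleneck the abstract describes.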

Persistent Identifier

http://hdl.handle.net/10722/137123

ISSN

1367-4803

2021 Impact Factor: 6.931

2020 SCImago Journal Rankings: 3.599

PubMed Central ID

PMC2971576

ISI Accession Number ID

WOS:000283919800014

Funding Agency	Grant Number
CRCG
Genomic SRT of the University of Hong Kong
Research Grants Council of Hong Kong	GRF 778609M; AoE M-04/04

Funding Information:

Internal funds from the CRCG and the Genomic SRT of the University of Hong Kong; GRF 778609M and AoE M-04/04 from the Research Grants Council of Hong Kong.

DC Field	Value	Language
dc.contributor.author	Li, MJ	en_HK
dc.contributor.author	Sham, PC	en_HK
dc.contributor.author	Wang, J	en_HK
dc.date.accessioned	2011-08-22T08:34:37Z	-
dc.date.available	2011-08-22T08:34:37Z	-
dc.date.issued	2010	en_HK
dc.identifier.citation	Bioinformatics, 2010, v. 26 n. 22, p. 2897-2899	en_HK
dc.identifier.issn	1367-4803	en_HK
dc.identifier.uri	http://hdl.handle.net/10722/137123	-
dc.description.abstract	Motivation: Resampling methods, such as permutation and bootstrap, have been widely used to generate an empirical distribution for assessing the statistical significance of a measurement. However, obtaining a very low P-value requires a very large number of resamples, at which point computing speed, memory and storage consumption become bottlenecks and the calculation sometimes becomes impossible, even on a computer cluster. Results: We have developed a multiple-stage P-value calculating program called FastPval that can efficiently calculate very low (as low as 10⁻⁹) P-values from a large number of resampled measurements. With only two input files and a few parameter settings from the user, the program can compute P-values from an empirical distribution very efficiently, even on a personal computer. When tested on the order of 10⁹ resampled data points, our method uses only 52.94% of the time used by the conventional method, implemented with standard quicksort and binary search algorithms, and consumes only 0.11% of the memory and storage. Furthermore, our method can be applied to extra-large datasets on which the conventional method fails. The accuracy of the method was tested on data generated from Normal, Poisson and Gumbel distributions and was found to be no different from that of the exact ranking approach. © The Author(s) 2010. Published by Oxford University Press.	en_HK
dc.language	eng	-
dc.publisher	Oxford University Press. The Journal's web site is located at http://bioinformatics.oxfordjournals.org/	en_HK
dc.relation.ispartof	Bioinformatics	en_HK
dc.subject.mesh	Computational Biology - methods	-
dc.subject.mesh	Databases, Factual	-
dc.subject.mesh	Models, Statistical	-
dc.subject.mesh	Software	-
dc.title	FastPval: A fast and memory efficient program to calculate very low P-values from empirical distribution	en_HK
dc.type	Article	en_HK
dc.identifier.openurl	http://library.hku.hk:4550/resserv?sid=HKU:IR&issn=1367-4803&volume=26&issue=22&spage=2897&epage=2899&date=2010&atitle=FastPval:+a+fast+and+memory+efficient+program+to+calculate+very+low+P-values+from+empirical+distribution	-
dc.identifier.email	Sham, PC: pcsham@hku.hk	en_HK
dc.identifier.email	Wang, J: junwen@hku.hk	en_HK
dc.identifier.authority	Sham, PC=rp00459	en_HK
dc.identifier.authority	Wang, J=rp00280	en_HK
dc.description.nature	published_or_final_version	en_US
dc.identifier.doi	10.1093/bioinformatics/btq540	en_HK
dc.identifier.pmid	20861029	-
dc.identifier.pmcid	PMC2971576	-
dc.identifier.scopus	eid_2-s2.0-78149251209	en_HK
dc.identifier.hkuros	189643	-
dc.identifier.hkuros	192075	en_US
dc.relation.references	http://www.scopus.com/mlt/select.url?eid=2-s2.0-78149251209&selection=ref&src=s&origin=recordpage	en_HK
dc.identifier.volume	26	en_HK
dc.identifier.issue	22	en_HK
dc.identifier.spage	2897	en_HK
dc.identifier.epage	2899	en_HK
dc.identifier.eissn	1460-2059	-
dc.identifier.isi	WOS:000283919800014	-
dc.publisher.place	United Kingdom	en_HK
dc.identifier.scopusauthorid	Li, MJ=37016520600	en_HK
dc.identifier.scopusauthorid	Sham, PC=34573429300	en_HK
dc.identifier.scopusauthorid	Wang, J=8950599500	en_HK
dc.identifier.citeulike	7911914	-
