Article: Asymptotic Distribution-Free Independence Test for High Dimension Data

Title: Asymptotic Distribution-Free Independence Test for High Dimension Data
Authors: Cai, Zhanrui; Lei, Jing; Roeder, Kathryn
Issue Date: 26-May-2023
Publisher: Taylor and Francis Group
Citation: Journal of the American Statistical Association, 2023
Abstract

Testing for independence is of fundamental importance in modern data analysis, with broad applications in variable selection, graphical models, and causal inference. When the data is high-dimensional and the potential dependence signal is sparse, independence testing becomes very challenging without distributional or structural assumptions. In this paper, we propose a general framework for independence testing by first fitting a classifier that distinguishes the joint and product distributions, and then testing the significance of the fitted classifier. This framework allows us to borrow the strength of the most advanced classification algorithms developed by the modern machine learning community, making it applicable to high-dimensional, complex data. By combining a sample split and a fixed permutation, our test statistic has a universal, fixed Gaussian null distribution that is independent of the underlying data distribution. Extensive simulations demonstrate the advantages of the newly proposed test compared with existing methods. We further apply the new test to a single-cell data set to test the independence between two types of single-cell sequencing measurements, whose high dimensionality and sparsity make existing methods hard to apply.
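The abstract states the framework only at a high level. The Python sketch below illustrates the general idea (classifier on joint vs. permuted pairs, sample split, Gaussian null) under simplifying assumptions of our own; it is not the authors' test statistic. Assumptions: a plain gradient-descent logistic regression on hypothetical outer-product features `x y^T` stands in for an advanced classifier; a cyclic shift of Y creates the "product" class in the training half; and the test half swaps Y between disjoint pairs of observations. Under independence, swapping Y within a pair leaves the joint law of the pair unchanged, so each score difference is symmetric about zero and the differences are i.i.d. given the training half. Their studentized mean is therefore approximately N(0, 1) whatever the data distribution. All function names are hypothetical.

```python
import math

import numpy as np


def interaction_features(x, y):
    """Hypothetical feature map: row-wise flattened outer products x_i y_i^T."""
    return np.einsum("ni,nj->nij", x, y).reshape(len(x), -1)


def fit_logistic(features, labels, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression (stand-in for any classifier)."""
    n, d = features.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(features @ w + b)))
        grad = p - labels
        w -= lr * features.T @ grad / n
        b -= lr * grad.mean()
    return w, b


def classifier_independence_test(x, y, seed=0):
    """Return (statistic, one-sided p-value); statistic is approx N(0, 1) under H0."""
    rng = np.random.default_rng(seed)
    n = len(x)
    idx = rng.permutation(n)
    train, test = idx[: n // 2], idx[n // 2:]

    # Training half: label 1 for observed pairs, 0 for pairs with Y cyclically shifted.
    xt, yt = x[train], y[train]
    y_shift = np.roll(yt, 1, axis=0)
    feats = np.vstack([interaction_features(xt, yt),
                       interaction_features(xt, y_shift)])
    labels = np.concatenate([np.ones(len(xt)), np.zeros(len(xt))])
    w, b = fit_logistic(feats, labels)

    def score(xs, ys):
        # Fitted log-odds that (x, y) was drawn from the joint distribution.
        return interaction_features(xs, ys) @ w + b

    # Test half: disjoint pairs (a_j, c_j); observed pairings minus Y-swapped pairings.
    m = len(test) // 2
    a, c = test[:m], test[m: 2 * m]
    d = (score(x[a], y[a]) + score(x[c], y[c])
         - score(x[a], y[c]) - score(x[c], y[a]))
    stat = math.sqrt(m) * d.mean() / d.std(ddof=1)
    p_value = 0.5 * math.erfc(stat / math.sqrt(2.0))  # upper tail of N(0, 1)
    return stat, p_value
```

Swapping Y within disjoint pairs, rather than reusing the cyclic shift on the test half, keeps the per-pair differences independent, which is what allows a plain central limit argument for the Gaussian null in this simplified version.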


Persistent Identifier: http://hdl.handle.net/10722/331012
ISSN: 0162-1459
2023 Impact Factor: 3.0
2023 SCImago Journal Rankings: 3.922

 

DC Field | Value | Language
dc.contributor.author | Cai, Zhanrui | -
dc.contributor.author | Lei, Jing | -
dc.contributor.author | Roeder, Kathryn | -
dc.date.accessioned | 2023-09-21T06:51:58Z | -
dc.date.available | 2023-09-21T06:51:58Z | -
dc.date.issued | 2023-05-26 | -
dc.identifier.citation | Journal of the American Statistical Association, 2023 | -
dc.identifier.issn | 0162-1459 | -
dc.identifier.uri | http://hdl.handle.net/10722/331012 | -
dc.language | eng | -
dc.publisher | Taylor and Francis Group | -
dc.relation.ispartof | Journal of the American Statistical Association | -
dc.title | Asymptotic Distribution-Free Independence Test for High Dimension Data | -
dc.type | Article | -
dc.identifier.doi | 10.1080/01621459.2023.2218030 | -
dc.identifier.eissn | 1537-274X | -
dc.identifier.issnl | 0162-1459 | -
