Article: Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT

Title: Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT
Authors: Du, Jin Hong; Cai, Zhanrui; Roeder, Kathryn
Keywords: deep generative models; mosaic integration; multiomics; transfer learning
Issue Date: 2022
Citation: Proceedings of the National Academy of Sciences of the United States of America, 2022, v. 119, n. 49, article no. e2214414119
Abstract: Recent advances in single-cell technologies enable joint profiling of multiple omics. These profiles can reveal the complex interplay of different regulatory layers in single cells; still, new challenges arise when integrating datasets with some features shared across experiments and others exclusive to a single source; combining information across these sources is called mosaic integration. The difficulties lie in imputing missing molecular layers to build a self-consistent atlas, finding a common latent space, and transferring learning to new data sources robustly. Existing mosaic integration approaches based on matrix factorization cannot efficiently adapt to nonlinear embeddings for the latent cell space and are not designed for accurate imputation of missing molecular layers. By contrast, we propose a probabilistic variational autoencoder model, scVAEIT, to integrate and impute multimodal datasets with mosaic measurements. A key advance is the use of a missing mask for learning the conditional distribution of unobserved modalities and features, which makes scVAEIT flexible to combine different panels of measurements from multimodal datasets accurately and in an end-to-end manner. Imputing the masked features serves as a supervised learning procedure while preventing overfitting by regularization. Focusing on gene expression, protein abundance, and chromatin accessibility, we validate that scVAEIT robustly imputes the missing modalities and features of cells biologically different from the training data. scVAEIT also adjusts for batch effects while maintaining the biological variation, which provides better latent representations for the integrated datasets. We demonstrate that scVAEIT significantly improves integration and imputation across unseen cell types, different technologies, and different tissues.
Persistent Identifier: http://hdl.handle.net/10722/328843
ISSN: 0027-8424
2023 Impact Factor: 9.4
2023 SCImago Journal Rankings: 3.737
ISI Accession Number ID: WOS:001036941500003

DC Field | Value | Language
dc.contributor.author | Du, Jin Hong | -
dc.contributor.author | Cai, Zhanrui | -
dc.contributor.author | Roeder, Kathryn | -
dc.date.accessioned | 2023-07-22T06:24:33Z | -
dc.date.available | 2023-07-22T06:24:33Z | -
dc.date.issued | 2022 | -
dc.identifier.citation | Proceedings of the National Academy of Sciences of the United States of America, 2022, v. 119, n. 49, article no. e2214414119 | -
dc.identifier.issn | 0027-8424 | -
dc.identifier.uri | http://hdl.handle.net/10722/328843 | -
dc.description.abstract | Recent advances in single-cell technologies enable joint profiling of multiple omics. These profiles can reveal the complex interplay of different regulatory layers in single cells; still, new challenges arise when integrating datasets with some features shared across experiments and others exclusive to a single source; combining information across these sources is called mosaic integration. The difficulties lie in imputing missing molecular layers to build a self-consistent atlas, finding a common latent space, and transferring learning to new data sources robustly. Existing mosaic integration approaches based on matrix factorization cannot efficiently adapt to nonlinear embeddings for the latent cell space and are not designed for accurate imputation of missing molecular layers. By contrast, we propose a probabilistic variational autoencoder model, scVAEIT, to integrate and impute multimodal datasets with mosaic measurements. A key advance is the use of a missing mask for learning the conditional distribution of unobserved modalities and features, which makes scVAEIT flexible to combine different panels of measurements from multimodal datasets accurately and in an end-to-end manner. Imputing the masked features serves as a supervised learning procedure while preventing overfitting by regularization. Focusing on gene expression, protein abundance, and chromatin accessibility, we validate that scVAEIT robustly imputes the missing modalities and features of cells biologically different from the training data. scVAEIT also adjusts for batch effects while maintaining the biological variation, which provides better latent representations for the integrated datasets. We demonstrate that scVAEIT significantly improves integration and imputation across unseen cell types, different technologies, and different tissues. | -
dc.language | eng | -
dc.relation.ispartof | Proceedings of the National Academy of Sciences of the United States of America | -
dc.subject | deep generative models | -
dc.subject | mosaic integration | -
dc.subject | multiomics | -
dc.subject | transfer learning | -
dc.title | Robust probabilistic modeling for single-cell multimodal mosaic integration and imputation via scVAEIT | -
dc.type | Article | -
dc.description.nature | link_to_subscribed_fulltext | -
dc.identifier.doi | 10.1073/pnas.2214414119 | -
dc.identifier.pmid | 36459654 | -
dc.identifier.scopus | eid_2-s2.0-85143464002 | -
dc.identifier.volume | 119 | -
dc.identifier.issue | 49 | -
dc.identifier.spage | article no. e2214414119 | -
dc.identifier.epage | article no. e2214414119 | -
dc.identifier.eissn | 1091-6490 | -
dc.identifier.isi | WOS:001036941500003 | -
