Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1016/j.cels.2024.01.002
- Scopus: eid_2-s2.0-85185824181
- PMID: 38340729
- WOS: WOS:001197724700001

Article: Accurate top protein variant discovery via low-N pick-and-validate machine learning
| Title | Accurate top protein variant discovery via low-N pick-and-validate machine learning |
|---|---|
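| Authors | Chu, Hoi Yee; Fong, John HC; Thean, Dawn GL; Zhou, Peng; Fung, Frederic KC; Huang, Yuanhua; Wong, Alan SL |
| Keywords | active learning; base editor; Cas9; combinatorial mutagenesis; CRISPR; genome editing; low-N; machine learning; protein engineering; zero-shot |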
| Issue Date | 21-Feb-2024 |
| Publisher | Elsevier |
| Citation | Cell Systems, 2024, v. 15, n. 2, p. 193-203 |
| Abstract | A strategy to obtain the greatest number of best-performing variants with least amount of experimental effort over the vast combinatorial mutational landscape would have enormous utility in boosting resource producibility for protein engineering. Toward this goal, we present a simple and effective machine learning-based strategy that outperforms other state-of-the-art methods. Our strategy integrates zero-shot prediction and multi-round sampling to direct active learning via experimenting with only a few predicted top variants. We find that four rounds of low-N pick-and-validate sampling of 12 variants for machine learning yielded the best accuracy of up to 92.6% in selecting the true top 1% variants in combinatorial mutant libraries, whereas two rounds of 24 variants can also be used. We demonstrate our strategy in successfully discovering high-performance protein variants from diverse families including the CRISPR-based genome editors, supporting its generalizable application for solving protein engineering tasks. A record of this paper's transparent peer review process is included in the supplemental information. |
| Persistent Identifier | http://hdl.handle.net/10722/345914 |
| ISSN | 2405-4712 (2023 Impact Factor: 9.0; 2023 SCImago Journal Rankings: 4.872) |
| ISI Accession Number ID | WOS:001197724700001 |

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Chu, Hoi Yee | - |
| dc.contributor.author | Fong, John HC | - |
| dc.contributor.author | Thean, Dawn GL | - |
| dc.contributor.author | Zhou, Peng | - |
| dc.contributor.author | Fung, Frederic KC | - |
| dc.contributor.author | Huang, Yuanhua | - |
| dc.contributor.author | Wong, Alan SL | - |
| dc.date.accessioned | 2024-09-04T07:06:26Z | - |
| dc.date.available | 2024-09-04T07:06:26Z | - |
| dc.date.issued | 2024-02-21 | - |
| dc.identifier.citation | Cell Systems, 2024, v. 15, n. 2, p. 193-203 | - |
| dc.identifier.issn | 2405-4712 | - |
| dc.identifier.uri | http://hdl.handle.net/10722/345914 | - |
| dc.description.abstract | A strategy to obtain the greatest number of best-performing variants with least amount of experimental effort over the vast combinatorial mutational landscape would have enormous utility in boosting resource producibility for protein engineering. Toward this goal, we present a simple and effective machine learning-based strategy that outperforms other state-of-the-art methods. Our strategy integrates zero-shot prediction and multi-round sampling to direct active learning via experimenting with only a few predicted top variants. We find that four rounds of low-N pick-and-validate sampling of 12 variants for machine learning yielded the best accuracy of up to 92.6% in selecting the true top 1% variants in combinatorial mutant libraries, whereas two rounds of 24 variants can also be used. We demonstrate our strategy in successfully discovering high-performance protein variants from diverse families including the CRISPR-based genome editors, supporting its generalizable application for solving protein engineering tasks. A record of this paper's transparent peer review process is included in the supplemental information. | - |
| dc.language | eng | - |
| dc.publisher | Elsevier | - |
| dc.relation.ispartof | Cell Systems | - |
| dc.subject | active learning | - |
| dc.subject | base editor | - |
| dc.subject | Cas9 | - |
| dc.subject | combinatorial mutagenesis | - |
| dc.subject | CRISPR | - |
| dc.subject | genome editing | - |
| dc.subject | low-N | - |
| dc.subject | machine learning | - |
| dc.subject | protein engineering | - |
| dc.subject | zero-shot | - |
| dc.title | Accurate top protein variant discovery via low-N pick-and-validate machine learning | - |
| dc.type | Article | - |
| dc.identifier.doi | 10.1016/j.cels.2024.01.002 | - |
| dc.identifier.pmid | 38340729 | - |
| dc.identifier.scopus | eid_2-s2.0-85185824181 | - |
| dc.identifier.volume | 15 | - |
| dc.identifier.issue | 2 | - |
| dc.identifier.spage | 193 | - |
| dc.identifier.epage | 203 | - |
| dc.identifier.eissn | 2405-4720 | - |
| dc.identifier.isi | WOS:001197724700001 | - |
| dc.identifier.issnl | 2405-4712 | - |
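
The abstract describes a loop in which zero-shot predictions seed the first picks and each later round validates a small batch of predicted top variants (four rounds of 12). The following is a minimal sketch of that general idea only, not the authors' implementation: the one-hot encoding, the ridge regression model, the synthetic fitness landscape, and every function name here (`one_hot`, `fit_ridge`, `pick_and_validate`) are assumptions made for illustration.

```python
import itertools
import numpy as np


def one_hot(variants, n_sites, n_states):
    """Encode each variant (a tuple of per-site states) as a flat one-hot vector."""
    X = np.zeros((len(variants), n_sites * n_states))
    for row, variant in enumerate(variants):
        for site, state in enumerate(variant):
            X[row, site * n_states + state] = 1.0
    return X


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression (assumed model); returns a predict function."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda Z: (Z - x_mean) @ w + y_mean


def pick_and_validate(X, zero_shot, measure, rounds=4, batch=12):
    """Each round: pick the top unmeasured variants, measure them, refit the model."""
    measured = {}                                 # variant index -> measured value
    scores = np.asarray(zero_shot, dtype=float)   # round 1 ranks by zero-shot scores
    for _ in range(rounds):
        order = np.argsort(-scores)
        picks = [int(i) for i in order if int(i) not in measured][:batch]
        for i in picks:
            measured[i] = measure(i)              # stands in for the wet-lab assay
        idx = np.array(sorted(measured))
        y = np.array([measured[i] for i in idx])
        scores = fit_ridge(X[idx], y)(X)          # re-rank the whole library
    return measured, scores


# Synthetic demo: 4 sites x 4 states = 256 variants (hypothetical landscape).
rng = np.random.default_rng(0)
n_sites, n_states = 4, 4
variants = list(itertools.product(range(n_states), repeat=n_sites))
X = one_hot(variants, n_sites, n_states)
true_fitness = X @ rng.normal(size=X.shape[1]) + 0.1 * rng.normal(size=len(variants))
zero_shot = true_fitness + rng.normal(size=len(variants))  # noisy prior guess

measured, scores = pick_and_validate(X, zero_shot, lambda i: true_fitness[i])
top = set(np.argsort(-true_fitness)[:3].tolist())          # roughly the top 1%
print(len(top & set(measured)), "of", len(top), "top variants were measured")
```

With the defaults, the loop measures 48 variants in total (4 rounds of 12); passing `rounds=2, batch=24` gives the alternative schedule mentioned in the abstract under the same budget.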
