Josh Tycko

Systematic discovery of protein functions in human cells to understand gene regulation and enable genetic medicine

Prediction and design of transcriptional repressor domains with large-scale mutational scans and deep learning


Journal article


Raeline Valbuena*, AkshatKumar Nigam*, Josh Tycko, Peter Suzuki, Kaitlyn Spees, Aradhana, Sophia Arana, Peter Du, Roshni A. Patel, Lacramioara Bintu, Anshul Kundaje, Michael Bassik
bioRxiv, 2024

APA
Valbuena*, R., Nigam*, A. K., Tycko, J., Suzuki, P., Spees, K., Aradhana, … Bassik, M. (2024). Prediction and design of transcriptional repressor domains with large-scale mutational scans and deep learning. bioRxiv.


Chicago/Turabian
Valbuena*, Raeline, AkshatKumar Nigam*, Josh Tycko, Peter Suzuki, Kaitlyn Spees, Aradhana, Sophia Arana, et al. “Prediction and Design of Transcriptional Repressor Domains with Large-Scale Mutational Scans and Deep Learning.” bioRxiv (2024).


MLA
Valbuena*, Raeline, et al. “Prediction and Design of Transcriptional Repressor Domains with Large-Scale Mutational Scans and Deep Learning.” bioRxiv, 2024.


BibTeX

@article{raeline2024a,
  title = {Prediction and design of transcriptional repressor domains with large-scale mutational scans and deep learning},
  year = {2024},
  journal = {bioRxiv},
  author = {Valbuena*, Raeline and Nigam*, AkshatKumar and Tycko, Josh and Suzuki, Peter and Spees, Kaitlyn and Aradhana and Arana, Sophia and Du, Peter and Patel, Roshni A. and Bintu, Lacramioara and Kundaje, Anshul and Bassik, Michael}
}

Abstract

Regulatory proteins have evolved diverse repressor domains (RDs) to enable precise context-specific repression of transcription. However, our understanding of how sequence variation impacts the functional activity of RDs is limited. To address this gap, we generated a high-throughput mutational scanning dataset measuring the repressor activity of 115,000 variant sequences spanning more than 50 RDs in human cells. We identified thousands of clinical variants with loss or gain of repressor function, including TWIST1 HLH variants associated with Saethre-Chotzen syndrome and MECP2 domain variants associated with Rett syndrome. We also leveraged these data to annotate short linear interacting motifs (SLiMs) that are critical for repression in disordered RDs. Then, we designed a deep learning model called TENet (Transcriptional Effector Network) that integrates sequence, structure and biochemical representations of sequence variants to accurately predict repressor activity. We systematically tested generalization within and across domains with varying homology using the mutational scanning dataset. Finally, we employed TENet within a directed evolution sequence editing framework to tune the activity of both structured and disordered RDs and experimentally test thousands of designs. Our work highlights critical considerations for future dataset design and model training strategies to improve functional variant prioritization and precision design of synthetic regulatory proteins.