🗨 About Me

I am a first-year PhD student in the Computational Biology and Bioinformatics (CBB) program at the University of Southern California. My research lies at the intersection of AI and biology, where I design computational approaches to accelerate discoveries in synthetic biology, drug discovery, and molecular interactions. Specifically, I focus on three areas: (1) AI-driven drug discovery [SMARTBind, Apo2Mol]; (2) biological foundation models [Tabula, ProTrek, SaprotHub, Nullsettes]; (3) machine learning-enabled protein evolution [Sequence Display]. I am also interested in bioinformatics tool and database development [SICER 2.0, gmx_mmpbsa_py, HNOXPred].

Before starting my PhD, I was very fortunate to work with and learn from inspiring mentors and collaborators across these fields; you can find them in the Experience panel.

📖 Education

  • 2025 - current, PhD student, Computational Biology and Bioinformatics. University of Southern California. Los Angeles, CA
  • 2022 - 2024, Master of Science in Engineering, Computer Science. Johns Hopkins University. Baltimore, MD
  • 2018 - 2022, Bachelor of Science, Computer Science. Wenzhou-Kean University. Wenzhou, China

📰 News

  • 2025.11: “Apo2Mol: 3D Molecule Generation via Dynamic Pocket-Aware Diffusion Models” is accepted by AAAI 2026!
  • 2025.10: “Engineering Unnatural Cells with a 21st Amino Acid as a Living Epigenetic Sensor” is published in Nature Communications!
  • 2025.09: “Small Molecule Approach to RNA Targeting Binder Discovery (SMARTBind) Using Deep Learning Without Structural Input” is released on bioRxiv.
  • 2025.09: “Tabula: A Tabular Self-Supervised Foundation Model for Single-Cell Transcriptomics” is accepted by NeurIPS 2025!
  • 2025.09: “Biosynthesis of Unnatural Cyclodipeptides through Genetic Code Expansion and Cyclodipeptide Synthase Evolution” is published in the Journal of the American Chemical Society!
  • 2025.08: “Evaluating DNA function understanding in genomic language models using evolutionarily implausible sequences” (follow-up work from the GenBio workshop) is released on arXiv.
  • 2025.08: “SaprotHub: Democratizing Protein Language Model Training, Sharing and Collaboration for the Biology Community” is accepted by Nature Biotechnology!
  • 2025.07: “A tri-modal protein language model enables advanced protein searches” is accepted by Nature Biotechnology!
  • 2025.07: “Predicting function of evolutionarily implausible DNA sequences” is presented at Q-BIO 2025 Conference: Emergent Orders in Living Systems Across Scales; see our poster.
  • 2025.06: “Sequence Display-Enabled Machine Learning for Protein Evolution” is presented at 2025 Synthetic Biology: Engineering, Evolution, & Design; see our poster.
  • 2025.06: “Predicting function of evolutionarily implausible DNA sequences” is accepted by ICML 2025 Generative AI and Biology Workshop!
  • 2025.04: I will be joining the PhD program in Computational Biology and Bioinformatics at USC QCB. Looking forward to the journey!
  • 2025.01: “Toward a privacy-preserving predictive foundation model of single-cell transcriptomics with federated learning and tabular modeling” is released on bioRxiv; see our post.

📝 Selected Publications

2026
AAAI 2026

Apo2Mol: 3D Molecule Generation via Dynamic Pocket-Aware Diffusion Models

Xinzhe Zheng, Shiyu Jiang, Gustavo Seabra, Chenglong Li, Yanjun Li. AAAI (poster), 2026.

GitHub

2025
bioRxiv

Small Molecule Approach to RNA Targeting Binder Discovery (SMARTBind) Using Deep Learning Without Structural Input

Shiyu Jiang †, Amirhossein Taghavi †, Tenghui Wang, Samantha M. Meyer, Jessica L. Childs-Disney, Chenglong Li, Matthew D. Disney, Yanjun Li. bioRxiv, 2025. (Under major revision at a Nature Research journal)

GitHub

bioRxiv

Sequence Display: Generating Large-Scale Sequence–Activity Datasets to Advance Universal Protein Evolution

Linqi Cheng †, Xinzhe Zheng †, Shiyu Jiang †, Hu Y, Liu Y, Yang K, Rui J, Ding H, Zhang M, Yuan T, Ye H, Li C, Kevin K. Yang, Xiongyi Huang, Han Xiao. bioRxiv, 2025. (Under major revision at a Nature Research journal)

GitHub

arXiv

Evaluating DNA function understanding in genomic language models using evolutionarily implausible sequences

Shiyu Jiang, Xuyin Liu, Jerry Zitong Wang. arXiv, 2025. (Under major revision in Research Journal)

GitHub

NeurIPS 2025

Tabula: A Tabular Self-Supervised Foundation Model for Single-Cell Transcriptomics

Jiayuan Ding †, Jianhui Lin †, Shiyu Jiang †, Yixin Wang, Ziyang Mao, Zhaoyu Fang, Jiliang Tang, Min Li, Xiaojie Qiu. NeurIPS (poster), 2025.

GitHub

JACS

Biosynthesis of Unnatural Cyclodipeptides through Genetic Code Expansion and Cyclodipeptide Synthase Evolution

Hu Y †, Cheng L †, Liu Y, Liu R, Jiang S, Yuan T, Wang Y, Ye H, Xiao H. Journal of the American Chemical Society, 2025.

GitHub

Nature Biotechnology

A tri-modal protein language model enables advanced protein searches

Jin Su †, Yan He †, Shiyang You †, Shiyu Jiang, Xibin Zhou, Xuting Zhang, Yuxuan Wang, Xining Su, Igor Tolstoy, Xing Chang, Hongyuan Lu, Fajie Yuan. Nature Biotechnology, 2025.

Online Server

Nature Biotechnology

SaprotHub: Democratizing Protein Language Model Training, Sharing and Collaboration for the Biology Community

Jin Su, Zhikai Li, Tianli Tao, Chenchen Han, Yan He, Fengyuan Dai, Qingyan Yuan, Yuan Gao, Tong Si, Xuting Zhang, Yuyang Zhou, Junjie Shan, Xibin Zhou, Xing Chang, Shiyu Jiang, Dacheng Ma, The OPMC, Martin Steinegger, Sergey Ovchinnikov, Fajie Yuan. Nature Biotechnology, 2025.

GitHub | OPMC

ICML 2025 GenBio Workshop

Predicting function of evolutionarily implausible DNA sequences

Shiyu Jiang, Xuyin Liu, Jerry Zitong Wang. ICML 2025 Generative AI and Biology Workshop, 2025.

GitHub

2024
ACS Nano

Integrating Metal–Phenolic Networks-Mediated Separation and Machine Learning-Aided Surface-Enhanced Raman Spectroscopy for Accurate Nanoplastics Quantification and Classification

Haoxin Ye, Shiyu Jiang, Yan Yan, Bin Zhao, Edward R Grant, David D Kitts, Rickey Y Yada, Anubhav Pratap-Singh, Alberto Baldelli, Tianxi Yang. ACS Nano, 2024.

Featured on Cover

2023
ALIFE 2023

Simulating Disease Spread During Disaster Scenarios

Shiyu Jiang, Heejoong Kim, Fabio Henrique Tanaka, Claus Aranha, Anna Bogdanova, Kimia Ghobadi, Anton Dahbura. The International Conference on Artificial Life, 2023.

GitHub

2022
Bioinformatics

HNOXPred: a web tool for the prediction of gas-sensing H-NOX proteins from amino acid sequence

Shiyu Jiang, Hemn Barzan Abdalla, Chuyun Bi, Yi Zhu, Xuechen Tian, Yixin Yang, Aloysius Wong. Bioinformatics, 2022.

Online Server | GitHub

🧑‍💻 Experience

📝 Service

  • Journal reviewer: IEEE Transactions on Computational Biology and Bioinformatics
  • Conference reviewer: AAAI 2026

🌎 Miscellaneous

Outside of work, you’ll often find me at the gym, playing soccer, road cycling, or hiking. I also enjoy playing table tennis and the piano occasionally.