Sherry Tongshuang Wu | Hello World!

Hello World!
I'm Sherry Tongshuang Wu (吴彤霜)!

Assistant Professor

School of Computer Science, Carnegie Mellon University (CMU SCS)

Human Computer Interaction Institute (HCII)

Language Technologies Institute (LTI)

I was trained as an HCI+NLP researcher by my amazing PhD advisors, Jeffrey Heer and Dan Weld, at the University of Washington. I study how humans (AI experts, lay users, and domain experts) interact with AI systems: how they debug, audit, and collaborate with them.

Most recently, I work on designing practical AI systems that help users with complex tasks, where users are imperfect, are not oracles, and are not static.

Here are some recent papers that represent my research interests and style:

Real-World AI Evaluation

We ask: What can general-purpose models do?

We do: Replicate diverse human-subject experiments with general-purpose models.

User-Centered AI Instruction Following

We ask: How can AI effectively fulfill true user needs?

We do: Perform task-specific model testing and distillation; design measurements that quantify user-specific net gains.

AI-Centered Human Scaffolding

We ask: How can humans strategically query and benefit from AI?

We do: Find optimal sub-tasks for AIs and humans; explore settings where AI mistakes can be features; instruct humans to recover from AI errors.

I am looking for undergraduate, master's, and PhD students interested in exploring these topics with me at CMU! PLEASE read this FAQ to learn about our open projects and the best ways to contact us.

Research Highlights

Completion ≠ Collaboration: Scaling Collaborative Effort with Agents

Shannon Zejiang Shen, Valerie Chen, Ken Gu, Alexis Ross, Zixian Ma, Jillian Ross, Alex Gu, Chenglei Si, Wayne Chi, Andi Peng, Jocelyn J Shen, Ameet Talwalkar, David Sontag, Tongshuang Wu

arXiv 2025: arXiv

General Scales Unlock AI Evaluation with Explanatory and Predictive Power

Lexin Zhou, Lorenzo Pacchiardi, Fernando Martínez-Plumed, Katherine M. Collins, Yael Moros-Daval, Seraphina Zhang, Qinlin Zhao, Yitian Huang, Luning Sun, Jonathan E. Prunty, Zongqian Li, Pablo Sánchez-García, Kexin Jiang Chen, Pablo A. M. Casares, Jiyun Zu, John Burden, Behzad Mehrbakhsh, David Stillwell, Manuel Cebrian, Jindong Wang, Peter Henderson, Sherry Tongshuang Wu, Patrick C. Kyllonen, Lucy Cheke, Xing Xie, José Hernández-Orallo

arXiv 2025: arXiv:2409.08775

SPHERE: An Evaluation Card for Human-AI Systems

Qianou Ma*, Dora Zhao*, Xinran Zhao, Chenglei Si, Chenyang Yang, Ryan Louie, Ehud Reiter, Diyi Yang+, Tongshuang Wu+

ACL Findings 2025: Findings of the Association for Computational Linguistics

What Prompts Don’t Say: Understanding and Managing Underspecification in LLM Prompts

Chenyang Yang, Yike Shi, Qianou Ma, Michael Xieyang Liu, Christian Kästner, Tongshuang Wu

arXiv 2025: arXiv:2505.13360

Checklists Are Better Than Reward Models For Aligning Language Models

Vijay Viswanathan, Yanchao Sun, Shuang Ma, Xiang Kong, Meng Cao, Graham Neubig, Tongshuang Wu

NeurIPS Spotlight 2025: The Thirty-Ninth Annual Conference on Neural Information Processing Systems

Prompt2Model: Generating Deployable Models from Natural Language Instructions

Vijay Viswanathan, Chenyang Zhao, Amanda Bertsch, Tongshuang Wu, Graham Neubig

EMNLP Demo Track 2023: The 2023 Conference on Empirical Methods in Natural Language Processing

MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers

Jushaan Singh Kalra, Xinran Zhao, To Eun Kim, Fengyu Cai, Fernando Diaz, Tongshuang Wu

EMNLP 2025: The 2025 Conference on Empirical Methods in Natural Language Processing

LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs

Tongshuang Wu, Haiyi Zhu, Maya Albayrak, Alexis Axon, Amanda Bertsch, Wenxing Deng, Ziqi Ding, Bill Guo, Sireesh Gururaja, Tzu-Sheng Kuo, Jenny T Liang, Ryan Liu, Ihita Mandal, Jeremiah Milbauer, Xiaolin Ni, Namrata Padmanabhan, Subhashini Ramkumar, Alexis Sudjianto, Jordan Taylor, Ying-Jui Tseng, Patricia Vaidos, Zhijin Wu, Wei Wu, Chenyang Yang

CHI Case Study 2025: The 2025 CHI Conference on Human Factors in Computing Systems

What Should We Engineer in Prompts? Training Humans in Requirement-Driven LLM Use

Qianou Ma, Weirui Peng, Chenyang Yang, Hua Shen, Kenneth Koedinger, Tongshuang Wu

TOCHI 2025: ACM Transactions on Computer-Human Interaction
