
Hi, welcome to my GitHub 👋

I am Xiao Liu, a fifth-year PhD student at Tsinghua University (since 2021), expected to graduate in June 2026.

  • 🔭 Interested in Machine Learning, Natural Language Processing, and Foundation Models.

  • 🌱 Find my up-to-date publication list on Google Scholar! Some works I am proud of as lead author:

    Large Language Model (LLM) Training and Prompt Learning
    Foundational Agents for Challenging Real-World Missions
    • AgentBench (ICLR'24): the first systematic, multi-dimensional benchmark to evaluate LLMs as agents in 8 distinct environments derived from real-world practical missions.
    • AutoWebGLM (KDD'24): a strong web-navigating agent built on ChatGLM-3-6B, outperforming prompted GPT-4 on Mind2Web, WebArena, and our newly constructed dataset AutoWebBench.
    • VisualAgentBench (ICLR'25): a comprehensive framework to train and test Large Multimodal Models (LMMs) as visual foundation agents.
    • WebRL (ICLR'25): self-evolving online curriculum RL that trains open LLMs to outperform GPT-4-Turbo on web agent tasks by 160%.
    • AndroidLab (ACL'25): a framework for training and systematically benchmarking autonomous Android agents.
    • AutoGLM: autonomous foundation agents for GUIs, the first Phone Use and Web Browser Use agent family.
    Alignment and Scalable Oversight over LLMs and Diffusers
    • ImageReward (NeurIPS'23): the first general-purpose text-to-image human preference reward model (RM) for RLHF, outperforming CLIP/BLIP/Aesthetic by 30% on human preference prediction (see the usage sketch after this list).
    • BPO (Black-box Prompt Optimization, ACL'24): a novel direction for aligning LLMs via preference-aware prompt optimization, improving the human-preference win rates of ChatGPT, Claude, and LLaMA by 20%+ without training them.
    • AlignBench (ACL'24): the first comprehensive benchmark for evaluating LLMs' Chinese alignment, derived from ChatGLM's real online scenarios. Adopted by top Chinese LLMs (ChatGLM, Qwen, DeepSeek, Yi, Baichuan, Abab, etc.).
    • SPaR (ICLR'25): self-play with tree-search refinement to improve instruction following in LLMs.
    Self-supervised Learning and Reasoning
  • 🤔 Dedicated to building the next generation of AI systems via both large pre-trained models and symbolic agent reasoning.

  • 💬 Feel free to drop me an email for:

    • Any form of collaboration
    • Any issues with my work or code
    • Interesting ideas to discuss, or just a chat
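
ImageReward (mentioned above) ships as a pip package; below is a minimal usage sketch following the repo's README. The image paths are placeholders, and the exact API may differ across versions, so treat this as an illustrative example rather than authoritative documentation.

```python
# pip install image-reward
import ImageReward as RM

# Load the pretrained reward model (weights are downloaded on first use).
model = RM.load("ImageReward-v1.0")

prompt = "a painting of an ocean with clouds and birds, daytime"
images = ["sample_1.png", "sample_2.png"]  # placeholder local image paths

# Score each candidate image against the prompt; a higher score means the
# model predicts stronger human preference.
rewards = model.score(prompt, images)
print(rewards)
```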

Pinned

  1. zai-org/GLM-130B (Public)

     GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

     Python · 7.7k stars · 607 forks

  2. zai-org/ChatGLM-6B (Public)

     ChatGLM-6B: An Open Bilingual Dialogue Language Model

     Python · 41.2k stars · 5.2k forks

  3. THUDM/P-tuning (Public)

     A novel method to tune language models. Code and datasets for the paper "GPT Understands, Too" (the core idea is illustrated in the sketch after this list).

     Python · 938 stars · 115 forks

  4. THUDM/P-tuning-v2 (Public)

     An optimized deep prompt tuning strategy, comparable to fine-tuning across scales and tasks

     Python · 2.1k stars · 207 forks

  5. THUDM/AgentBench (Public)

     A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

     Python · 3k stars · 222 forks

  6. zai-org/Open-AutoGLM (Public)

     An Open Phone Agent Model & Framework: Unlocking the AI Phone for Everyone

     Python · 20.3k stars · 3.3k forks
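
For readers curious about the prompt-tuning line of work pinned above (P-tuning, P-tuning-v2), here is a minimal, self-contained sketch of the core idea: trainable continuous "virtual token" embeddings prepended to a frozen model's input embeddings. This illustrates the general technique only; it is not the repos' actual implementation, and all names below are hypothetical.

```python
import torch
import torch.nn as nn

class PromptTuning(nn.Module):
    """Hypothetical sketch: learn continuous prompt embeddings while the
    backbone language model stays frozen."""

    def __init__(self, embed_dim: int, num_virtual_tokens: int = 20):
        super().__init__()
        # The only trainable parameters: one embedding per virtual token.
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) from the frozen backbone.
        batch_size = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        # Prepend the virtual tokens to the real token embeddings.
        return torch.cat([prompt, input_embeds], dim=1)

# Toy usage: extend a batch of token embeddings with 20 virtual tokens, then
# feed the result to a frozen LM via its `inputs_embeds` argument, optimizing
# only `prompt_tuning.prompt`.
prompt_tuning = PromptTuning(embed_dim=768)
token_embeds = torch.randn(2, 16, 768)   # (batch=2, seq_len=16, dim=768)
extended = prompt_tuning(token_embeds)   # -> (2, 36, 768)
print(extended.shape)
```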