👋 About Me

I am a graduate researcher at GIX Institute — a dual-degree program between Tsinghua University and the University of Washington. I work on post-training for large language models and agents, with an emphasis on verifiable reinforcement learning and scalable synthetic data for long-CoT reasoning.

Most recently (Oct 2024 – Oct 2025) I spent a year at Moonshot AI on the RL Post-Training team. There I was a core contributor to Kimi-Researcher, our multi-turn search & deep-research agent for hard queries, and owned the Data Science capability of Kimi K2 & K2 Thinking. My technical focus was verifiable RL and synthetic data.

Before Moonshot, I was a Research Intern at Microsoft Research Asia, in the Natural Language Computing Group, where I first-authored Meta Reasoning for LLMs. Earlier I did HCI and agentic-systems research at Tsinghua Future Lab and AIR, publishing at CHI, UIST, IROS, IEEE VR, and TEI. My undergraduate degree is a Bachelor of Engineering (B.Eng.) in Industrial Design from Tongji University (Rank 1 / 77, Shanghai Outstanding Graduate).

I work on agents so they can take on the highest-value work humans do today, because creating real economic value is how prosperity keeps growing instead of stalling.

🔥 News

  • 2025.11 Kimi K2 Thinking released. [GitHub] [Twitter] [HuggingFace]
  • 2025.10 Completed one-year internship at Moonshot AI (Kimi, RL Post-Training team) with a return offer.
  • 2025.07 Kimi K2: Open Agentic Intelligence Technical Report released — 574 citations to date. [arXiv] [Page] [GitHub] [Twitter]
  • 2025.06 Kimi-Researcher released — a multi-turn autonomous search & research agent for hard queries, trained end-to-end with RL; public benchmark HLE Pass@1 8.6% → 26.9% (then-SOTA), xbench-DeepSearch 69%. [Tech Blog] [GitHub] [Twitter]
  • 2024.10 Joined Moonshot AI (Kimi) RL Post-Training team as ML Engineer Intern.
  • 2024.10 Paper Mul-O (first author) accepted to UIST '24.
  • 2024.10 Paper SurrealDriver accepted to IROS '24.
  • 2024.06 Released first-author paper Meta Reasoning for LLMs from my research internship at Microsoft Research Asia's Natural Language Computing Group. [arXiv]
  • 2024.03 Paper OdorAgent accepted to IEEE VR '24.
  • 2024.01 Started research internship at Microsoft Research Asia (Natural Language Computing Group).
  • 2023.09 Started M.S. at GIX Institute (Tsinghua × UW dual degree).
  • 2023.06 Graduated from Tongji University with a Bachelor of Engineering in Industrial Design — Rank 1 / 77, Shanghai Outstanding Graduate, Outstanding Thesis; selected for direct postgraduate admission to Tsinghua.
  • 2023.04 Poster Atmospheror accepted to CHI '23.
  • 2023.02 Paper Bamboo Agents (first author) accepted to TEI '23.

📚 Publications

Full list on Google Scholar. ★ = first author.

Citations by year: 2023: 7 · 2024: 56 · 2025: 439 · 2026: 274.

Auto-synced daily from Google Scholar via GitHub Actions.

  1. arXiv 2025

    Kimi K2: Open Agentic Intelligence

    Kimi Team, Y. Bai, Y. Bao, …, Peizhong Gao, et al. · Moonshot AI Technical Report · 574 citations

    Contributed to post-training and owned Data Science capability. Focus: verifiable RL and synthetic data. Full benchmarks in the Technical Report.

  2. Tech Blog 2025

    Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities

    Moonshot AI Kimi-Researcher Team · core contributor

    A multi-turn, autonomous search & research agent for hard queries, trained end-to-end with RL. My work focused on verifiable RL and synthetic data; this was the first work on our team to produce strong results on GAIA — a hard-search benchmark. Public benchmark: HLE Pass@1 8.6% → 26.9% (then-SOTA).

  3. arXiv 2024 ★

    Meta Reasoning for Large Language Models

    Peizhong Gao, A. Xie, S. Mao, W. Wu, Y. Xia, H. Mi, et al. · arXiv preprint · Microsoft Research Asia — Natural Language Computing Group · 43 citations

    First-author paper from my research internship at Microsoft Research Asia's Natural Language Computing Group. Meta-Reasoning Prompting (MRP) is a dynamic reasoning-pathway selector inspired by how humans switch thinking modes. It improves results by around 10% across comprehensive benchmarks and is SOTA among seven popular methods on four LLMs.

  4. UIST 2024 ★

    Mul-O: Encouraging Olfactory Innovation in Various Scenarios through a Large-Language-Model-Enabled Multimodal Toolkit

    Peizhong Gao, F. Liu, D. Wen, Y. Gao, L. Zhang, C. Wang, Q. Zhang, Y. Zhang, S. Ma, et al. · ACM UIST '24

  5. IROS 2024

    SurrealDriver: Designing LLM-powered Generative Driver Agent Framework based on Human Drivers' Driving-Thinking Data

    Y. Jin, R. Yang, Z. Yi, X. Shen, H. Peng, X. Liu, J. Qin, J. Li, J. Xie, Peizhong Gao, et al. · IEEE/RSJ IROS '24 · 35 citations

  6. IEEE VR 2024

    OdorAgent: Generate Odor Sequences for Movies Based on Large Language Model

    Y. Zhang, Peizhong Gao, F. Kang, J. Li, J. Liu, Q. Lu, Y. Xu · IEEE VR '24 · 7 citations

  7. arXiv 2023

    SurrealDriver: Designing Generative Driver Agent Simulation Framework in Urban Contexts based on Large Language Model

    Y. Jin, X. Shen, H. Peng, X. Liu, J. Qin, J. Li, J. Xie, Peizhong Gao, G. Zhou, J. Gong · arXiv preprint · 98 citations

  8. CHI 2023

    Atmospheror: Towards an Olfactory Interactive System to Enhance Environmental Atmospheres in Indoor Spaces

    Q. Lu, Y. Zhang, Y. Zhang, S. E. Ma, Y. Zhang, Y. Qin, Peizhong Gao, Q. Zhang, Y. Xu · ACM CHI '23 · 13 citations

  9. TEI 2023 ★

    Bamboo Agents: Exploring the Potentiality of Digital Craft through Intelligent Design Agents

    Peizhong Gao, T. Gao, Y. Yang, Z. Liu, J. Shi, J. Li · ACM TEI '23 · 7 citations

💼 Work Experience

  • 2024.10 – 2025.10

    ML Engineer Intern (return offer), Moonshot AI — RL Post-Training team. Core contributor to Kimi-Researcher; contributor to Kimi K2 & K2 Thinking; contributor to Kimi-Audio.

  • 2024.01 – 2024.06

    Research Intern, Microsoft Research Asia — Natural Language Computing Group. First-authored Meta Reasoning for LLMs.

  • 2022.12 – 2024.06

    Graduate Researcher (HCI), Tsinghua Future Lab & Institute for AI Industry Research. Publications at CHI, UIST, IROS, IEEE VR, TEI.

  • 2022.07 – 2022.11

    Research & Strategy Intern, Accenture — Designaffairs, Shanghai. Honda & Stanley HMI projects.

🏆 Selected Honors & Grants

  • $200K Google Cloud Grants — GPU credits, Google for Startups, 2024.
  • First-class Award Grants across multiple programs — Tsinghua University, 2024.
  • Gold Prize (top <0.01%), Team Leader — China International College Student Innovation Competition, Ministry of Education, 2024.
  • Champion, Team Leader — Generative AI Innovation Competition, Amazon, 2024.
  • Shanghai Outstanding Graduate — Shanghai Municipal Education Commission, 2023.
  • Outstanding Bachelor's Thesis — Tongji University, 2023.
  • Direct Postgraduate Admission to Tsinghua University, 2023.
  • Rank 1 / 77, GPA 4.90 / 5.00 — Industrial Design, Tongji University, 2019 – 2023.
  • First-class Scholarship × 4 — Tongji University, 2019 – 2023.
  • iF Design Talent Award (Winner, top <1%), Team Leader — iF International Forum Design, 2022.

🎓 Education

  • 2023.09 – 2027.03 M.S. in Technology Innovation (CS) — Dual Master's Degree, GIX Institute (Tsinghua University & University of Washington).
  • 2019.09 – 2023.06 Bachelor of Engineering (B.Eng.) in Industrial Design (minor: Artificial Intelligence), College of Design and Innovation, Tongji University.
    GPA 4.90 / 5.00 · Rank 1 / 77 · Shanghai Outstanding Graduate · Outstanding Bachelor's Thesis · selected for direct postgraduate admission to Tsinghua · iF Design Talent Award.

🛠 Skills

Post-training: RL · GRPO · DPO · PPO · SFT · Reward Modeling · Rejection Sampling · Scalable Synthetic Data.

Reasoning & Agents: Agentic RL · Tool-Use · Long-CoT RL · Verifiable RL · RAG · Eval & Benchmark Design.

Engineering: Python · PyTorch · vLLM · Transformers · Docker · Git · HTML.

🌿 Outside Work

Outside work, I am into botany, singing, and the trio of tea, coffee, and spirits, plus the occasional low-risk sport. Most of all, I like coffee chats with friends across very different industries. The world is more plural than any single field lets you see, and a lot of my interest in agents taking on high-value work and driving real economic output actually started in those conversations.

📮 Contact

Best reached by email at peizhong@uw.edu. Also on Google Scholar, LinkedIn, and GitHub.