👋 About Me
I am a graduate researcher at GIX Institute — a dual-degree program between Tsinghua University and the University of Washington. I work on post-training for large language models and agents, with an emphasis on verifiable reinforcement learning and scalable synthetic data for long-CoT reasoning.
Most recently (Oct 2024 – Oct 2025) I spent a year at Moonshot AI on the RL Post-Training team. There I was a core contributor to Kimi-Researcher, our multi-turn search & deep-research agent for hard queries, and owned the Data Science capability of Kimi K2 & K2 Thinking. My technical focus was verifiable RL and synthetic data.
Before Moonshot, I was a Research Intern at Microsoft Research Asia, in the Natural Language Computing Group, where I first-authored Meta Reasoning for LLMs. Earlier I did HCI and agentic-systems research at Tsinghua Future Lab and AIR, publishing at CHI, UIST, IROS, IEEE VR, and TEI. My undergraduate degree is a Bachelor of Engineering (B.Eng.) in Industrial Design from Tongji University (Rank 1 / 77, Shanghai Outstanding Graduate).
I work on agents so they can take on the highest-value work humans do today, because creating real economic value is what keeps prosperity growing rather than stalling.
🔥 News
- 2025.11 Kimi K2 Thinking released. [GitHub] [Twitter] [HuggingFace]
- 2025.10 Completed one-year internship at Moonshot AI (Kimi, RL Post-Training team) with a return offer.
- 2025.07 Kimi K2: Open Agentic Intelligence Technical Report released — 574 citations to date. [arXiv] [Page] [GitHub] [Twitter]
- 2025.06 Kimi-Researcher released — a multi-turn autonomous search & research agent for hard queries, trained end-to-end with RL; public benchmark HLE Pass@1 8.6% → 26.9% (then-SOTA), xbench-DeepSearch 69%. [Tech Blog] [GitHub] [Twitter]
- 2024.10 Joined Moonshot AI (Kimi) RL Post-Training team as ML Engineer Intern.
- 2024.10 Paper Mul-O (first author) accepted to UIST '24.
- 2024.10 Paper SurrealDriver accepted to IROS '24.
- 2024.06 Released first-author paper Meta Reasoning for LLMs from my research internship at Microsoft Research Asia's Natural Language Computing Group. [arXiv]
- 2024.03 Paper OdorAgent accepted to IEEE VR '24.
- 2024.01 Started research internship at Microsoft Research Asia (Natural Language Computing Group).
- 2023.09 Started M.S. at GIX Institute (Tsinghua × UW dual degree).
- 2023.06 Graduated from Tongji University with a Bachelor of Engineering in Industrial Design — Rank 1 / 77, Shanghai Outstanding Graduate, Outstanding Thesis; selected for direct postgraduate admission to Tsinghua.
- 2023.04 Poster Atmospheror accepted to CHI '23.
- 2023.02 Paper Bamboo Agents (first author) accepted to TEI '23.
📚 Publications
Full list on Google Scholar. ★ = first author.
Citations by year are auto-synced daily from Google Scholar via GitHub Actions.
- arXiv 2025
  Kimi K2: Open Agentic Intelligence
  Contributed to post-training and owned the Data Science capability. Focus: verifiable RL and synthetic data. Full benchmarks are in the Technical Report.
- Tech Blog 2025
  Kimi-Researcher: End-to-End RL Training for Emerging Agentic Capabilities
  A multi-turn, autonomous search & research agent for hard queries, trained end-to-end with RL. My work focused on verifiable RL and synthetic data; this was the first work on our team to produce strong results on GAIA, a hard-search benchmark. Public benchmark: HLE Pass@1 8.6% → 26.9% (then-SOTA).
- arXiv 2024 ★
  Meta Reasoning for Large Language Models
  First-author paper from my research internship at Microsoft Research Asia's Natural Language Computing Group. Meta-Reasoning Prompting (MRP) is a dynamic reasoning-pathway selector inspired by how humans switch thinking modes. It gains around +10% across comprehensive benchmarks and is SOTA among seven popular methods on four LLMs.
- UIST 2024 ★
  Mul-O: Encouraging Olfactory Innovation in Various Scenarios through a Large-Language-Model-Enabled Multimodal Toolkit
- IROS 2024
  SurrealDriver: Designing LLM-powered Generative Driver Agent Framework based on Human Drivers' Driving-Thinking Data
- IEEE VR 2024
  OdorAgent: Generate Odor Sequences for Movies Based on Large Language Model
- arXiv 2023
- CHI 2023
  Atmospheror: Towards an Olfactory Interactive System to Enhance Environmental Atmospheres in Indoor Spaces
- TEI 2023 ★
  Bamboo Agents: Exploring the Potentiality of Digital Craft through Intelligent Design Agents
💼 Work Experience
- 2024.10 – 2025.10
  ML Engineer Intern (return offer), Moonshot AI, RL Post-Training team. Core contributor to Kimi-Researcher; contributor to Kimi K2 & K2 Thinking; contributor to Kimi-Audio.
- 2024.01 – 2024.06
  Research Intern, Microsoft Research Asia, Natural Language Computing Group. First-authored Meta Reasoning for LLMs.
- 2022.12 – 2024.06
  Graduate Researcher (HCI), Tsinghua Future Lab & Institute for AI Industry Research. Publications at CHI, UIST, IROS, IEEE VR, TEI.
- 2022.07 – 2022.11
  Research & Strategy Intern, Accenture (Designaffairs), Shanghai. Honda & Stanley HMI projects.
🏆 Selected Honors & Grants
- $200K Google Cloud Grants — GPU credits, Google for Startups, 2024.
- First-class Award Grants across multiple programs — Tsinghua University, 2024.
- Gold Prize (top <0.01%), Team Leader — China International College Student Innovation Competition, Ministry of Education, 2024.
- Champion, Team Leader — Generative AI Innovation Competition, Amazon, 2024.
- Shanghai Outstanding Graduate — Shanghai Municipal Education Commission, 2023.
- Outstanding Bachelor's Thesis — Tongji University, 2023.
- Direct Postgraduate Admission to Tsinghua University, 2023.
- Rank 1 / 77, GPA 4.90 / 5.00 — Industrial Design, Tongji University, 2019 – 2023.
- First-class Scholarship × 4 — Tongji University, 2019 – 2023.
- iF Design Talent Award (Winner, top <1%), Team Leader — iF International Forum Design, 2022.
🎓 Education
- 2023.09 – 2027.03 M.S. in Technology Innovation (CS) — Dual Master's Degree, GIX Institute (Tsinghua University & University of Washington).
- 2019.09 – 2023.06 Bachelor of Engineering (B.Eng.) in Industrial Design (minor: Artificial Intelligence), College of Design and Innovation, Tongji University.
  GPA 4.90 / 5.00 · Rank 1 / 77 · Shanghai Outstanding Graduate · Outstanding Bachelor's Thesis · selected for direct postgraduate admission to Tsinghua · iF Design Talent Award.
🛠 Skills
Post-training: RL · GRPO · DPO · PPO · SFT · Reward Modeling · Rejection Sampling · Scalable Synthetic Data.
Reasoning & Agents: Agentic RL · Tool-Use · Long-CoT RL · Verifiable RL · RAG · Eval & Benchmark Design.
Engineering: Python · PyTorch · vLLM · Transformers · Docker · Git · HTML.
🌿 Outside Work
I am into botany, singing, and the trio of tea, coffee, and spirits, plus the occasional low-risk sport. Most of all, I like coffee chats with friends across very different industries. The world is more plural than any single field lets you see, and much of my interest in agents taking on high-value work and driving real economic output started in those conversations.
📮 Contact
Best reached by email at peizhong@uw.edu. Also on Google Scholar, LinkedIn, and GitHub.