Hi, I'm Yantao Liu, currently a Senior Algorithm Engineer at Qwen.
My work focuses on Reward Modeling and Reinforcement Learning training of LLMs.
For my full list of publications, please refer to my Google Scholar profile.
- RM-Bench: A benchmark that tests reward models on their sensitivity to subtle content differences and their resistance to style bias, with the goal of better aligning language models.
- PairJudge-RM: A pairwise reward model that uses knockout tournaments to improve Best-of-N sampling for LLMs.
- HelpSteer3: An open-source dataset for training models to generate more helpful responses to user prompts.
If you're interested in reward modeling or any of my projects, feel free to reach out at my personal email!
- M.S. Student, University of the Chinese Academy of Sciences, 2022-2025
- B.S. Student, Beijing University of Posts and Telecommunications, 2018-2022
Thanks for visiting my page!