Job Description
Job Title: PhD Intern – Seed LLM Post Training Team (Reward Models)
About the Role
This PhD internship offers a unique opportunity to contribute to cutting-edge research and development within the Seed LLM Post Training team at ByteDance. Interns will play a crucial role in enhancing the capabilities of unified multimodal large models, specifically focusing on the design, training, and evaluation of reward models. This role involves deep research into next-generation post-training technologies and their application to optimize key areas like reasoning, coding, and agent systems.
About the Team: Seed LLM Post Training
The Seed LLM Post Training team is at the forefront of researching and developing advanced post-training technologies for unified multimodal large models. Their mission is to explore and implement next-generation techniques such as Supervised Fine-Tuning (SFT), Reward Modeling (RM), Reinforcement Learning (RL), and self-learning. The team is dedicated to significantly optimizing and improving core LLM capabilities, including reasoning, coding, agent systems, and omni models, to achieve state-of-the-art performance.
About ByteDance PhD Internships
PhD internships at ByteDance provide students with invaluable opportunities to actively contribute to the company’s products, research initiatives, and future technological advancements. This dynamic experience combines practical, hands-on learning with enriching community-building and professional development events. Interns will collaborate closely with industry experts, gaining insights and mentorship in a fast-paced, innovative environment.
Responsibilities
As a PhD Intern focused on Reward Models, you will:
- Design and train reward models that accurately reflect nuanced human preferences in Large Language Model (LLM) outputs.
- Develop and evaluate components of a comprehensive Reward Model System, integrating model predictions, verifier feedback, tool usage, and agent signals to generate reliable and generalizable reward estimates.
- Create reward models specifically aimed at enhancing controllability and instruction-following performance, particularly in complex scenarios involving multi-part user requests.
- Contribute to data selection and synthesis pipelines, leveraging reward signals to improve post-training data quality and expand the model’s capabilities.
- Conduct research into scalable methods for learning from pairwise comparisons, rankings, or human demonstrations across a diverse range of tasks.
Qualifications
Minimum Qualifications:
- Currently pursuing a PhD in Computer Science, Machine Learning, or a closely related technical field.
- Demonstrated research excellence through first-author publications in top-tier venues (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP).
- Proven research experience in areas such as reward modeling, human preference learning, or LLM post-training.
- Proficiency in Python and deep learning frameworks, specifically PyTorch or JAX.
- Must obtain and maintain work authorization in the country of employment at the time of hire and throughout the employment period.
Preferred Qualifications:
- Practical experience with advanced post-training techniques such as RLHF (Reinforcement Learning from Human Feedback), DPO (Direct Preference Optimization), rejection sampling, or ranking-based supervision methods.
- Familiarity with concepts like model-based reward composition, verifier integration, or synthetic data pipelines.
- A strong understanding of how reward models interact with large-scale Reinforcement Learning and agent systems.
Job Information & Compensation
Compensation:
The hourly rate for this Campus Intern position in the selected city is $65.
Benefits:
Benefits may vary based on employment nature and country of work location. Interns receive day-one access to health insurance, life insurance, wellbeing benefits, and more. Additionally, interns receive 10 paid holidays per year and paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half). Interns not working 100% remotely may also be eligible for a housing allowance. Please note that the Company reserves the right to modify or change these benefits programs at any time, with or without notice.
For Los Angeles County (unincorporated) Candidates:
Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws, including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Our company believes that criminal history may have a direct, adverse, and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment:
- Interacting and occasionally having unsupervised contact with internal/external clients and/or colleagues.
- Appropriately handling and managing confidential information, including proprietary and trade secret information, and access to information technology systems.
- Exercising sound judgment.
About Doubao (Seed)
Founded in 2023, the ByteDance Doubao (Seed) Team is dedicated to pioneering advanced AI foundation models. The team’s overarching goal is to lead in cutting-edge research and drive significant technological and societal advancements. Its research areas span deep learning, reinforcement learning, language, vision, audio, AI infrastructure, and AI safety. The team has labs and research positions across China, Singapore, and the US.
Why Join ByteDance
Inspiring creativity is at the core of ByteDance’s mission. Its products are built to help people authentically express themselves, discover, and connect. This mission is made possible by global, diverse teams. Together, ByteDancers create value for communities, inspire creativity, and enrich life, a mission they pursue daily. ByteDancers aim to achieve great things with great people, leading with curiosity, humility, and a desire to make an impact in a rapidly growing tech company. By constantly iterating and fostering an “Always Day 1” mindset, they achieve meaningful breakthroughs for themselves, the Company, and users. Joining ByteDance offers limitless possibilities for creation and growth.
Diversity & Inclusion
ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Its platform connects people globally, and so does its workplace. The mission to inspire creativity and enrich life is supported by a commitment to celebrating diverse voices and creating an environment that reflects the many communities it reaches. ByteDance is passionate about Diversity & Inclusion and encourages candidates who share this passion to apply.
Reasonable Accommodation
ByteDance is dedicated to providing reasonable accommodations in its recruitment process for candidates with disabilities, pregnancy, sincerely held religious beliefs, or other reasons protected by applicable laws. If you require assistance or a reasonable accommodation, please submit a request at: https://tinyurl.com/RA-request
Application Process
Applications will be reviewed on a rolling basis, so early application is encouraged. Please clearly state your availability (start date and end date) in your resume.