PhD Intern, LLM Post Training (Reward Models)

February 16, 2026

$65 / hour

Internship

Job Description

Job Title: PhD Intern – Seed LLM Post Training Team (Reward Models)

About the Role

This PhD internship offers a unique opportunity to contribute to cutting-edge research and development within the Seed LLM Post Training team at ByteDance. Interns will play a crucial role in enhancing the capabilities of unified multimodal large models, specifically focusing on the design, training, and evaluation of reward models. This role involves deep research into next-generation post-training technologies and their application to optimize key areas like reasoning, coding, and agent systems.

About the Team: Seed LLM Post Training

The Seed LLM Post Training team is at the forefront of researching and developing advanced post-training technologies for unified multimodal large models. Their mission is to explore and implement next-generation techniques such as Supervised Fine-Tuning (SFT), Reward Modeling (RM), Reinforcement Learning (RL), and self-learning. The team is dedicated to significantly optimizing and improving core LLM capabilities, including reasoning, coding, agent systems, and omni models, to achieve state-of-the-art performance.

About ByteDance PhD Internships

PhD internships at ByteDance provide students with invaluable opportunities to actively contribute to the company’s products, research initiatives, and future technological advancements. This dynamic experience combines practical, hands-on learning with enriching community-building and professional development events. Interns will collaborate closely with industry experts, gaining insights and mentorship in a fast-paced, innovative environment.

Responsibilities

As a PhD Intern focused on Reward Models, your responsibilities will include:

Design and train reward models that accurately reflect nuanced human preferences in Large Language Model (LLM) outputs.
Develop and evaluate components of a comprehensive Reward Model System, integrating model predictions, verifier feedback, tool usage, and agent signals to generate reliable and generalizable reward estimates.
Create reward models specifically aimed at enhancing controllability and instruction-following performance, particularly in complex scenarios involving multi-part user requests.
Contribute to data selection and synthesis pipelines, leveraging reward signals to improve post-training data quality and expand the model’s capabilities.
Conduct research into scalable methods for learning from pairwise comparisons, rankings, or human demonstrations across a diverse range of tasks.

Qualifications

Minimum Qualifications:

Currently pursuing a PhD in Computer Science, Machine Learning, or a closely related technical field.
Demonstrated research excellence through first-author publications in top-tier venues (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP).
Proven research experience in areas such as reward modeling, human preference learning, or LLM post-training.
Proficiency in Python and deep learning frameworks, specifically PyTorch or JAX.
Must obtain and maintain work authorization in the country of employment at the time of hire and throughout the employment period.

Preferred Qualifications:

Practical experience with advanced post-training techniques such as RLHF (Reinforcement Learning from Human Feedback), DPO (Direct Preference Optimization), rejection sampling, or ranking-based supervision methods.
Familiarity with concepts like model-based reward composition, verifier integration, or synthetic data pipelines.
A strong understanding of how reward models interact with large-scale Reinforcement Learning and agent systems.

Job Information & Compensation

Compensation:

The hourly rate for this Campus Intern position in the selected city is $65.

Benefits:

Benefits may vary based on employment nature and country of work location. Interns receive day-one access to health insurance, life insurance, wellbeing benefits, and more. Additionally, interns receive 10 paid holidays per year and paid sick time (56 hours if hired in the first half of the year, 40 hours if hired in the second half). Interns not working 100% remotely may also be eligible for a housing allowance. Please note that the Company reserves the right to modify or change these benefits programs at any time, with or without notice.

For Los Angeles County (unincorporated) Candidates:

Qualified applicants with arrest or conviction records will be considered for employment in accordance with all federal, state, and local laws, including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act. Our company believes that criminal history may have a direct, adverse, and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment:

Interacting and occasionally having unsupervised contact with internal/external clients and/or colleagues.
Appropriately handling and managing confidential information, including proprietary and trade secret information, and access to information technology systems.
Exercising sound judgment.

About Doubao (Seed)

Founded in 2023, the ByteDance Doubao (Seed) Team is dedicated to pioneering advanced AI foundation models. The team’s overarching goal is to lead in cutting-edge research and drive significant technological and societal advancements. With a strong commitment to AI, their research areas encompass deep learning, reinforcement learning, Language, Vision, Audio, AI Infrastructure, and AI Safety. The team operates labs and research positions across China, Singapore, and the US.

Why Join ByteDance

Inspiring creativity is at the core of ByteDance’s mission. Its innovative products are built to help people authentically express themselves, discover, and connect. This mission is made possible by global, diverse teams. Together, ByteDancers create value for communities, inspire creativity, and enrich life, a mission they pursue daily. ByteDancers aim to achieve great things with great people, leading with curiosity, humility, and a desire to make an impact in a rapidly growing tech company. By constantly iterating and fostering an “Always Day 1” mindset, they achieve meaningful breakthroughs for themselves, the Company, and users. Joining ByteDance offers limitless possibilities for creation and growth.

Diversity & Inclusion

ByteDance is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Its platform connects people globally, and so does its workplace. The mission to inspire creativity and enrich life is supported by a commitment to celebrating diverse voices and creating an environment that reflects the many communities the company reaches. ByteDance is passionate about Diversity & Inclusion and encourages candidates who share this passion to apply.

Reasonable Accommodation

ByteDance is dedicated to providing reasonable accommodations in its recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs, or other reasons protected by applicable laws. If you require assistance or a reasonable accommodation, please reach out via this link: https://tinyurl.com/RA-request

Application Process

Applications will be reviewed on a rolling basis; therefore, early application is encouraged. Please clearly state your availability (Start date, End date) in your resume.

Date Posted: February 16, 2026
Offered Salary: $65 / hour
Expiration Date: --
Gender: Both
Qualification: Doctorate Degree
Career Level: Student
