Job Description
Student Researcher (LLM Post Training – Agent & Reinforcement Learning) – 2026 Start (PhD) | ByteDance
The Tone
This is a 2026-start internship at ByteDance, located in the United States. ByteDance aims to inspire creativity and build innovative products that help people express themselves and connect. Interns in this role contribute directly to the company’s products and research, supporting its future plans and emerging technologies. The role sits on the Seed team, which is dedicated to pioneering new paths toward artificial general intelligence and advancing the frontier of intelligence for technology and society.
The TL;DR
• Role: Internship
• Location: United States
• Pay: $60 hourly
• Team: Seed LLM Post Training team
• Mission: Researching cutting-edge post-training technologies and providing core post-training capabilities for unified multimodal large models.
• Tech Stack: RL frameworks, LLM frameworks
What You’ll Actually Do
• Research: Explore large-scale models and optimize systems.
• Development: Conduct data construction, instruction tuning, preference alignment, and model optimization.
• Enhancement: Improve relevant model capabilities, such as reasoning, code, and math.
• Innovation: Perform in-depth research and exploration of future use cases.
• Contribution: Actively contribute to products and research, and to the organization’s future plans and emerging technologies.
The Must-Haves
• Background: Currently pursuing a PhD in Computer Science, AI, or a related field.
• Experience: Research experience in reinforcement learning, sequential decision-making, or agent behavior, demonstrated by first-author publications in accredited ML/AI conferences (e.g., NeurIPS, ICLR, ICML).
• Skills: Solid programming and experimentation skills, including with RL or LLM frameworks.
• Bonus: Experience with LLM agents, tool use, or prompt-based control; familiarity with environments such as WebArena, ALFWorld, or programmatic reasoning tasks; understanding of RL techniques such as reward shaping, memory augmentation, or curriculum learning.