RL Training Environment Engineer Intern – Reinforcement Learning Environments

Job Description

Machine Learning Engineer, RL Environments – Internship | Preference Model

The Tone
This is an internship at Preference Model, based in San Francisco, CA (remote considered). Preference Model is building automated ML research engineering, with a focus on high-quality RL training environments for large language models to address the brittleness of existing frontier models. This role is central to developing, in collaboration with leading AI labs, the complex, real-world RL environments needed to bring AI closer to its transformative potential.

The TL;DR
• Role: Internship
• Type: Temporary
• Location: In-person, San Francisco, CA

• Mission: This person will design, implement, and evaluate novel RL training environments for large language models.
• Tech Stack: Python, Docker, CUDA kernels, low-level GPU programming

What You’ll Actually Do
• Design: Design and build RL environments that test LLM reasoning on ML, systems, and research problems.
• Code: Write clean, production-grade Python code, not just research prototypes.
• Operate: Work with Docker, build reproducible environments, and debug when things break.
• Translate: Translate ML papers and concepts into concrete training tasks.
• Evaluate: Conduct experiments and evaluations, delivering your work into production training runs.

The Must-Haves
• Background: Student (undergrad or PhD) in Computer Science, Machine Learning, Math, Physics, or a related field.
• Experience: Ability to write real code, not just research prototypes; familiarity with how LLMs work, what they’re good at, and where they fall short; ability to work independently, take feedback, and iterate fast.
• Skills: Strong Python skills.
• Bonus: Understanding of transformer internals and experience with training or inference code; experience writing CUDA kernels or working with low-level GPU programming; deep knowledge in a research area (evidenced by publications, public code, or strong coursework); broad understanding across ML subfields and ability to connect ideas; experience building interactive environments, simulations, or complex software systems.