PhD Intern – Generative AI Models

May 25, 2026
$60 / hour

Job Description

Research Scientist Intern (Privacy Innovation Lab-Multimodal Generative Model) – 2026 Start (PhD) | TikTok

The Tone:
This is a PhD internship at TikTok, located in Los Angeles, CA. TikTok is the leading global destination for short-form mobile video, focused on inspiring creativity and bringing joy. This role is crucial for advancing privacy-friendly technology innovation within multimodal generative models, ensuring user trust while enabling cutting-edge AI development. The Privacy Innovation Lab aims to explore the next frontier of privacy technology and theory, contributing key insights and technical solutions for all TikTok products.

The TL;DR
• Role: Internship
• Type: Temporary, Research
• Location: In-person, Los Angeles, CA
• Pay: $60 / hour
• Team: Privacy Innovation (PI) Lab, Multimodal Generative Model focus
• Mission: Build next-generation generative foundation models with a strong focus on diffusion-based and unified generation-understanding architectures within privacy-sensitive production environments.
• Tech Stack: PyTorch, GPU-first architecture, DiT, Flow Matching, Rectified Flow

What You’ll Actually Do
• Model Development: Design and implement improvements for Diffusion Transformer (DiT/MM-DiT) architectures and unified text-to-image/video models, including latent space, tokenization, and conditioning mechanisms.
• Algorithmic Optimization: Perform joint algorithmic and system-level optimization to enhance training stability, convergence speed, memory/compute efficiency, and generation quality and consistency.
• Advanced Generation: Address complex challenges in long-sequence, high-resolution, and video generation by developing efficient attention, temporal modeling, and long-context/latent strategies.
• Collaborative Research: Work with systems and kernel engineers to translate model designs into efficient implementations; reproduce, analyze, and advance state-of-the-art generative models.

The Must-Haves
• Background: Currently pursuing a PhD in Computer Science, Computer Engineering, or a related technical discipline.
• Experience: Hands-on experience training large-scale models.
• Skills: Deep understanding of Diffusion, Flow Matching, and Rectified Flow; strong familiarity with DiT and Transformer-based architectures in generative modeling; proficiency with PyTorch; ability to debug the full pipeline from mathematical formulation to generated outputs.
• Bonus: Practical experience developing non-toy text-to-image or text-to-video models; familiarity with multimodal modeling (Text, Image, Video, Audio); a record of research publications or open-source contributions.