Job Description
Research Scientist Intern (TikTok-Privacy Innovation Lab-GPU Systems & Model Optimization) | TikTok
The Tone
This is a PhD internship at TikTok, a leading destination for short-form mobile video. TikTok’s mission is to inspire creativity and bring joy through its global products. The role contributes directly to exploring the next frontier of privacy technology and theory, so that user privacy remains a top priority in the design and implementation of next-generation generative foundation models. Interns contribute to products, research, and emerging technologies that shape the future of a privacy-friendly digital experience.
The TL;DR
• Role: Internship (PhD)
• Location: Flexible (Remote/In-person options available)
• Pay: $60 per hour
• Team: Privacy Innovation (PI) Lab
• Mission: Design and optimize GPU systems and models for privacy-preserving, large-scale generative foundation models.
• Tech Stack: Triton, CUDA, CUTLASS, PyTorch, XLA, Nsight, nvprof, nsys
What You’ll Actually Do
• Design and implement high-performance GPU kernels for core components such as Transformer, Attention, MoE, and Diffusion models.
• Perform end-to-end optimization for large model training workloads, focusing on efficiency and performance.
• Conduct in-depth analysis of GPU execution bottlenecks, including compute, memory, and scheduling issues.
• Use and extend Triton, CUDA, and CUTLASS, integrating optimized kernels with PyTorch, XLA, or custom runtimes.
• Collaborate closely with model research teams to translate new model architectures into efficient, production-ready implementations.
The Must-Haves
• Background: Currently pursuing a PhD in Computer Science, Computer Engineering, or a related technical discipline.
• Experience: Solid understanding of GPU architecture and execution models; strong familiarity with Transformer / Attention computation patterns and performance bottlenecks.
• Skills: Proficiency in CUDA C++ or Triton, with the ability to independently write and optimize kernels; ability to read, reproduce, and reason about systems papers or open-source implementations.
• Bonus: Hands-on experience with large-scale model training; familiarity with PyTorch internals (e.g., Autograd, dispatcher, ATen); experience with kernel profiling and performance tuning (e.g., Nsight, nvprof, nsys); publications, open-source contributions, or performance benchmark results.