PhD Student Intern – Multimedia Technologies Research

May 31, 2026
$57 / hour


Job Description

Research Intern (Video Data Compression and Application) – 2026 Start (PhD) | ByteDance

The Tone
This is a PhD internship at ByteDance. Founded in 2012, ByteDance's mission is to inspire creativity and enrich life through products such as TikTok, Lemon8, CapCut, and Pico. The Multimedia Lab explores cutting-edge technologies, participates in international standardization, and provides software and hardware solutions for multimedia content generation, analysis, and innovative interaction. In this role, you will help advance multimedia technology by designing and optimizing algorithms for video data compression and large multimodal model applications. Interns gain hands-on experience and contribute directly to products, research, and emerging technologies.

The TL;DR
• Role: Internship
• Type: Temporary
• Location: Not specified; in-person work may be possible
• Pay: $57 hourly
• Team: Multimedia Lab
• Mission: Design, develop, and optimize innovative algorithms for data compression, processing, and large multimodal model applications.
• Tech Stack: Python, PyTorch, C/C++, TensorFlow, YOLO, CUDA

What You’ll Actually Do
• Design: Design, develop, and optimize innovative algorithms for data compression, processing, and large multimodal model applications, including 2D video, multiview video, point clouds, Gaussian splatting–based or NN-based coding, token compression, and KV cache optimization.
• Research: Stay current with state-of-the-art techniques through standardization activities or leading conference and journal publications.
• Prototype: Build prototypes and demonstrations of new technologies and algorithms.
• Contribute: Contribute to technical reports, publications, and patent filings based on research and development work.

The Must-Haves
• Background: Current Ph.D. student in computer science, electrical engineering, mathematics, statistics, data science, or related disciplines.
• Experience: Familiarity with token, image, or video coding and processing, or with large multimodal models, plus strong capabilities in applying large multimodal models.
• Skills: Strong computer science fundamentals, including algorithms, data structures, and software design; strong problem-solving skills; familiarity with Python, PyTorch, and C/C++; a collaborative mindset; and solid written and verbal communication skills.
• Bonus: Good understanding of state-of-the-art compression algorithms; rich experience with and interest in video/image coding standards (e.g., H.263/264/265/266, MPEG-2/4, JPEG, JPEG 2000, AV1, AV2, AVS1/2/3); proficiency in deep learning frameworks such as PyTorch and TensorFlow and with models such as YOLO; hands-on experience with large language models (LLMs) and vision-language models (VLMs), including related techniques and tasks such as LoRA, diffusion models, and VQA; experience with model training, fine-tuning, and evaluation pipelines; familiarity with GPU acceleration and CUDA environment configuration.