PhD Intern – Sensor-Compute Co-design

June 21, 2026
$55 / hour

Job Description

Research Scientist Intern – Multimodal Sensing & On-Device Perception – Global Frontier Tech Recruitment Program – 2027 Start (PHD) | ByteDance

The Tone
This is a PhD internship at ByteDance, located in the US. ByteDance builds innovative products that help people express themselves, discover, and connect. This role is crucial for advancing multimodal sensing and on-device perception, which are vital for enabling AI Agents to effectively interact with the physical world and understand user intent. The internship provides an opportunity to contribute to products, research, and emerging technologies by rethinking visual perception systems from the ground up.

The TL;DR
• Role: PhD Internship
• Location: US-based
• Pay: $55 hourly
• Team: The eye tracking system architecture team works across the full vertical stack of eye tracking system architecture and its key components.
• Mission: Enable AI Agents to maintain round-the-clock environmental awareness and efficiently capture real-time user intentions, improving users' everyday service experience.

What You’ll Actually Do
• Architecture Design: Design and prototype novel sensor or imaging architectures that move computation closer to the sensing front-end.
• Pipeline Characterization: Build and characterize imaging pipelines end-to-end, from optical/sensor physics through ISP to downstream perception models.
• Model Alignment: Use your understanding of VLMs/LLMs and world models to inform what information the sensing front-end must preserve, discard, or transform.
• Hardware-Software Co-optimization: Develop or adapt machine vision models that are co-optimized with hardware constraints such as power, bandwidth, and latency.

The Must-Haves
• Background: Currently pursuing a PhD in Computer Science, Electrical Engineering, Optical Engineering, Applied Mathematics, Physics, or a related technical field.
• Experience: Strong research background in computer vision and machine learning, with hands-on model training experience. Experience with at least one of: sequence modeling, language modeling, efficient neural network design, or signal processing.
• Skills: Computer vision, machine learning, model training, sequence modeling, language modeling, efficient neural network design, signal processing.
• Bonus: Proven track record of high-impact research publications in top-tier conferences (e.g., CVPR, ICCV, ECCV, NeurIPS, ICLR, SIGGRAPH). Hands-on experience with hardware prototyping involving cameras, structured light, or other active sensing systems. Familiarity with on-device or on-sensor compute, embedded machine learning deployment, and hardware/software co-design tradeoffs.