PhD Intern, Integrated Sensing & Computing for Visual Perception – Sensor-Computation Co-Design

June 26, 2026
$55 / hour


Job Description

Research Scientist Intern – Multimodal Sensing & On-Device Perception – Global Frontier Tech Recruitment Program – 2027 Start (PhD) | ByteDance

The Tone
This is an internship at ByteDance, a company committed to inspiring creativity and enriching life through innovative products. The role sits within the eye tracking system architecture team, which develops high-performance, low-power eye tracking systems that work across a broad user population by integrating cutting-edge technologies in optics, image sensors, and computer vision. Interns will contribute to research that aims to let AI agents maintain round-the-clock awareness of their environment and capture user intent efficiently and in real time. By reimagining visual perception systems, this work seeks to overcome the limitations of conventional sensing and unlock the next generation of intelligent hardware terminals that better connect AI, users, and daily life.

The TL;DR
• Role: Internship
• Location: In-person, San Jose, CA
• Pay: $55 hourly
• Team: Eye tracking system architecture team
• Mission: Rethink visual perception from the sensor up, fusing sensing and computation to achieve order-of-magnitude gains in perception efficiency.

What You’ll Actually Do
• Design: Prototype novel sensor or imaging architectures that move computation closer to the sensing front-end, exploring techniques such as near-sensor processing, event-driven capture, and learned pixel-level compression.
• Build: Build and thoroughly characterize imaging pipelines end to end, from optical and sensor physics through the image signal processor (ISP) to downstream perception models, identifying where data is wasted and where intelligence is best injected.
• Inform: Apply a deep understanding of vision-language models (VLMs), large language models (LLMs), and world models to decide what information the sensing front-end must preserve, discard, or transform, closing the loop between foundation-model requirements and hardware design.
• Develop: Develop or adapt machine vision models co-optimized for key hardware constraints such as power consumption, bandwidth, and latency.

The Must-Haves
• Background: Currently pursuing a PhD in Computer Science, Electrical Engineering, Optical Engineering, Applied Mathematics, Physics, or a related technical field.
• Experience: Strong research background in computer vision and machine learning, coupled with hands-on model training experience. Candidates should have experience with at least one of sequence modeling, language modeling, efficient neural network design, or signal processing.
• Skills: Computer vision, machine learning, model training, sequence modeling, language modeling, efficient neural network design, signal processing.
• Bonus: Proven track record of high-impact research demonstrated by publications in top-tier conferences (CVPR, ICCV, ECCV, NeurIPS, ICLR, SIGGRAPH), hands-on experience with hardware prototyping involving cameras or active sensing systems, or familiarity with on-device/on-sensor compute, embedded machine learning deployment, and hardware/software co-design tradeoffs.