Job Description
Research Scientist Intern – Multimodal Sensing & On-Device Perception | ByteDance
The Tone
This is a PhD research internship at ByteDance, a company focused on inspiring creativity and enriching life through innovative products. The role sits within PICO Lab, on the eye-tracking system architecture team. The team's work aims to let next-generation intelligent hardware, particularly compact wearable devices, connect AI agents with the physical world and with end users. The intern will contribute to research that pushes past the limits of conventional visual perception by integrating sensing and computation, advancing how users interact with AI agents.
The TL;DR
• Role: Internship
• Type: Temporary
• Location: In-person; the specific site is not stated (the posting's legal disclosures reference Los Angeles County)
• Pay: $55 per hour
• Team: PICO Lab, eye-tracking system architecture team
• Mission: To develop next-generation low-power sensing and interaction technologies for compact wearable devices.
• Tech Stack: Optics, image sensors, machine learning models, computer vision, language modeling, imaging systems, hardware/software co-design, structured light components, signal processing
What You’ll Actually Do
• Research: Prototype and evaluate novel image-sensor-based sensing approaches for wearable form factors, with an emphasis on low-power, always-on operation.
• Model Design: Design and train machine learning models that draw on computer vision and language modeling techniques to interpret complex spatio-temporal data.
• Performance Assessment: Design and run simulations to evaluate system performance across representative operating conditions.
• Co-design Exploration: Explore hardware/software co-design opportunities, including on-sensor or near-sensor compute, to meet target power budgets.
• Prototype Development: Build end-to-end hardware prototypes by integrating image sensors, optics, and structured light components.
The Must-Haves
• Background: Currently pursuing a PhD in Computer Science, Electrical Engineering, Optical Engineering, Applied Mathematics, Physics, or a related technical field.
• Experience: Strong research background in computer vision and machine learning, with practical experience in model training. Experience with at least one of: sequence modeling, language modeling, efficient neural network design, or signal processing.
• Skills: Computer vision, machine learning, model training, efficient neural network design, sequence modeling, language modeling, signal processing.
The Nice-to-Haves
• Publications: A track record of high-impact research, such as publications at top-tier venues like CVPR, ICCV, ECCV, NeurIPS, ICLR, or SIGGRAPH.
• Prototyping: Hands-on experience with hardware prototyping involving cameras, structured light, or other active sensing systems.
• Deployment: Familiarity with on-device or on-sensor compute, embedded ML deployment, and hardware/software co-design tradeoffs.