Inference Architecture Interns

Engineering

San Jose, CA

June 8, 2026

Internship

Job Description

Inference Intern | Etched

The Tone
This is an internship at Etched in San Jose, CA. Etched is building the world’s first AI inference system purpose-built for transformers, aiming to deliver significantly higher performance at lower cost than existing solutions. The role centers on developing and optimizing compute architectures for performance and efficiency on transformer workloads. Interns will contribute to the design of next-generation AI accelerators, working on architectural problems and performance modeling.

The TL;DR
• Role: Internship
• Type: Temporary
• Location: In-person, San Jose, CA
• Mission: Develop and optimize compute architectures that deliver exceptional performance and efficiency for transformer workloads.
• Tech Stack: Python, C++, Rust, PyTorch, JAX, vLLM, SGLang, Linux internals, compilers, accelerator architectures (GPUs, TPUs), high-speed interconnects (NVLink, InfiniBand)

What You’ll Actually Do
• Model Porting: Support porting state-of-the-art models to the architecture and help build programming abstractions and high-performance software components for rapid iteration.
• Runtime Development: Assist in building, enhancing, and scaling the runtime for Sohu, Etched’s transformer accelerator, including multi-node inference, intra-node execution, state management, and robust error handling.
• Communication Optimization: Contribute to optimizing routing and communication layers using Sohu’s collectives.
• Performance Analysis: Use performance profiling and debugging tools to identify bottlenecks and correctness issues.
• Architecture Co-design: Develop a deep understanding of Sohu to co-design both hardware instructions and model architecture operations to maximize model performance.

The Must-Haves
• Background: Student progressing towards a Bachelor’s, Master’s, or PhD degree in computer science, computer engineering, applied mathematics, or a related field.
• Experience: Understanding of performance-sensitive or complex distributed software systems, such as Linux internals, accelerator architectures (e.g., GPUs, TPUs), compilers, or high-speed interconnects (e.g., NVLink, InfiniBand), along with experience porting applications to non-standard accelerator hardware or platforms. Deep knowledge of transformer model architectures and/or inference serving stacks such as vLLM or SGLang is also required.
• Skills: Proficiency in Python and C++.
• Bonus: Proficiency in Rust; experience with low-latency, high-performance applications using kernel-level and user-space networking stacks; a deep understanding of distributed systems concepts; a solid grasp of transformer architectures (especially Mixture-of-Experts); experience building applications with extensive SIMD optimizations; familiarity with PyTorch or JAX; or participation in math competitions.

Date Posted: June 8, 2026
Location: San Jose, CA
Expiration Date: --
Gender: Neutral
Qualification: Bachelor Degree
Career Level: Student
