Machine Learning GPU Performance Engineer

July 23, 2024

Job Description

About Google

Google’s software engineers are at the forefront of developing next-generation technologies that impact billions of users globally. We go beyond web search, handling information at massive scale across various areas like information retrieval, distributed computing, large-scale system design, networking, data storage, security, artificial intelligence, natural language processing, UI design, and mobile.

Our engineers are versatile, display leadership qualities, and are enthusiastic about tackling new problems across the full stack as we push the boundaries of technology.

The Core Team

The Core team builds the technical foundation behind Google’s flagship products. We own and advocate for the underlying design elements, developer platforms, product components, and infrastructure at Google. These are essential building blocks for providing excellent, safe, and coherent experiences for our users while driving the pace of innovation for every developer. We look across Google’s products to build central solutions, break down technical barriers, and strengthen existing systems. As the Core team, we have a unique opportunity to impact important technical decisions across the company.

Machine Learning GPU Performance Engineer Responsibilities

This role focuses on optimizing the performance of large language models (LLMs) and associated products on Google’s GPU infrastructure. Key responsibilities include:

Benchmarking: Identify and maintain representative benchmarks for LLM training and serving across Google production, industry, and the ML community. Leverage these benchmarks to identify performance opportunities and drive XLA:GPU/Triton performance toward state-of-the-art levels, guiding XLA releases.
Performance Optimization: Collaborate with Google product teams such as DeepMind to solve their ML model performance challenges. Onboard new LLMs and products onto GPU hardware, enabling LLMs to train and serve efficiently at massive scale (thousands of GPUs).
Architecture Simulation: Run architecture-level simulations on GPU designs and perform roofline analysis to guide internal teams.
Performance Evaluation: Run performance benchmarks on GPU hardware using internal and external tools.
Performance Analysis: Analyze performance and efficiency metrics to identify bottlenecks. Design and implement solutions at Google fleetwide scale.
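As background on the roofline analysis mentioned above, the sketch below shows the core calculation: a kernel's attainable throughput is capped by the lesser of peak compute and memory bandwidth times arithmetic intensity. The hardware numbers are hypothetical placeholders, not specs of any real GPU.

```python
# Minimal roofline-model sketch. Attainable FLOP/s for a kernel is bounded by
# min(peak compute, peak memory bandwidth x arithmetic intensity).
# The peak numbers below are illustrative placeholders, not real GPU specs.

PEAK_FLOPS = 100e12  # peak compute throughput, FLOP/s (hypothetical)
PEAK_BW = 2e12       # peak memory bandwidth, bytes/s (hypothetical)

def attainable_flops(arithmetic_intensity: float) -> float:
    """Attainable throughput (FLOP/s) at a given intensity (FLOP/byte)."""
    return min(PEAK_FLOPS, PEAK_BW * arithmetic_intensity)

# Ridge point: the intensity at which a kernel shifts from
# memory-bound to compute-bound on this (hypothetical) machine.
ridge = PEAK_FLOPS / PEAK_BW  # 50 FLOP/byte here

# A kernel at 10 FLOP/byte sits left of the ridge, so it is
# memory-bound: attainable_flops(10) = 2e12 * 10 = 20e12 FLOP/s.
```

Comparing a kernel's measured FLOP/s against this bound shows whether the next optimization should target memory traffic (left of the ridge) or compute utilization (right of it).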