Compute Platform SRE Intern

June 21, 2026
$45 / hour

Job Description

Site Reliability Engineer Intern (Compute Platform) – 2026 Summer (BS/MS) | TikTok

The Tone
This is an internship at TikTok, available in various US cities, including Los Angeles, CA. TikTok is the leading global destination for short-form mobile video, dedicated to inspiring creativity and bringing joy to its users. This role is crucial for ensuring the stability and performance of TikTok’s entire Big Data ecosystem, directly impacting the reliability of major data warehouse products and query engines across the company. As part of a newly established team, you will contribute significantly to shaping its future and upholding critical service level agreements.

The TL;DR
• Role: Internship
• Type: Full-time, Temporary
• Location: In-person, various US cities, including Los Angeles, CA
• Pay: $45 hourly
• Team: Compute Platform SRE team, supporting Big Data services and products
• Mission: Ensure the reliability of TikTok’s major data warehouse products, services, and query engines.
• Tech Stack: ClickHouse, Spark, Presto, Doris, Hadoop, Kubernetes, Linux, Python, Shell, Java, Go

What You’ll Actually Do
• Reliability Ownership: Be responsible for the reliability of TikTok’s key data warehouse products, services, and query engines, which include technologies like ClickHouse, Spark, Presto, and Doris.
• SLA Enforcement: Uphold Service Level Agreements for TikTok’s Data Platform services, promptly addressing and resolving any system outages or issues to meet service level objectives.
• Performance Optimization: Analyze service performance patterns to identify potential bottlenecks and implement proactive measures to prevent disruptions, collaborating with development teams to ensure efficient resource utilization.
• Incident Management: Lead troubleshooting efforts and coordinate with cross-functional teams to resolve service incidents and conduct postmortems, mitigating service-impacting events effectively.
• Infrastructure Automation: Automate processes for infrastructure provisioning, scaling, and management to reduce manual interventions and continuously improve overall service quality.
• Collaboration and Planning: Engage with product and development teams to integrate reliability and performance considerations throughout the software lifecycle, while also assessing and forecasting infrastructure needs based on growth.

The Must-Haves
• Background: Currently pursuing an Undergraduate or Master’s degree in Software Development, Computer Science, Computer Engineering, or a related technical discipline.
• Experience: Able to commit to 12 weeks of work during Summer 2026, and experienced with or familiar with open-source or commercial technologies such as ClickHouse, Hadoop, Doris, Spark, Presto, and Kubernetes.
• Skills: Possess an in-depth understanding of Linux, computer networking, and databases, along with proficiency in common SRE/DevOps open-source toolsets, system monitoring tools, and container orchestration platforms like Kubernetes. Strong coding skills in at least one scripting or programming language such as Python, Shell, Java, or Go are also required.
• Bonus: Exhibit excellent problem-solving skills, the ability to think critically under pressure, a strong customer-first mindset, and a sense of ownership coupled with collaborative abilities.