Compute Platform SRE Intern – Big Data SRE

June 21, 2026
$43 / hour

Job Description

Site Reliability Engineer Intern (Compute Platform) – 2026 Summer (BS/MS) | TikTok

The Tone:
This is a temporary, US-based internship at TikTok, the leading destination for short-form mobile video, with offices in Los Angeles, Singapore, and other major cities. The company’s mission is to inspire creativity and bring joy through its innovative products. The role supports all Big Data services and products across the company, ensuring the reliability of the major data warehouse products, services, and query engines that serve business needs across TikTok.

The TL;DR
• Role: Internship
• Type: Temporary
• Location: US-based
• Pay: $42.75 hourly
• Team: Compute Platform SRE team, a newly established team supporting Big Data services and products.
• Mission: Ensure the reliability of TikTok’s major data warehouse products, services, and query engines, and meet all service level objectives.
• Tech Stack: ClickHouse, Spark, Presto, Doris, Hadoop, Kubernetes, Python, Shell, Java, Go, Linux, computer networking, databases, SRE/DevOps open-source toolsets, system monitoring tools.

What You’ll Actually Do
• Ensure the reliability of TikTok’s major data warehouse products, services, and query engines, including technologies like ClickHouse, Spark, Presto, and Doris.
• Uphold Service Level Agreements (SLAs) by meeting all service level objectives for ByteDance’s Data Platform services and responding promptly to system outages and issues.
• Optimize performance by continuously analyzing service and reliability patterns to identify bottlenecks and implement proactive measures, collaborating with development teams to ensure efficient resource utilization.
• Manage incidents by leading troubleshooting, resolution, and postmortems for service incidents, and by coordinating with cross-functional teams to mitigate service-impacting events.
• Automate infrastructure provisioning, scaling, and management processes to minimize manual intervention and enhance overall service quality.

The Must-Haves
• Background: Currently pursuing an undergraduate or master’s degree in Software Development, Computer Science, Computer Engineering, or a related technical discipline, and able to commit to 12 weeks of work during Summer 2026.
• Experience: Familiarity with open-source or commercial technologies such as ClickHouse, Hadoop, Doris, Spark, Presto, and Kubernetes.
• Skills: In-depth understanding of Linux, computer networking, and databases; proficiency in common SRE/DevOps open-source toolsets, system monitoring tools, and container orchestration platforms like Kubernetes; strong coding skills in at least one scripting or programming language, including Python, Shell, Java, or Go.
• Bonus: Excellent problem-solving skills and the ability to think critically under pressure, combined with a strong customer-first mindset, a sense of ownership, and a collaborative spirit.