About TikTok
TikTok is the leading destination for short-form mobile video. Its mission is to inspire creativity and bring joy. TikTok’s global headquarters are in Los Angeles and Singapore, with offices in New York City, London, Dublin, Paris, Berlin, Dubai, Jakarta, Seoul, and Tokyo.
Job Description
This internship is with the TikTok Flink Ecosystem Team, which plays a critical role in delivering the real-time computing capabilities that power TikTok’s massive-scale recommendation, search, and advertising systems. The team builds the infrastructure for stream processing at exabyte scale, enabling ultra-low-latency, highly reliable, and cost-efficient real-time data transformations, and develops and optimizes Apache Flink and its surrounding components to meet TikTok’s rapidly evolving data needs. The team also collaborates closely with ML infrastructure teams to bridge real-time stream processing and machine learning: integrating Velox to accelerate model training, building multimodal data pipelines, and using frameworks like Ray to orchestrate large-scale distributed ML workflows.
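For orientation, the snippet below is a minimal, illustrative PyFlink sketch of the kind of keyed real-time transformation described above. It is not TikTok code; the pipeline, names, and data are invented for illustration, and only standard PyFlink DataStream API calls are used.

from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(1)

# A small in-memory collection stands in for a Kafka-style event source.
events = env.from_collection(
    [("user_1", 3), ("user_2", 1), ("user_1", 5)],
    type_info=Types.TUPLE([Types.STRING(), Types.INT()]),
)

# Key by user id and keep a running sum per key, a toy stand-in for
# real-time feature aggregation.
(events
    .key_by(lambda e: e[0])
    .reduce(lambda a, b: (a[0], a[1] + b[1]))
    .print())

env.execute("toy_feature_aggregation")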
Responsibilities:
• Design and develop core Flink operators, connectors, or runtime modules to support TikTok’s exabyte-scale real-time processing needs.
• Build and maintain low-latency, high-throughput streaming pipelines powering online learning, recommendation, and ranking systems.
• Collaborate with ML engineers to design end-to-end real-time ML pipelines, enabling efficient feature generation, training data streaming, and online inference.
• Leverage Velox for compute-optimized ML data transformation and training acceleration on multimodal datasets (e.g., video, audio, and text).
• Use Ray to coordinate distributed machine learning workflows and integrate real-time feature pipelines with ML model training/inference; a minimal sketch follows this list.
• Optimize Flink job performance, diagnose bottlenecks, and deliver scalable solutions across exabyte-scale streaming workloads.
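For orientation, the following is a minimal Ray sketch of fanning a toy feature-extraction step out across workers and feeding the results into a downstream training task. The function names and data are hypothetical stand-ins for the real pipelines described above; only core Ray task APIs are used.

import ray

ray.init()  # starts a local Ray runtime if none is running

@ray.remote
def extract_features(batch):
    # Toy per-batch transformation; a real pipeline would consume a
    # streaming source and emit training-ready feature rows.
    return [x * 2 for x in batch]

@ray.remote
def train_step(*feature_batches):
    # Placeholder "training" step: counts how many features arrived.
    return sum(len(b) for b in feature_batches)

# Fan feature extraction out across Ray workers, then feed the results
# into a single downstream training task.
batches = [[1, 2, 3], [4, 5], [6]]
feature_refs = [extract_features.remote(b) for b in batches]
total = ray.get(train_step.remote(*feature_refs))
print(total)  # 6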
Minimum Qualifications:
• Currently pursuing a Bachelor’s degree or higher in Computer Science, Software Engineering, Data Engineering, or a related technical field.
• Strong programming skills in Java, Scala, or Python.
• Understanding of distributed systems, stream processing, and event-driven architecture.
• Familiarity with system design concepts such as fault tolerance, backpressure, and horizontal scalability.
• Demonstrated ability to debug and analyze complex distributed jobs in production environments.
Preferred Qualifications:
• Graduating in December 2025 or later, with the intent to return to your academic program.
• Experience with Apache Flink, Spark Streaming, or Kafka Streams.
• Hands-on experience with Ray for distributed ML or workflow orchestration.
• Familiarity with Velox, Arrow, or similar columnar execution engines for training/feature pipelines.
• Understanding of multimodal data processing (e.g., combining video, audio, and text in model training pipelines).
• Experience working with data lake ecosystems (e.g., Iceberg, Hudi, Delta Lake) and cloud-native storage at PB–EB scale.
• Contributions to open-source projects or participation in ML/engineering hackathons or competitions.