Job Description
SRE Metrics Analyst Intern
About the Role
This intern role is pivotal in ensuring that the Site Reliability Engineering (SRE) team has the data and insights it needs to maintain and enhance system reliability. You will be instrumental in designing, implementing, and managing metrics collection, focusing on system performance, reliability, and incident analysis. You will develop and maintain reporting frameworks that provide actionable insights to stakeholders and drive improvements in our systems and processes. Your contributions will directly support data-driven decision-making and continuous improvement initiatives, in line with the organization’s commitment to delivering high-quality, reliable services.
Work Arrangement & Location
This is a hybrid role, requiring 50% telework. Candidates must be local to one of the following cities:
- Norfolk, VA
- Jacksonville, FL
- Bremerton, WA
- San Diego, CA
Key Responsibilities
Metrics Collection Framework:
- Design and implement a comprehensive metrics collection framework that captures key performance indicators (KPIs) related to system reliability and operational efficiency.
- Identify relevant metrics and establish methods for collecting, aggregating, and storing data from various sources, including monitoring tools, logs, and databases.
Data Analysis and Visualization:
- Analyze collected metrics to identify trends, patterns, and anomalies that impact system reliability and performance.
- Develop dashboards and visualizations to present data in a clear and actionable manner using tools such as Grafana, Kibana, or Tableau.
- Ensure that stakeholders have access to real-time insights and reports that inform decision-making.
Reporting:
- Create regular reports on system performance, reliability, incident response times, and other critical metrics for various stakeholders, including technical teams and management.
- Provide insights and recommendations based on data analysis to drive continuous improvement initiatives.
- Prepare and present findings to stakeholders, facilitating discussions on reliability goals and performance enhancements.
Collaboration with SRE Teams:
- Work closely with SRE teams to identify their metric needs and ensure alignment with operational goals.
- Collaborate with engineering and operations teams to ensure that metric collection is integrated into development and deployment processes.
- Support incident response efforts by providing metrics that help identify root causes and areas for improvement.
Continuous Improvement:
- Stay current with industry trends and best practices related to metrics collection, monitoring, and reporting within SRE and DevOps.
- Continuously evaluate and enhance the metrics collection and reporting processes to improve data accuracy, relevance, and accessibility.
- Foster a culture of data-driven decision-making within the SRE team and broader organization.
Key Qualifications
Education & Clearance:
- Currently enrolled in a degree program (e.g., Computer Science, Engineering, Data Science, or related major).
- GPA of 3.0 or better.
- U.S. Citizenship required.
- Ability to obtain and maintain a DoD security clearance.
Experience:
- Experience in metrics collection, data analysis, or reporting, preferably within a Site Reliability Engineering (SRE) or DevOps environment.
- Proven experience working with monitoring and observability tools (e.g., Prometheus, Datadog, New Relic).
Technical Skills:
- Strong understanding of key metrics used in site reliability engineering, including Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs).
- Proficiency in data analysis tools and languages (e.g., SQL, Python, R) for data manipulation and reporting.
- Experience with data visualization tools (e.g., Grafana, Kibana, Tableau) to create dashboards and reports.
Analytical Skills:
- Strong analytical and problem-solving skills, with the ability to interpret complex data sets and provide actionable insights.
- Ability to evaluate the relevance and accuracy of metrics and make recommendations for improvement.
Communication and Collaboration:
- Excellent communication skills, both written and verbal, with the ability to present data and findings to technical and non-technical audiences.
- Proven ability to work collaboratively with cross-functional teams and build strong relationships with stakeholders.
Preferred Qualifications
- Experience with cloud platforms (AWS, Google Cloud Platform, Azure) and their monitoring tools.
- Familiarity with incident management processes and practices within an SRE context.
- Knowledge of software development methodologies and best practices.
Key Metrics of Success (How You’ll Be Measured)
- Timely and accurate collection of key performance metrics with minimal data discrepancies.
- Effective visualization and reporting of metrics that inform decision-making and drive improvements in reliability.
- Positive feedback from stakeholders regarding the clarity and usefulness of reports and insights.
- Continuous improvement in the SRE metrics collection and reporting processes, leading to better operational performance.
Why Join Us?
Be part of a dynamic and innovative team focused on enhancing the reliability and performance of critical systems. Play a key role in shaping the metrics strategy that drives operational excellence and continuous improvement. Work in an environment that values collaboration, professional development, and a commitment to quality. Contribute to the success of the organization by providing actionable insights that improve system reliability and performance.
At Leidos, we outthink, outbuild, and outpace the status quo – because the mission demands it. We’re not hiring followers. We’re recruiting the ones who disrupt, provoke, and refuse to fail. Step 10 is ancient history. We’re already at step 30 – and moving faster than anyone else dares.
Administrative Details
Original Posting Date: December 26, 2025
Anticipated Close Date: No earlier than December 29, 2025 (3 days after the original posting date).
Pay Range (General Guideline): $48,100.00 – $86,950.00
Note: The Leidos pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity, alignment with market data, applicable bargaining agreement (if any), or other law.