Job Description
IT Administrator Intern with Site Reliability Engineering (SRE) Focus
Introduction:
At IBM, work is more than a job – it’s a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you’ve never thought possible. Are you ready to lead in this new era of technology and solve some of the world’s most challenging problems? If so, let’s talk.
Your Role And Responsibilities:
IBM is seeking a highly motivated and detail-oriented IT Administrator Intern with a keen interest in Site Reliability Engineering (SRE) to join our dynamic team. This internship offers hands-on experience in enterprise IT systems management and an introduction to modern reliability engineering practices.
You will work alongside experienced professionals, contributing significantly to infrastructure operations and supporting critical automation and performance initiatives. This role is designed to provide practical exposure to maintaining robust, scalable, and highly available IT environments.
Key Responsibilities:
- System Maintenance & Support: Assist in maintaining and supporting a diverse range of enterprise IT systems, including physical and virtual servers, various operating systems (Linux, Windows, z/OS, z/VM), and evolving cloud environments (IBM Cloud, AWS, Azure). You will learn operational best practices to ensure system health and availability.
- Performance Monitoring & Reliability: Engage in active monitoring of system performance metrics, assisting in the identification of potential reliability issues, bottlenecks, and areas for improvement. This includes learning to interpret data from monitoring tools to proactively address concerns.
- Automation & Scripting: Support and contribute to automation efforts for routine administrative tasks. This involves utilizing scripting languages such as Python and Bash to enhance efficiency, reduce manual overhead, and improve the scalability of our operations.
- Incident Response & Analysis: Participate in incident response activities, gaining insight into how critical issues are identified, triaged, and resolved. You will also contribute to post-incident analysis, helping to uncover root causes and implement preventative measures to improve system resilience.
- Documentation & Knowledge Management: Assist in documenting critical system configurations, developing clear operational procedures, and tracking key reliability metrics. This ensures knowledge transfer, promotes consistency, and supports data-driven decision-making for continuous improvement.
- Cross-functional Collaboration: Collaborate effectively with cross-functional teams, including development, network, and security teams, on infrastructure projects and service delivery initiatives. This builds teamwork skills and a holistic understanding of IT service lifecycles.
What You’ll Gain:
This internship is an excellent opportunity to bridge academic knowledge with real-world enterprise IT challenges. You will develop a strong foundation in IT administration, understand the core principles of SRE, enhance your problem-solving skills, and gain invaluable experience working in a leading technology company.
Preferred Education:
- Bachelor’s Degree (current enrollment in a degree program is required; a completed Bachelor’s is preferred for interns moving into full-time roles).
Required Technical And Professional Expertise:
- Currently pursuing a degree in Computer Science, Information Technology, or a closely related technical field.
- A foundational understanding of various operating systems (e.g., Linux, Unix, Windows, z/OS, z/VM) and core networking fundamentals.
- Familiarity with, and practical experience using, common scripting languages such as Python and Bash.
- Demonstrated strong analytical and problem-solving skills, with an ability to approach complex issues logically.
- Excellent communication and collaboration abilities, and comfort working within a team-oriented environment.
Preferred Technical And Professional Experience:
- Exposure to and basic understanding of major cloud platforms (e.g., IBM Cloud, AWS, Azure).
- A genuine interest in and understanding of SRE principles, including automation, observability, error budgets, and fault tolerance.
- Familiarity with monitoring tools (e.g., Grafana, Prometheus) for visualizing system performance and health.
- Experience with version control systems (e.g., Git) and an understanding of CI/CD concepts (Continuous Integration/Continuous Delivery).