Job Overview:
* Maintaining and optimizing cloud infrastructure for reliability and scalability is central to this role, along with building and maintaining CI/CD pipelines and deployment automation.
* Monitoring, alerting, and observability for production systems are essential to ensure system performance, availability, and maintainability.
* Containerized workloads and their orchestration must be managed effectively to keep operations efficient.
* Troubleshooting production issues and leading incident response efforts require strong collaboration with development teams.
Key Responsibilities:
* Maintain and optimize cloud infrastructure to ensure high availability and scalability.
* Develop and maintain CI/CD pipelines and deployment automation scripts.
* Design and implement monitoring, alerting, and observability systems for production environments.
* Manage containerized workloads and orchestrate deployments using tools such as Kubernetes.
* Collaborate with cross-functional teams to identify and address technical debt and improve overall system performance.
* Participate in on-call rotations to respond to production incidents and lead root cause analysis efforts.
Requirements:
* Bachelor's degree in Computer Science or a related field.
* Minimum of 3 years of experience in site reliability engineering (SRE) or a related field.
* Strong understanding of cloud infrastructure, CI/CD pipelines, and deployment automation.
* Experience with containerization, orchestration, and monitoring tools.
* Excellent communication and collaboration skills.
* Ability to work in a fast-paced environment and adapt to changing priorities.
Benefits:
* Opportunity to work on complex technical challenges and contribute to the growth of the organization.
* Collaborative and dynamic work environment.
* Competitive salary and benefits package.
* Ongoing training and professional development opportunities.