Senior Site Reliability Engineer
As a Senior Site Reliability Engineer, you will work on Red Hat OpenShift, an enterprise Kubernetes platform, as part of a global team. You will contribute to the design and development of automation software to provision, upgrade, monitor, and heal a large global fleet of OpenShift clusters deployed across multiple public clouds.
Responsibilities
* Design and write automation software to provision, upgrade, monitor, and heal a large global fleet of OpenShift clusters deployed across multiple public clouds
* Identify single points of failure and other high-risk architecture issues; propose and implement more resilient solutions
* Participate in the release cycles of our offerings: deploy code to integration, staging, and production environments, and integrate with CI/CD tooling, monitoring, and change management
* Perform software updates, peer code reviews, testing, and CVE analysis; respond to security threats
* Interact with automated monitoring and healing infrastructure to ensure healthy environments
* Provide engineering support to resolve customer issues
* Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems