Senior Site Reliability Engineer - Azure Red Hat OpenShift (Ireland, Italy, Portugal, Czech Republic)
Red Hat Waterford, County Waterford, Ireland
Overview
The Red Hat OpenShift Dedicated Site Reliability Engineering (SRE) team is looking for a Senior Site Reliability Engineer to join our global team. In this role, you will work on Red Hat OpenShift, an enterprise Kubernetes platform, as part of a team that develops and operates Red Hat OpenShift Dedicated, a public cloud service based on Red Hat OpenShift for large enterprise customers. You will contribute to the design and development of automation software to provision, upgrade, monitor, and heal a large global fleet of OpenShift clusters deployed across multiple public clouds. You will participate in a global on-call rotation and help lead incident management, root cause analysis, and continuous improvement activities, managing engineering efforts against an SLA and error budget.
Responsibilities
* Design and write automation software to provision, upgrade, monitor, and heal a large global fleet of OpenShift clusters deployed across multiple public clouds
* Identify single points of failure and other high-risk architecture issues; propose and implement more resilient solutions
* Participate in release cycles of our offerings, deploying code to integration, staging, and production environments, integrating with CI/CD tooling, monitoring, and change management
* Perform software updates, peer code reviews, testing, and CVE analysis; respond to security threats
* Interact with automated monitoring and healing infrastructure to ensure healthy environments
* Provide engineering support to Red Hat's global technical support team to resolve customer issues
* Create and maintain standard operating procedures (SOPs) for maintenance tasks, applying configuration changes, and remediating problems
* Participate in a global on-call rotation, including periodic weekend and holiday on-call duties
What you will bring
* 3+ years of software engineering experience using object-oriented languages; Golang and Python preferred
* Experience managing Linux-based systems in a public cloud (AWS, GCP, or Microsoft Azure)
* Commercial experience with enterprise system monitoring; knowledge of Prometheus is a plus
* Experience with container technology, Kubernetes, OpenShift, and configuration management tools (Red Hat Ansible Automation, Puppet, or Chef) is a big plus
* Demonstrated ability to troubleshoot systems issues quickly and accurately
* Solid written and verbal communication skills in English
About Red Hat
Red Hat is the world’s leading provider of enterprise open source software solutions, delivering Linux, cloud, container, and Kubernetes technologies with a community-powered approach. We support flexible work environments and encourage employees to contribute ideas regardless of title or tenure. We are committed to open collaboration and inclusion.
Inclusion at Red Hat
Our culture is based on transparency, collaboration, and inclusion, empowering people from diverse backgrounds to share ideas and drive innovation. We strive for equal opportunity and welcome applicants from all backgrounds.
Equal Opportunity Policy (EEO)
Red Hat is an equal opportunity workplace and an affirmative action employer. We review applications without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, disability, medical condition, marital status, or other legally protected characteristics.
Red Hat does not seek or accept unsolicited resumes or CVs from recruitment agencies. We are not responsible for, and will not pay, any fees related to unsolicited resumes or CVs except as required in a contract.
Red Hat supports individuals with disabilities and provides reasonable accommodations to job applicants. If you need assistance completing our online job application, email application-assistance@redhat.com.