Principal Site Reliability Engineer - Azure Red Hat OpenShift, Waterford

Red Hat Limited is seeking a Principal Software Engineer to join our Site Reliability Engineering (SRE) team. In this role, you will develop, scale, and operate OpenShift, Red Hat’s enterprise Kubernetes distribution. Your responsibilities will include contributing to the operation of OpenShift at scale, enabling customer self-service, enhancing monitoring systems, and automating processes.

You will have the opportunity to address complex scaling challenges specific to Red Hat Managed Cloud Services, utilizing your skills in coding, operations, and large-scale distributed system design.

What will you do

- Manage, deploy, and operate cloud solutions at scale following SRE principles
- Design and develop new features to enable OpenShift as-a-service across multiple clouds
- Create automation software for provisioning, upgrading, monitoring, and healing OpenShift clusters globally
- Identify and resolve high-risk architecture issues to improve resilience
- Collaborate with internal teams and the open-source community to contribute to projects
- Participate in product release cycles, deploying code through CI/CD pipelines, and managing changes
- Perform software updates, peer reviews, testing, and security analysis
- Ensure environment health through automated monitoring and healing infrastructure
- Support Red Hat's global technical support team in resolving customer issues
- Share knowledge, mentor peers, and develop team capabilities
- Create and maintain SOPs for maintenance, configuration, and problem remediation
- Participate in a global on-call rotation

What will you bring

- 5+ years of experience in software engineering with object-oriented languages; Golang preferred
- Extensive experience managing Linux systems in public clouds like AWS, GCP, or Azure
- Proficiency with enterprise monitoring systems; Prometheus knowledge is a plus
- Experience with configuration management tools such as Ansible, Puppet, or Chef
- At least 1 year of experience with container technologies like Docker or Kubernetes
- Understanding of Linux containers and networking protocols like TCP/IP, DNS, HTTP
- Strong communication skills in English

About Red Hat

Red Hat is a leading provider of enterprise open source solutions, fostering a collaborative and inclusive environment. We support flexible work arrangements and encourage innovation and diversity.

Diversity, Equity & Inclusion

Our culture is rooted in open source principles, promoting transparency, collaboration, and inclusion. We value diverse perspectives and strive for an equitable environment where everyone can contribute and thrive.

Equal Opportunity Policy

Red Hat is an equal opportunity employer. We do not discriminate based on race, color, religion, sex, sexual orientation, gender identity, national origin, age, veteran status, disability, or other protected categories.