Principal site reliability engineer

Dublin

Hero Recruitment

Site reliability engineer

Posted: 13 April

Offer description

Principal TechOps Engineer – SRE
We are seeking a Principal TechOps Engineer (SRE) to play a key role in designing, building, and operating highly available cloud infrastructure.
This position involves close collaboration with engineering teams to drive initiatives from concept through to production.
You will work within a modern, multi-region Kubernetes environment (AWS EKS) supporting mission-critical workloads, helping to shape infrastructure strategy and improve reliability, scalability, and automation across the platform.
This is a high-impact opportunity to influence cloud architecture, deployment practices, and operational excellence in a fast-paced, collaborative environment.
Key Responsibilities
Partner with engineering teams to deliver infrastructure and platform initiatives end-to-end
Design and operate highly available, secure, and scalable cloud-native systems
Manage and optimize Kubernetes environments (AWS EKS) across multiple regions and availability zones
Lead efforts in infrastructure automation and infrastructure-as-code (IaC)
Build and maintain CI/CD pipelines and deployment frameworks
Define and implement monitoring, logging, and alerting strategies
Drive adoption of DevOps best practices and an automation-first mindset
Provide technical leadership and mentorship to SRE / Cloud Engineering teams
Collaborate cross-functionally with product, engineering, and risk stakeholders
Champion reliability, performance, and operational excellence across all systems
Required Skills & Experience
5+ years of hands-on experience with AWS in production environments
Strong experience with Docker and containerized workloads
Proven experience running and managing Kubernetes workloads (preferably AWS EKS)
Experience deploying and managing Kubernetes clusters
Hands-on experience with CI/CD tools (Jenkins preferred)
Experience creating and managing Helm charts and libraries
Strong knowledge of monitoring and observability tools (e.g., CloudWatch, Datadog, Splunk)
Solid experience with UNIX/Linux systems and shell scripting
Experience working in large-scale AWS environments (multi-account, IAM, SSO)
Strong communication skills with the ability to engage across all levels
Ability to work independently and take ownership of initiatives
Preferred Experience
Infrastructure-as-code experience (Terraform preferred)
Programming experience (Python preferred)
Experience with Git or other distributed version control systems
Experience with Kafka / Confluent Kafka
Familiarity with agile methodologies (Kanban preferred)
Experience with CDN providers (e.g., Akamai)
Desirable Traits
Strong automation mindset – sees problems as opportunities to improve processes
Proven leadership experience within SRE / Cloud Engineering teams
Passion for building resilient, scalable systems
Ability to thrive in a fast-moving, evolving environment
Team & Environment
You will join a highly skilled Technical Operations team focused on cloud transformation, reliability engineering, and scalable infrastructure.
The team operates with a strong DevOps culture, emphasizing:
Infrastructure-as-code
Automation and continuous delivery
Security and resilience
High availability and system reliability