Job Title: Site Reliability Engineer (AWS)
Location: Dublin / Hybrid (2 days)
Type: Permanent

We’re looking for a Site Reliability Engineer (Mid-Level) who loves solving complex problems, automating everything possible, and keeping systems running smoothly. You’ll be right in the mix of building reliable, scalable, and high-performing cloud environments, mainly in AWS, while helping our dev teams ship great products faster and with fewer headaches.

This is a hands‑on role for someone who values automation, ownership, and collaboration over silos and manual fixes. You’ll also join our on‑call rotation (don’t worry, it’s shared and well‑supported).

What you’ll be doing

Build and manage robust, highly available AWS infrastructure using tools like Terraform or CloudFormation
Maintain and improve CI/CD pipelines (Azure DevOps) for automated deployments and testing
Work with Docker and Kubernetes (EKS/ECS) to orchestrate containerized workloads
Automate as much as possible, from monitoring and alerting to deployment workflows
Define and track reliability metrics (SLIs/SLOs/error budgets)
Dive into incidents, lead root cause analysis, and make sure they don’t happen again
Build out observability solutions (CloudWatch, Prometheus, Grafana, ELK, etc.)
Partner with development and security teams to improve app reliability and platform performance
Keep security tight: IAM roles, secrets management, and network boundaries are second nature

What makes you a great fit

Around 5–7 years in IT, with 3+ years in SRE, DevOps, or Cloud Engineering
Deep hands‑on experience with AWS (EC2, VPCs, IAM, S3, RDS, CloudWatch, ALB/ELB, Route53)
Solid experience building CI/CD pipelines with Azure DevOps
Comfortable managing Linux and/or Windows environments at scale
Strong background in Docker and Kubernetes: you know your way around clusters, scaling, and deployments
Skilled with Infrastructure as Code (Terraform, CloudFormation)
Confident scripting in Bash or Python for automating all the boring stuff
Experienced in monitoring, logging, and alerting; you believe in metrics, not guesswork
Understand the core of SRE: SLIs/SLOs, incident management, postmortems, capacity planning
Always exploring ways to make cloud systems faster, more resilient, and more cost‑efficient

Who you are

Obsessed with uptime, reliability, and automation
Open communicator who thrives in cross‑team collaboration
Takes full ownership of what you build and run
Always curious about the latest in SRE and cloud‑native tech