Site Reliability Expert
A reputable software organization is seeking an experienced Site Reliability Engineer to fill a critical role in Dublin. As a Site Reliability Engineer, you will be responsible for ensuring the high availability, reliability, and performance of cloud-hosted services.
Key Responsibilities:
* Ensure high availability, reliability, and performance of cloud-hosted services
* Build and manage robust monitoring, alerting, and observability tooling
* Automate provisioning, deployment, scaling, and incident response
* Lead and participate in incident management and post-incident reviews
* Implement Infrastructure as Code using tools such as ARM templates, Bicep, or Terraform
* Ensure strong alignment with security and compliance standards (e.g., ISO/IEC 27001, SOC 2)
Key Skills Required:
* Experience in a SaaS or software product environment
* Proven background in Microsoft Azure infrastructure and services
* Strong scripting/automation skills (e.g., PowerShell)
* Familiarity with monitoring platforms such as Grafana, Prometheus, or Datadog
* Knowledge of containerization (Docker/Kubernetes) and CI/CD tooling
* Expertise in incident response, root cause analysis, and system resilience
Nice to Have:
* Azure certifications (e.g., Azure Administrator Associate)
* Experience in the financial sector
* Familiarity with ServiceNow, PagerDuty, or other incident management tools