Reliability Expert Wanted
A leading software firm is seeking a skilled Reliability Expert to support the growth of its complex SaaS platform hosted in Microsoft Azure. Responsibilities include:
1. Maintain high availability, reliability, and performance of services in the Azure environment.
2. Implement robust monitoring, alerting, and observability tooling to ensure system health.
3. Automate provisioning, deployment, scaling, and incident response processes.
4. Lead and participate in incident management and post-incident reviews to enhance overall system resilience.
5. Develop Infrastructure as Code using tools like ARM templates, Bicep, or Terraform to streamline infrastructure management.
6. Optimize system performance and capacity through data-driven approaches to meet business demands.
7. Safeguard system security and compliance by adhering to industry standards and regulations (ISO 27001, SOC 2, GDPR).
The ideal candidate will have experience working with SaaS platforms, proficiency in Microsoft Azure infrastructure and services, strong scripting skills, and familiarity with monitoring tools such as Azure Monitor, Grafana, Prometheus, or Datadog.