Job Opportunity
We are seeking a skilled professional to take on the role of Cloud Reliability Engineer.
The successful candidate will be responsible for maintaining and enhancing the reliability, resilience, and performance of Azure PaaS workloads.
Key responsibilities include designing and evolving monitoring, alerting, and observability approaches to detect issues early, as well as tackling challenging production incidents and delivering permanent fixes.
The candidate will also build automation and Infrastructure as Code tooling (ARM, Bicep, Terraform, PowerShell) to streamline operations.
They will work closely with engineering and operations teams to promote continuous service improvement.
They will drive performance, capacity, and cost optimisation across the cloud platform.
Requirements:
* 3+ years of recent hands-on experience with Azure PaaS services such as App Services, Functions, Service Bus, and SQL.
* Strong grounding in observability, monitoring, and troubleshooting distributed cloud environments.
* Solid experience with IaC and automation tools (ARM, Bicep, Terraform, PowerShell).
* Familiarity with monitoring stacks such as Azure Monitor, Application Insights, Prometheus, Grafana, or Datadog.
* Proven capability in incident management, root cause analysis (RCA), and reliability engineering.
* Exceptional analytical and diagnostic skills, with the ability to look beyond surface-level symptoms.