Job Opportunity
We are seeking a highly skilled Senior Site Reliability Engineer to join our team. The role is critical to the reliability, scalability, and performance of our cloud-hosted platform.
The successful candidate will be responsible for the following:
* Design, implement, and maintain highly available and secure systems on Microsoft Azure.
* Develop automation and Infrastructure as Code (IaC) using PowerShell, ARM templates, Bicep, or Terraform.
* Configure and maintain monitoring, alerting, and observability tools such as Azure Monitor, Grafana, Prometheus, or Datadog.
* Lead incident and problem management, including root cause analysis and post-incident reviews.
The ideal candidate will have:
* Proven experience as a Site Reliability Engineer or in a similar role in a cloud-based SaaS environment.
* Strong expertise with Microsoft Azure (compute, networking, storage, monitoring).
* Hands-on experience with automation, PowerShell scripting, and IaC tools.
* Working knowledge of Docker, Kubernetes, and cloud-native architectures.
* Experience with monitoring and observability solutions.
* Excellent analytical, communication, and collaboration skills.