Reliability Engineer Position
This role is for a skilled professional with strong Azure cloud experience who can design, implement, and maintain highly available systems.
Key Responsibilities:
* System Design and Implementation
Design and implement highly available systems in Azure.
* Monitoring and Performance Improvement
Monitor system health and performance, addressing issues proactively to minimize impact on customers.
* Automation and Scalability
Automate provisioning, deployment, and scaling using Infrastructure as Code (ARM templates, Bicep, Terraform).
* Alerting and Incident Response
Implement actionable, low-noise alerting and incident response processes so that incidents are resolved promptly.
* Root-Cause Analysis and Continuous Improvement
Lead root-cause analysis and drive continuous improvement following incidents.
* Collaboration and Optimization
Collaborate with DevOps teams to optimize CI/CD pipelines for reliability and efficiency.
* Security and Compliance
Ensure systems meet security and compliance requirements (ISO/IEC 27001, SOC 2, GDPR).
* Continuous Service Improvement
Drive continuous service improvement to achieve operational excellence.