Reliability Engineering Lead
In this role, you will lead the design and evolution of observability, monitoring, and alerting systems that provide end-to-end visibility and proactive issue detection. You will also implement scalable automation frameworks for infrastructure provisioning, deployment pipelines, and operational tasks.
About the Role
This is a senior position that requires experience in application reliability, availability, and performance. You will own incident management processes, including high-severity incident response, root cause analysis, and continuous improvement initiatives, with the goal of minimizing downtime and improving response times. You will also:
* Lead capacity planning and performance optimization efforts across distributed systems and cloud-native environments.
* Champion disaster recovery and business continuity planning, ensuring readiness for large-scale events.
* Mentor colleagues, fostering a culture of ownership, resilience, and operational excellence.
Responsibilities
You will work closely with architecture, security, and product leadership to align reliability goals with business objectives. You will also participate in on-call rotations and provide 24/7 support for critical incidents.
About Us
We empower our talented people to collaborate, innovate, and drive growth. Our team is dynamic, and we work on challenging and relevant issues in financial services and technology. If you are passionate about reliability engineering and want to contribute to a forward-thinking organization, apply now!