Cloud Technical Site Reliability Engineer
**Job Description:**
The Cloud Technical Site Reliability Engineer contributes to the Bank's overall IT digital transformation journey. The role involves collaborating with development teams to implement and deploy new features, automating repetitive tasks, monitoring systems, identifying performance bottlenecks, and resolving availability issues.
**Key Responsibilities:**
- Collaborate with development teams to ensure reliability and performance standards are met
- Automate repetitive tasks to improve efficiency and reduce manual effort
- Monitor systems and identify performance bottlenecks, availability issues, or monitoring gaps
- Support the Cloud Centre of Excellence (CCoE) in understanding error budgets for platforms and critical workloads
- Own the Problem Management process within the CCoE, performing proactive problem management and conducting post-incident analyses to identify root causes and implement preventive measures
- Support the Change Management and UAM processes as required
- Create and maintain documentation such as operational procedures and Technical Operations Manuals
**Requirements:**
- Demonstrated experience in a similar operational role managing business-critical environments, including public cloud infrastructure (experience with AWS and Azure is an advantage)
- Strong knowledge of Linux/Unix systems and command line tools
- Proficiency in scripting languages such as Python and Shell
- Experience with configuration management tools like Ansible or Chef
- Experience with source code management and deployment tools such as Bitbucket and Jenkins, or native tooling
- Good understanding of networking and communication protocols (TCP/IP, HTTP, DNS, etc.)
- Knowledge of containerization technologies (Docker, Kubernetes) and orchestration tools
**Benefits:**
- Work From Home option available
**About the Team:**
The Infrastructure Operations Team's mission is to enable the service by driving partnerships that provide the Bank with the technology it needs to run the business, securing customer and Bank data, improving stability, and remaining current in order to deliver a premier service.