Technical Leadership Opportunities
We are seeking a highly motivated Technical Leader to drive operational efficiency across our cloud infrastructure.
This role involves tackling intrinsically hard problems and venturing beyond comfortable approaches when necessary. You will learn, educate, and advocate: acquiring expertise as needed, pioneering new spaces, and showing others what is possible.
A Day in the Life
* You'll balance your time between operating production systems and making long-term improvements to the reliability, availability, and performance of those software systems.
* An example week could look like this: On Monday, you provide meaningful feedback on the most critical upcoming change whilst guiding senior technical talent in your organization to make more decisions without you.
* On Tuesday, you identify a major reliability risk in the interplay between systems in your care and design a cohesive solution.
* On Wednesday, you lead the design review with relevant technical leaders, reaching consensus on a path forward.
* On Thursday, you influence senior management to commit to goals and make the investments needed to achieve that outcome.
* On Friday, you begin developing the part of that system that would have the most impact on the reliability of the overall system.
About the Role
This is an internally focused and highly visible role that demands continuous learning and collaboration across departments within Amazon. It will significantly impact the quality of life of current and future customers and builders who depend, directly or indirectly, on our European Sovereign Cloud.
Responsibilities
* Experience operating and troubleshooting reliable, scalable software systems.
* Able to troubleshoot at all levels, from network to operating systems to software applications.
* Proficient communicator across languages, cultures, and time zones.
* Able to periodically travel to meet with internal engineering teams, leaders, and customers.
Requirements
* 10+ years of experience in software development or related field.
* Highly proficient in operating 24x7 high-availability, distributed software applications.
* Strong understanding of network fundamentals (DNS, DHCP, TCP/IP, routing, load balancing, load shedding).
* Proficient with Infrastructure as Code (such as CDK, CloudFormation, Puppet, Chef, Ansible, or similar).
* Proficient with monitoring frameworks (such as CloudWatch, Datadog, Grafana, Elastic, or similar).
* Experience scripting operating system tasks in Bash, Python, or similar languages.