Job Description:
Toast is a leading platform provider for the restaurant industry, driven by innovation and a commitment to quality. As part of our Site Reliability Engineering (SRE) team, you will play a critical role in ensuring the reliability and uptime of our platform.
The SRE team oversees Toast's production services, with a focus on quality, reliability, and low latency. To achieve these goals, the team applies reliability best practices, develops and evangelizes reusable patterns, consults with teams to improve product scalability, observability, security, and reliability, and participates in outage response and root cause analysis for incidents affecting critical systems and infrastructure.
As a manager of the SRE team, you will provide technical leadership and contribute code hands-on, applying reliability best practices across programming and scripting, observability, production triage, incident resolution, and retrospective/root cause analysis to maintain the world-class reliability and uptime of our platform.
Responsibilities:
* Enable a geographically distributed team of talented engineers to continue performing at a high level and help increase the impact of their work.
* Drive day-to-day operations of the team and contribute to the development and prioritization of the SRE roadmap for major initiatives.
* Create and drive strategic organization-wide scalability, observability, and reliability initiatives in collaboration with technical leadership and Product Management.
* Influence architecture decisions for your team and for individual services to optimize resilience and scalability.
* Guide teams to build and maintain systems that are reliable and available for customers.
* Facilitate professional growth by mentoring engineers on your team.
Requirements:
* Hands-on experience managing an SRE team, including hiring, mentoring, and cross-functional collaboration.
* Hands-on coding experience with Kotlin, Go, Python, or Java/JVM languages.
* Background in leading complex engineering projects in a Scrum environment.
* Experience building and running distributed systems.
* Exposure to networking and to cloud architectures and patterns.
* Deep understanding of systems, networking, and scaling issues.
* Direct exposure to cloud infrastructure and SaaS solutions.
Benefits:
We strive to provide competitive compensation and benefits programs that help attract, retain, and motivate the best and brightest people in our industry. Our total rewards package goes beyond great earnings potential and provides the means to a healthy lifestyle with the flexibility to meet changing needs.
Others:
At Toast, we value diversity, equity, and inclusion and strive to create equitable opportunities for all. We believe in fostering a culture of connection as we work together to empower the restaurant community.