Who you are
* Hands-on experience managing or leading an internal tools team, including hiring, mentoring, and cross-functional collaboration
* Hands-on coding experience in multiple programming languages: Java/JVM required, plus one or more of Kotlin, Go, Python, etc.
* Background in leading complex engineering projects in a Scrum environment
* Experience in building and running distributed systems
* Exposure to networking, cloud architectures, and patterns
* Deep understanding of systems, networking, and scaling issues
* Direct exposure to cloud infrastructure and SaaS solutions
What the job involves
* The Site Reliability Engineering team at Toast is responsible for overseeing Toast production services, with a commitment to quality, reliability, and low latency — without needing heroics. The team accomplishes this goal by:
* Building tooling to automate, monitor, and manage deployed services using reliability best practices
* Developing and evangelizing patterns and best practices to improve the scalability, observability, and reliability of all Toast systems
* Consulting with teams to improve product scalability, observability, security, and reliability
* Participating in outage response and root cause analysis for critical systems and infrastructure incidents
* As Manager of the Site Reliability Engineering Tooling team, you will provide technical leadership and hands-on code contributions, applying reliability best practices across programming and scripting, observability, production triage, incident resolution, and retrospectives/root cause analysis to maintain the world-class reliability and uptime of our platform
* Enable a geographically distributed team of talented engineers to continue performing at a high level and help increase the impact of their work
* Drive day-to-day operations of the team and contribute to the development and prioritization of the SRE roadmap for major initiatives
* Create and drive strategic organization-wide scalability, observability, and reliability initiatives in collaboration with technical leadership and Product Management
* Influence architecture decisions for your team and for individual services to optimize resilience and scalability
* Guide teams to build and maintain systems that are reliable and available for Toast customers
* Facilitate professional growth by mentoring engineers on your team
Benefits
* Peer and company recognition programs
* Unlimited Vacation
* Sabbatical opportunity after five years
* Professional Development Reimbursement Program
* Commitment to Employee Wellness through resources such as a quarterly Wellness Stipend
* 401(k) and matching
* Medical, Dental, & Vision Coverage
* Mental Health Benefits
* Subsidized backup childcare