About the Role
Site Reliability Engineering combines software and systems expertise to develop and operate large-scale, distributed systems.
The goal is to ensure that Google Cloud's services are reliable, have uptime appropriate to customer needs, and improve at a rapid pace.
Much of our development focuses on optimizing existing systems, building infrastructure, and automating tasks.
* Safely create and deploy new products and features across data centers and wide area networks (WANs).
* Collaborate with teams to enhance Google's Software-Defined Networking stack.
* Drive efforts to reduce the mean time to detect (MTTD) and mean time to mitigate (MTTM) incidents.