Senior Software Engineer - Site Reliability
We are looking for an experienced software engineer to join our team as a Senior Software Engineer in Site Reliability. In this challenging role, you will design, analyze, and troubleshoot large-scale distributed systems, with a strong focus on reliability and uptime.
Candidates should have 5 years of experience in software development, including data structures or algorithms, and at least 3 years of experience leading projects and providing technical leadership. A Bachelor's degree in Computer Science or a related field is also required.
The ideal candidate will have excellent systematic problem-solving skills and effective verbal and written communication skills. They should also be able to debug and optimize code and to automate routine tasks.
About the Role
Site Reliability Engineering combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. As an SRE, you will ensure that our services have reliability and uptime appropriate to customer needs, along with a fast rate of improvement. You will also monitor our systems' capacity and performance, optimize existing systems, build infrastructure, and eliminate manual work through automation.
Responsibilities
- Engage in the whole lifecycle of services, from inception and design, through to deployment, operation, and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and blameless postmortems.