About the Job
Site Reliability Engineering (SRE) is a field that combines software and systems engineering to build and run large-scale, massively distributed systems. This work involves designing, deploying, operating and refining services to ensure they meet customer needs.
SRE teams focus on optimizing existing systems, building infrastructure and automating processes to eliminate waste. They also support services before launch through activities like system design consulting, capacity planning and review.
Maintaining services once live requires measuring and monitoring availability, latency and overall system health. SREs must scale systems sustainably through automation and evolve them by pushing for changes that improve reliability and velocity.
Key Responsibilities:
* Engage in service lifecycle management, from inception through deployment and operation.
* Support pre-launch activities such as system design consultation, software development, capacity planning and reviews.
* Monitor and maintain services post-launch by measuring availability, latency and overall system health.
* Scale systems sustainably using automation and push for improvements in reliability and velocity.
This role demands strong collaboration, problem-solving and communication skills. The right candidate can analyze complex issues, develop creative solutions and implement them effectively.