Senior Site Reliability Engineer
This is a highly sought-after role that requires expertise in managing complex systems and keeping them highly available, scalable, and performant.
About the Role:
* Design, build, and maintain scalable, secure, and measurable infrastructure using infrastructure-as-code practices.
* Analyze and improve the efficiency, scalability, and reliability of our backend systems.
* Build and mature automation tools for robust continuous integration and deployment pipelines.
* Facilitate capacity planning for our infrastructure and services.
Key Responsibilities:
1. Develop and implement strategies to improve system reliability and uptime.
2. Leverage expertise in cloud computing, DevOps, and SRE practices to optimize system performance.
3. Collaborate with cross-functional teams to ensure seamless delivery of software applications.
Required Skills and Qualifications:
* Expertise in Kubernetes (EKS), CI/CD tools (e.g., ArgoCD, GitHub Actions), and observability platforms (e.g., Datadog).
* Proficiency in automating platform deployment and maintenance tasks (e.g., cluster upgrades, CI/CD workflows).
* Familiarity with integrating tools like Terraform, Elasticsearch, Kafka, Cassandra, and Databricks into the broader platform.
* Knowledge of scaling, failover, and platform reliability best practices.
* Cross-team collaboration skills.
Benefits:
* Pension contribution.
* Flexible work environment.
About Us:
We are a company dedicated to solving complex technology challenges. We offer a dynamic work environment with opportunities for growth and development.