About your skills
* Extensive knowledge of cloud technologies, with AWS experience being highly advantageous.
* Proven expertise in managing containerized workloads using Kubernetes in production environments.
* Proficiency in programming languages such as Python, Golang, or Kotlin.
* Strong familiarity with infrastructure-as-code (IaC) tools like Terraform and Helm.
About this role
As a Staff Site Reliability Engineer at Udemy, you'll play a critical role in managing and evolving our infrastructure, from our CDN to our databases. You'll oversee and improve tools like Helm and Terraform, build development environments that empower our engineering teams, and enhance reliability standards across the organization. Collaborating closely with dev teams, you'll also design internal tools in Python and Golang, respond to incidents, and drive best practices in reliability.
What you'll be doing
* Leading projects to enhance and optimize our infrastructure and tooling in collaboration with the SRE team and engineering teams across Udemy.
* Acting as a mentor to other engineers on the SRE team, fostering growth and technical development.
* Championing SRE best practices throughout Udemy's engineering organization.
* Designing and implementing powerful, scalable tools to meet internal customer demands.
* Supporting and maintaining platforms like Kubernetes clusters and CI/CD pipelines.
* Contributing to incident management, identifying root causes, and driving continuous reliability improvements.
* Participating in the on-call rotation to support mission-critical systems.
What you'll have
* Hands-on experience managing Kubernetes clusters and cloud environments at scale.
* Solid expertise in deploying infrastructure using infrastructure-as-code tools.
* Strong proficiency in writing tools and applications using languages such as Python, Golang, or Kotlin.
* Proven experience participating in an on-call rotation and managing incidents effectively.
* A track record of working with diverse engineering teams, providing guidance on best practices for reliability and scalability.
* Excellent communication skills with a collaborative mindset, including the ability to both give and receive feedback constructively.
We understand that not everyone will match each of the above qualifications. However, we also realize that everyone has unique experiences that can add value to our company. Even if you think your background might not perfectly align, we'd love to hear from you.