Senior Site Reliability Engineer – Udemy
Udemy is an AI-powered skills acceleration platform built to help people and teams grow.
It's personalized, practical, and focused on real-world impact.
Our mission is simple: to transform lives through learning.
Your work helps people around the world build skills they can use, whether they're picking up something new or leveling up to stay ahead.
Over 80 million learners and 17,000 businesses already learn with Udemy.
If you're excited by change, energized by learning, and ready to have a real impact, you'll feel right at home.
Where we work
Udemy is a global company headquartered in San Francisco, with additional U.S. offices in Denver and Austin, and international hubs in Australia, India, Ireland, Mexico, and Türkiye.
This position requires three days a week in the office (Tuesday, Wednesday, and Thursday), with flexibility on Mondays and Fridays.
Technical Skills
Extensive knowledge of cloud technologies, with AWS experience highly advantageous.
Proven expertise in managing containerized workloads using Kubernetes in production environments.
Proficiency in programming languages such as Python, Golang, or Kotlin.
Strong familiarity with infrastructure-as-code (IaC) tools like Terraform and Helm.
About This Role
As a Senior Site Reliability Engineer at Udemy, you'll play a critical role in managing and evolving our infrastructure, from our CDN to our databases.
You'll oversee and improve tools like Helm and Terraform, build development environments that empower our engineering teams, and enhance reliability standards across the organization.
Collaborating closely with development teams, you'll design internal tools in Python and Golang while responding to incidents and driving best practices in reliability.
Responsibilities
Work with the SRE team on projects to enhance and optimize our infrastructure and tooling in collaboration with engineering teams across Udemy.
Champion SRE best practices throughout Udemy's engineering organization.
Design and implement powerful, scalable tools to meet internal customer demands.
Support and maintain platforms like Kubernetes clusters and CI/CD pipelines.
Contribute to incident management, identify root causes, and drive continuous reliability improvements.
Participate in the on-call rota to support mission-critical systems.
Qualifications
Hands-on experience managing Kubernetes clusters and cloud environments at scale.
Solid expertise in deploying infrastructure using infrastructure-as-code tools.
Strong proficiency in writing tools and applications using languages such as Python, Golang, or Kotlin.
Proven experience participating in an on-call rotation and managing incidents effectively.
A track record of working with diverse engineering teams, providing guidance on best practices for reliability and scalability.
Excellent communication skills with a collaborative mindset, including the ability to give and receive feedback constructively.
Benefits
Our benefits start with you and were built to provide you and your family with the protection and care you need, making it easy to access the right coverage when you need it most.
EEO Statement
At Udemy, we value diversity and inclusion and consider qualified applicants without regard to race, color, religion, sex, national origin, ancestry, age, genetic information, sexual orientation, gender identity, marital or family status, veteran status, medical condition, or disability.
We understand that not everyone will meet every qualification.
However, we also recognize that everyone has unique experiences that can add value to our company.
Even if you think your background might not perfectly align, we'd love to hear from you.
Information regarding data privacy is available within the Udemy Careers Privacy Notice.