Reliability engineering is a vital discipline that focuses on ensuring the stability and performance of complex systems. We're seeking an experienced Reliability Engineer to join our team.
This role involves collaborating with cross-functional teams, including engineering, product management, support, sales engineers, and solution architects, to shape the future of our data platform's reliability.
The ideal candidate has expertise in designing, implementing, and operating highly reliable systems, along with strong analytical and problem-solving skills. They work effectively in a collaborative environment, communicate technical ideas clearly, and use metrics to drive improvements.
Key Responsibilities
* Design, implement, and operate scalable and reliable infrastructure components, such as Kubernetes clusters, containerization solutions, and CI/CD pipelines.
* Collaborate with engineering teams to identify, prioritize, and resolve reliability-related issues, driving improvements through root cause analysis, data-driven decision-making, and automation.
* Develop and maintain monitoring and observability tools, dashboards, and alerts to ensure timely identification and resolution of incidents.
* Partner with security teams to harden systems against security threats and vulnerabilities, following best practices and industry-standard frameworks.
Requirements
* 5+ years of experience in reliability engineering or a related field, preferably with expertise in cloud-native technologies, DevOps, and automation.
* Proficiency in programming languages such as Go, Java, TypeScript, or Python, as well as shell scripting.
* Familiarity with infrastructure-as-code tools such as Terraform, Pulumi, or Ansible.
* Strong understanding of cloud providers (AWS, Azure, Google Cloud Platform), including managed services, containerization platforms, and CI/CD pipelines.
Benefits
* A competitive salary and benefits package.
* A dynamic and supportive work environment, emphasizing collaboration, innovation, and continuous learning.
* Opportunities for professional growth and development, including training, mentorship, and career advancement.
What We Offer
Our hybrid work model balances remote flexibility with in-person collaboration, so you can work remotely while staying connected with your team.
As a member of our Reliability Engineering team, you'll help make our data platform more reliable, working closely with talented people who share your passion for building high-quality systems.