Rent the Runway is disrupting the trillion-dollar fashion industry and changing the way women get dressed through our shared designer closet.
We power women to feel their best every day through subscription, rental, and resale of hundreds of designer brands.
About Us
Founded in 2009, we have remained committed to our mission: empowering women to look and feel great through a wide assortment of fashion items for every occasion.
About Our Team
Our Platform Engineering team is smart, pragmatic, and entrepreneurial. We are reliability-focused and passionate about making our closet-in-the-cloud a reality for our customers.
About the Job
We're looking for a Senior Site Reliability Engineer (SRE) to spearhead technology initiatives in cloud infrastructure, software delivery, and observability.
Key Responsibilities
* Use languages and tools such as Terraform, Python, Go, Docker, and Kubernetes to drive service reliability.
* Implement software development practices to build observability, alerting, tracing, automation, and self-healing capabilities to maintain the highest levels of platform availability.
* Coordinate end-to-end across platforms: identify, respond to, and report issues, and escalate promptly to the responsible teams for remediation.
* Develop maintenance and operations automation through CI/CD.
Requirements
* At least 5 years of hands-on experience with container orchestration and deployment tools such as Kubernetes and Helm.
* Advanced skills in Terraform and Ansible, with a solid understanding of CI/CD and artifact-management tools like GitHub, GitLab, and Artifactory.
* Practical experience with monitoring, alerting, and logging tools, including Splunk and GCP Monitoring.
* At least 3 years of experience in maintaining production environments across cloud platforms like GCP, AWS, or Azure.
* 5+ years of experience in developing and delivering products using programming languages such as Bash, Python, Golang, or Java.
* Proven track record of enhancing existing systems, building robust infrastructure, and automating processes to reduce workload.
* Experience working within Agile teams, adhering to sprint cadences and delivery timelines.
* Ability to effectively triage issues and conduct thorough root-cause analyses when necessary.
* Strong team player with the ability to work collaboratively within diverse groups.
* Capable of driving Site Reliability Engineering practices among development and operations teams.
* Willingness to participate in an on-call rotation, troubleshoot production issues, perform root-cause analyses, and share insights with the Engineering and Operations teams.
About Our Benefits
We offer inclusive benefits, including:
* Generous Paid Time Off including annual leave, paid bereavement, and family sick leave.
* Universal Paid Parental Leave for both parents and a flexible return-to-work programme.
* Paid Sabbatical after 5 years of continuous service.
* Competitive Stakeholder Pension.
* Comprehensive health, dental, and dependent care from day 1 of employment.
* Company-wide events and outings.
* Hybrid Work - This is a hybrid role based in our Galway, Ireland office. Employees have the option to work remotely 2-3 days per week.
We're an equal opportunity employer, and we prohibit discrimination against any applicant or employee on any legally recognised basis.