Manager, Site Reliability Engineering
Datavant is a data collaboration platform trusted for healthcare, with a mission to make the world's health data secure, accessible, and actionable.
We build the future of how data is connected and used to improve health.
Location: Galway, Ireland.
This full-time, regular role leads a Site Reliability Engineering (SRE) team, combining people and technical leadership to drive reliability, scalability, and operational excellence across our cloud-native platform.
What You Will Do:
Lead an eSRE team as both its people and technical leader, driving cross-functional efforts across every aspect of the SDLC and PDLC to modernize, secure, and standardize operations.
Partner directly with product engineering teams to improve reliability, scalability, and operational maturity in line with organizational platform and security standards.
Guide teams in designing resilient architectures, defining SLIs/SLOs, and managing error budgets.
Lead troubleshooting of complex production issues and support incident response, root-cause analysis, and post-mortem processes.
Automate infrastructure and operational workflows to reduce toil and accelerate delivery.
Enhance observability through improved logging, metrics, tracing, and performance monitoring.
Develop and deploy reusable tools, templates, and platform components that streamline engineering workflows.
Influence and establish reliability-focused best practices across the organization.
Serve as a technical advisor to engineering teams, helping them adopt sound architectural and operational patterns.
What You Need to Succeed:
Mastery of at least one programming language (e.g., Python or Go).
Experience operating cloud-native applications on platforms such as AWS, GCP, or Azure.
Expert knowledge of Kubernetes and container orchestration principles.
Expert knowledge of Terraform.
Hands-on experience with observability systems (metrics, logging, tracing) and diagnosing distributed systems issues.
Demonstrated ability to lead or support incident response and drive high-quality post-mortems.
Strong collaboration skills with the ability to influence without authority and partner effectively with developers.
Clear communication skills for explaining reliability tradeoffs to both technical and non-technical audiences.
A mindset grounded in systems thinking, operational excellence, and continuous improvement.
Curiosity, adaptability, and the judgment to balance innovation with pragmatic engineering choices.
Equal Employment Opportunity
We are proud to be an Equal Employment Opportunity employer.
Datavant is committed to working with and providing reasonable accommodations to individuals with physical and mental disabilities.
If you need an accommodation while seeking employment, please request it by selecting the "Interview Accommodation Request" category.
You will need your requisition ID when submitting your request.
Requests for reasonable accommodations will be reviewed on a case-by-case basis.