Lead, Site Reliability Engineering

Dublin
Mastercard
Engineering
Posted: 9 April
Offer description

Title and Summary
Lead, Site Reliability Engineering
Site Reliability Engineer (SRE) – Generalist
Role Summary
The Site Reliability Engineer (SRE) – Generalist is a senior-level engineer and cross-stack reliability expert who proactively ensures system stability, performance, and operational resilience by deeply understanding application behaviour and how it manifests across infrastructure.
This role emphasizes anticipation over reaction. While the SRE Generalist participates in incident response, their primary value lies in converting operational signals, incidents, and patterns into preventative actions: improving observability, reducing risk, and eliminating classes of failure before they impact customers. They partner closely with application, platform, and infrastructure teams to continuously reduce mean time to detect (MTTD), mean time to resolve (MTTR), and overall incident frequency through data-driven insight, automation, and engineering rigor.
Key Responsibilities
Proactive Reliability Engineering

Anticipate reliability risks by analysing application behaviour, system signals, and historical incidents to identify failure patterns and systemic weaknesses before they result in outages.
Translate deep application knowledge into reliability requirements, architectural guidance, and infrastructure improvements that prevent incidents rather than simply respond to them.
Continuously assess system health, resiliency gaps and operational debt, driving improvements that increase service robustness over time.

Incident Response as an Input to Prevention

Participate in and lead troubleshooting efforts during high-severity and cross-domain incidents, applying structured, data-driven investigation techniques.
Use incidents as learning opportunities—perform root‑cause analysis that focuses on why systems allowed failure, not just what broke.
Ensure incident outcomes result in concrete, measurable improvements such as better instrumentation, safer defaults, automation or architectural changes.

Observability, Monitoring & Signal Quality

Proactively design and evolve observability strategies by onboarding new data sources and improving signal quality across logs, metrics, traces and events.
Build dashboards, alerts and monitors that surface early indicators of degradation, not just failure states.
Apply analytical techniques to detect emerging trends, weak signals and anomalous behaviour before customers are impacted.
Communicate insights through clear data storytelling that enables engineering teams and leaders to act decisively and early.

Automation & Continuous Improvement

Lead automation efforts that reduce manual intervention, shorten feedback loops and eliminate repetitive operational work.
Convert operational learnings into reusable tools, standards, documentation and patterns that raise the reliability baseline across teams.
Actively reduce operational toil and risk by improving system defaults, guardrails and self‑healing capabilities.

Collaboration, Influence & Mentorship

Partner across application, infrastructure and platform teams to drive shared ownership of reliability outcomes and proactive operational thinking.
Influence design and delivery decisions by representing the reliability perspective early in the development lifecycle.
Mentor engineers by modelling proactive troubleshooting, systems thinking and data‑driven decision making.

Knowledge, Skills & Abilities

Strong ability to reason about systems end to end, connecting application behaviour to infrastructure performance and failure modes.
Expertise in observability, monitoring and troubleshooting tools, with a focus on signal quality and actionable insight.
Proficiency in scripting and automation to operationalise reliability improvements and accelerate learning.
Broad infrastructure knowledge (networking, Linux, databases, containers, storage) with depth in at least one domain.
Strong data analysis and storytelling skills, enabling proactive identification of risks and clear communication of technical insights.
Working knowledge of machine learning concepts and their application to predictive and proactive operational problem‑solving.
Curiosity, ownership and a mindset oriented toward preventing tomorrow’s incidents, not just fixing today’s.

What Defines Success in This Role

Sees incidents as signals, not endpoints.
Uses observability and data to shift reliability work left and upstream.
Reduces incident frequency and impact over time—not just MTTR.
Acts as a connective force across teams, turning complexity into clarity and prevention.

Corporate Security Responsibility

Abide by Mastercard’s security policies and practices.
Ensure the confidentiality and integrity of the information being accessed.
Report any suspected information security violation or breach.
Complete all periodic mandatory security trainings in accordance with Mastercard’s guidelines.
