Hybrid in Dublin 2
The Major Incident Manager is responsible for leading the end‑to‑end management of high‑severity (P1/P0) technology and security incidents to minimize business impact and restore service as quickly as possible. This role is a key driver for managing the resolution of technical problems with serious consequences to the company or customers. This responsibility includes collaborating and partnering with the entire organization to drive action and foster growth.
Drive incidents to resolution and ensure accurate and timely customer and executive‑level communications through multiple channels.
Ensure the correct resources are working on the resolution of major incidents appropriate to the severity, and identify when escalation is required and trigger such escalation accordingly.
Ensure that incident management processes are followed and that incident post‑mortems are completed to capture process deviations and areas for improvement.
Responsibilities
Drive the company’s Major Incident Management Process for critical customer situations
Coordinate with peer managers worldwide on resources, issues, and schedules
Manage and report ongoing CritSit metrics
Support accurate and consistent maintenance of technical and management escalation processes
Serve as the primary command lead during critical events and customer outages, applying structured incident management frameworks.
Create and maintain recovery playbooks for commonly occurring customer patterns and issues
Coordinate cross‑functional technical teams during single‑customer, multi‑customer, business‑critical system, and/or cybersecurity incidents.
Partner with the Security Operations team to escalate threats and ensure swift containment.
Activate response procedures for DDoS events, incidents, security flaws and vulnerabilities
Ownership and execution of the active critical incident management process, including:
Event analysis, applying the ITIL framework for severity and impact
Facilitate the resolution effort on the stakeholder call and engage additional resources when the effort stalls
Engagement of escalation management resources
Conduct readiness activities, including incident simulations, playbook development, and responder enablement.
Manage multi‑channel customer and internal communications at an executive level
Timeline documentation and review
Manage event communications through multiple channels
Craft business‑appropriate communications for the affected operating groups and manage communication on major incident conference calls.
Establish and manage bridge calls with engineers and customers on a single customer outage
Conduct post‑event analysis, leveraging the ITIL problem management process and relationships with engineering to ensure that issues do not recur
Hand off resolved incidents to the Problem Management team, with detailed notes and a summary of the business impact and duration
Perform other duties and projects as assigned
Requirements
Minimum of 5 years of experience in critical/crisis situation management for technical customer escalations
Bachelor’s degree in business, computer science, engineering, or related field or equivalent experience
Excellent communication skills (both verbal and written)
The ability to communicate confidently and clearly on conference calls, in meetings and via email, at all levels of the organization is essential
Strong organizational skills with the ability to manage multiple tasks simultaneously
Customer focus and ownership, use of own initiative and a proactive approach to work
Extensive experience supporting and managing technical environments; demonstrated leadership skills in fast‑paced, highly dynamic situations
Must be technically literate and able to articulate technical issues in a meaningful way to both engineers and executive‑level management
Crisis management skills: able to set priorities, pursue multiple threads at the same time, accurately reflect current state and drive towards desired state
Ability to maintain calm during stressful situations
A team player who is influential and builds good working relationships across all functions.
Excellent project management skills, including demonstrated ability to manage projects across teams where influencing skills are required
Experience with or working knowledge of relational databases (e.g. MySQL, Oracle)
Comfortable leveraging AI‑driven tools to accelerate incident detection, correlation, and resolution, ensuring faster restoration of service
Support and promote AI adoption across incident management workflows, enabling teams to operate more proactively and efficiently
Analyze and interpret AI‑powered insights (e.g., predictive risk signals, automated incident summaries, or recommended actions) to drive decision‑making and improve major incident outcomes