Sr. site reliability operations engineer

Dublin

Salesforce, Inc.

Operations engineer

Posted: 15h ago

Offer description

Digital Enterprise Technology (DET) connects people and technology to transform the future of work at Salesforce. Guided by our core values of Trust, Customer Success, Equality, Innovation, and Sustainability, we deliver business outcomes that fuel growth, drive competitive advantage, and empower our employees and customers globally. DET's scope stretches beyond traditional IT. We are strategic partners, advocating for the best outcomes for our customers, always innovating, and helping to shape the future of work. DET oversees technology strategy, Salesforce on Salesforce, customer and partner enablement, applications engineering, infrastructure, collaboration, enterprise operations, architecture, and program enablement. DET is Customer Zero, the best example of Salesforce products delivered globally, at scale, sustainably.
As a Sr. Site Reliability Operations Engineer, you'll be part of our internal DET Site Reliability Operations team supporting our employees globally. This role combines incident command, reliability engineering, and hands-on technical support. You'll lead response efforts for critical incidents while working with teams across different time zones.
Responsibilities

Lead incident response for high severity incidents affecting internal business operations. Serve as Incident Commander to coordinate technical teams, establish impact, and drive rapid service restoration.
Monitor and troubleshoot enterprise systems including infrastructure, applications, and network components. Use your technical skills to diagnose complex problems across multiple platforms and vendors before they impact users.
Prepare executive summaries and communicate incident status to leadership up to CDO level. Translate technical details into business language for stakeholders during and after incidents.
Drive improvements to incident management processes by updating playbooks, creating SOPs, and leading automation initiatives. Mentor junior team members on handling escalated technical issues.
Coordinate emergency changes and infrastructure updates to resolve incidents. Work with cross-functional teams to maintain business continuity during critical situations.
Analyze incident data and KPI metrics to identify trends. Develop actionable recommendations to reduce impact duration and improve team performance.
Participate in on-call rotation as part of regional coverage. Lead incident review meetings and ensure accurate documentation for post-incident analysis.

Required Experience

8+ years in IT operations, incident management, or site reliability work. Proven experience in a 24x7 high availability environment with enterprise systems.
Demonstrated ability to lead high severity incident response under pressure. Establish impact, evaluate solutions with subject matter experts, and make decisions that balance technical and business needs.
Excellent verbal and written communication skills for technical and executive audiences. Create clear incident updates, status reports, and executive summaries for leadership.
Strong technical troubleshooting ability across Windows and Linux servers, networking, cloud platforms, and virtualization technologies. Diagnose problems quickly using logs, monitoring tools, and common diagnostic approaches.
Experience leading or mentoring technical teams in incident response or operations roles.
Experience with cloud platforms like AWS and monitoring of IT infrastructure. Comfortable with core cloud concepts and various monitoring tools.
Suggest and design SLIs/SLOs to ensure reliability and performance of critical systems in alignment with SRE best practices.
ITILv4 certification and deep understanding of incident, problem, and change management processes.
Industry certifications such as public cloud certifications (AWS, Azure, or Google), CCNA, RHCE, or a Microsoft associate-level certification.
BS in Computer Science or equivalent practical experience. What matters most is a proven ability to solve complex problems and lead technical response efforts.

Nice to Have

Salesforce platform experience and certifications.
Additional advanced certifications like AWS SA, CCNP, or RHCA.
Scripting ability in Python, Bash, PowerShell, or similar languages. Experience leading automation initiatives to reduce manual work.
Advanced experience with monitoring and visualization tools like Splunk, Grafana, or Tableau. Proven ability to analyze data and present insights.
Experience with automation tools like Puppet or Chef for infrastructure management.

