Dublin, County Dublin, Ireland Software and Services
Engineering Manager - SRE
Description
The Apple Services Engineering Cloud Services SRE organization is looking for a strong, hands-on leader to lead a platform-focused SRE team and take responsibility for the reliability of the platform. The platform serves workloads that provide our organisation and our customers with their favourite applications, services, and tools.
We are domain experts in fleet management, systems, and software engineering. We build automations, instrument reliability tools, and respond to alerts and incidents that may pose a risk to the reliability of the platform. The team's focus is on infrastructure capabilities and processes, improving the reliability and efficiency of the systems at scale.
Responsibilities include:
* Act as the Service Owner, designing and mapping key performance indicators to achieve the organization’s mission
* Lead the definition of requirements, priorities, and planning of engineering deliverables
* Implement structured engineering and operations processes
* Lead the team in daily agile SRE practices, ensuring proper team focus on priorities, achievements, and deliverables
* Optimise velocity and efficiency of delivery, and drive continuous improvement
Success depends on a strong understanding of SRE principles and practices, combined with a track record of resolving issues in a live production environment and implementing strategies to minimize them while driving clear action plans for the team.
The successful candidate will be highly self-motivated with a passion for excellence, quality, and detail. As a leader, they are responsible for coaching and mentoring their team members, helping them achieve service goals and build career paths aligned with those goals. It’s imperative for the leader to empower their team by providing appropriate context and timely feedback.
The leader will not only own the service, but will also collaborate with other teams within Apple. They will build trust with stakeholders and partners through diplomacy, discussion, and follow-through.
This is a broad, high-visibility, cross-organisation role that involves collaborating with multiple teams. The leader is expected to invest in and build good relations with key partners. Their collaboration with internal customers, product engineering, and development groups is critical to success.
Minimum Qualifications
* Experience with critical, large-scale distributed systems, combining hardware, operating systems, and software
* Experience building and leading engineering teams; ideally SRE or Production Engineering
* Strong emphasis on SRE as an engineering subject area, with proficiency in at least one of the following languages: Golang, Rust, Python, Swift
* Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts, with a keen eye for opportunities to eliminate toil through code and process improvements
* Superb interpersonal skills, capable of working with multi-functional technical and business teams and varying levels of management, influencing decision making
* Bachelor's or Master's degree in Computer Science, Computer Engineering, or equivalent experience
Preferred Qualifications
* Experience working with large bare-metal infrastructure and release management
* Experience with large-scale server provisioning, fleet management, and maintenance
* Experience with development within the Kubernetes ecosystem, including the operator framework, controllers, and CRDs
* Hardware bootstrap and associated security (PXE, BIOS, TPM, secure boot, trusted computing)
* Automating operations processes via services and tools
* Configuration management and fleet orchestration via Puppet, Chef, Ansible, or others