Summary
People at Apple don't just build products; they craft the kind of experience that has revolutionised entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here! Join Apple, and help us leave the world better than we found it.
Description
The Security SRE team in Apple Services Engineering (ASE) is seeking an experienced SRE to protect the availability of our most critical Platform Security Services and to ensure that they scale to meet the demands of Apple’s Services offerings. You will work with world‑class engineers on core components of Apple’s Cloud Platform, supporting identity and access management, foundational security services, and risk/vulnerability services to meet the growing demands of our platform. In this hands‑on role you will maintain and enhance SRE practices for platform security services, accelerating our ability to reliably and consistently support thousands of applications at scale. The role offers unique opportunities to solve problems through AI‑assisted automation, including building and using AI tools to accelerate triage, operational workflows, and infrastructure tooling. The successful candidate will be highly self‑motivated with a passion for excellence, quality, and detail, not only supporting uptime but also working closely with some of the best minds in the industry to help design and implement improvements to stability, security, and scalability.
Responsibilities
Operate, monitor, and triage all aspects of our production Security Engineering environments.
Develop and apply AI-powered tooling in a production SRE environment.
Design, build, and implement innovative solutions, navigating orchestration and bare‑metal provisioning in a highly distributed environment.
Prepare alert‑handling procedures and runbooks, and collaborate with other SRE teams.
Participate in on‑call rotations to troubleshoot and resolve production issues, minimising downtime.
Actively participate in capacity planning, scale testing, and disaster recovery exercises.
Interact with and support partner teams, including engineering, QA, and program management.
Cultivate and maintain relationships with internal partners and external third‑party vendors.
Minimum Qualifications
In‑depth experience in a Site Reliability Engineering or infrastructure‑focused role.
Expert, in‑depth professional experience with cloud operations, with a focus on infrastructure‑as‑a‑service (compute, storage, and network virtualisation).
Strong experience building and scaling cloud infrastructure and large‑scale distributed systems.
Experience with Kubernetes and KVM/hypervisor technologies, and proficiency in Go, Python, Java, Rust, and/or Swift.
Experience operating large‑scale multi‑tenant Infrastructure as a Managed Service, and the ability to troubleshoot issues across the entire infrastructure stack.
Preferred Qualifications
Expert-level proficiency with Infrastructure as Code tools such as Chef, Ansible, Terraform, or Puppet.
Familiarity with Linux system virtualisation (Libvirt, QEMU, KVM) and associated APIs.
Ability to implement and coordinate telemetry using monitoring and observability tools such as Splunk, Grafana, and Prometheus.
Familiarity with infrastructure hardening practices, compliance frameworks, and security posture management.
Exposure to secrets management and identity and access management (IAM) principles within cloud infrastructure environments.
Experience with security-oriented observability, including audit logging and anomaly detection.