Sr. Infrastructure Reliability Engineer, Infrastructure Reliability & Quality
Amazon Web Services (AWS)
Description
As an Infrastructure Reliability Engineer, you will proactively identify, assess, and mitigate reliability risks for data center infrastructure. This includes mechanical and electrical equipment such as air handling units, switchgear, breakers, panel boards, uninterruptible power supplies (UPS), transformers, generators, automatic transfer switches (ATS), and in-rack power equipment, as well as infrastructure security systems such as cameras and access control. Your responsibilities include root cause analysis of critical equipment failures and driving continuous improvements that increase data center availability for AWS customers. You will work closely with internal teams and external partners, including suppliers, to influence product specifications, risk management plans, and execution strategies. The role calls for ownership, independence, and a results-oriented mindset within a collaborative environment.
Senior Reliability Engineers at AWS use physics-of-failure approaches along with other analytical and empirical methods to evaluate product quality and reliability across the design, manufacturing, and deployment stages. They conduct lifecycle stress analyses, identify weaknesses, and evaluate design quality risks. They also develop reliability models, monitor field performance, lead root cause analyses of critical failures, and drive vendor audits and process improvements. As a subject matter expert, you will communicate with vendors, lead problem-solving efforts, and manage multiple qualification activities. The role includes international travel.
Key Responsibilities:
1. Drive reliability risk identification, assessment, and mitigation for critical data center equipment.
2. Conduct root cause analysis of critical failures in the field.
3. Collaborate with internal and external partners on product specifications and reliability qualification.
4. Develop and maintain reliability models and quantify risks.
5. Monitor field performance and implement reliability improvements.
6. Provide technical leadership and best practices in reliability engineering.
About the Team
AWS Infrastructure Services manages the design, planning, delivery, and operation of AWS global infrastructure, supporting data centers, servers, storage, networking, power, and cooling. The team comprises diverse professionals working on complex challenges to ensure safety, security, and capacity at scale. AWS fosters an inclusive culture that values bold ideas, diversity, mentorship, and work-life balance.
Basic Qualifications:
* Bachelor's or Master's degree in Reliability Engineering, Physics, Electrical Engineering, Mechanical Engineering, Materials Engineering, or a related field.
* 8+ years of reliability engineering experience in high-reliability industries.
* 5+ years of failure analysis and root cause analysis experience.
* 5+ years of experience with accelerated life testing, stress analysis, and finite element analysis.
Preferred Qualifications:
* Ph.D. in Reliability Engineering or a related field.
* 10+ years of experience in reliability risk assessment from component to system level.
* Experience with proactive reliability approaches throughout the product lifecycle.
* Experience working with external supply chain partners.
* Knowledge of data center infrastructure equipment reliability performance.
* Ability to manage multiple qualification projects and schedules.
Amazon is an equal opportunity employer committed to diversity and inclusion. We value your experience and skills and are dedicated to protecting your privacy. Accommodations are available on request during the application process.