About the Role
We are looking for an FM Research Engineer PhD Intern to work on the formal verification and correctness of AI infrastructure systems. Join us to push the boundaries of reliability and correctness in distributed AI inference serving. In this role, you won't just be writing code; you will apply rigorous mathematical techniques to ensure the correctness, performance, security, and reliability of the production AI infrastructure that powers Huawei Cloud's AI services.
As part of our AI Infrastructure verification focus, you'll work on critical challenges around distributed inference serving, intelligent resource allocation, request routing correctness, and ensuring that complex AI serving systems behave correctly under all conditions. Your work will directly impact the reliability and correctness of real-world production AI services.
Responsibilities
· Apply formal methods (model checking, theorem proving, SMT, deductive verification) to verify correctness properties of AI infrastructure components
· Model and verify distributed AI inference protocols, routing algorithms, and load balancing mechanisms using frameworks such as Isabelle/HOL, TLA+, or P
· Analyze and prove safety properties of AI gateway systems and resource management logic
· Develop automated verification workflows that can be integrated into the development lifecycle of AI infrastructure teams
· Collaborate with AI infrastructure engineers to understand system requirements and translate them into formal specifications
· Investigate bugs and correctness issues discovered through formal verification, working with teams to validate fixes
· Write well-documented verification artifacts, technical reports, and contribute to knowledge sharing within the team
Requirements
· Currently pursuing a PhD in Computer Science, Automated Reasoning, Logic, Formal Verification, or a related field
· Strong programming skills in at least one language such as Rust, Go, C++, Java, or Python
· Experience with at least one formal verification tool (e.g., Z3, CVC5, Lean, Dafny, Gobra, Verus, Isabelle, TLA+, Boogie/Viper)
· Ability to understand technical specifications and system designs
· Excellent problem-solving, communication, and collaboration skills
Nice to Have
· Knowledge of AI/ML inference serving architecture and related challenges
· Solid understanding of distributed systems concepts, including consistency models and fault tolerance
· Previous work on the correctness of critical systems, distributed systems, or protocols
What You'll Gain
This internship offers hands-on experience with applied formal methods and principled approaches to the reliability of AI infrastructure. You'll work closely with expert researchers who will guide you in applying rigorous mathematical and logical reasoning to real-world production systems. The role offers valuable exposure to state-of-the-art AI serving infrastructure, formal verification tools, and distributed systems design at scale. You will build skills in rigorous engineering, formal reasoning, and AI serving that are highly valued across the industry, and you will have the opportunity to collaborate with teams across Europe and China and to contribute to academic publications.
Privacy Statement
Please read our West European Recruitment Privacy Notice before submitting your personal data to Huawei, so that you fully understand how we process and manage the personal data we receive.