Introduction

At IBM, work is more than a job — it's a calling to build, design, code, and make things better for people around the world. IBM Infrastructure is seeking an experienced AI Engineer to help bring Large Language Models (LLMs) to IBM Z (System z), one of the most secure and reliable enterprise computing platforms in the world.

This role is intended for professionals with 3+ years of experience in AI/ML systems, performance engineering, or accelerator-based inference who are interested in working close to the hardware and across multiple layers of the technology stack. You will help enable generative AI for mission-critical workloads used by banks, healthcare providers, and government agencies worldwide.

Who This Role Is For

This role is ideal for engineers who:
- Have delivered or supported production AI or ML systems
- Enjoy working across hardware, system software, and applications
- Are motivated by solving performance- and reliability-critical problems
- Want to help define how enterprise-scale AI runs on mission-critical platforms

Your Role And Responsibilities

As an AI Engineer on the IBM Z team, you will contribute directly to the design, integration, and operation of LLM workloads on enterprise infrastructure. This role is suited to engineers who enjoy solving complex system-level problems and collaborating across hardware and software domains.
LLM Integration and Deployment
- Develop and integrate LLM inference workloads on IBM Z using Spyre hardware accelerator cards.
- Implement model loading, runtime integration, memory management, and resource allocation strategies optimized for the IBM Z architecture.
- Enable both traditional mainframe applications and modern cloud-native services to access LLM capabilities through well-defined APIs.

Performance Profiling and Optimization
- Profile LLM inference workloads to measure latency, throughput, memory usage, and power efficiency.
- Analyze performance data to identify bottlenecks and optimization opportunities across hardware utilization, kernels, memory access patterns, and batching strategies.
- Document findings and contribute to performance best practices and internal guidance.

Failure Analysis and Debugging
- Diagnose and resolve inference errors, performance regressions, and system-level issues across firmware, drivers, runtimes, and applications.
- Collaborate with hardware engineers, firmware developers, and system architects to identify root causes and implement durable solutions.
- Contribute to automated testing and regression detection to improve system reliability.

Observability and Telemetry
- Design and implement monitoring and telemetry for production LLM workloads.
- Instrument systems and deploy logging to capture model performance, hardware utilization, error rates, and system health.
- Create dashboards and alerts to support operational teams with real-time visibility and historical analysis.

Collaboration and Technical Leadership
- Participate in architecture reviews and technical discussions across AI, hardware, firmware, and system software teams.
- Produce clear technical documentation and share knowledge across the organization.
- Stay current with advances in LLMs, hardware acceleration, and inference optimization, and apply learnings to improve IBM Z AI capabilities.

Education and Experience

Preferred Education: Bachelor's Degree

Required Technical And Professional Expertise
- Demonstrated professional experience in AI/ML engineering, ML systems, platform engineering, or performance-focused software development.
- Strong programming skills in Python and working experience with C/C++.
- Solid understanding of machine learning fundamentals, particularly transformer-based models and inference workflows.
- Knowledge of computer architecture, including memory hierarchies, parallel processing, and I/O systems.
- Experience working in Linux environments, using command-line tools and scripting.
- Hands-on experience with profiling, performance analysis, and debugging of complex systems.
- Familiarity with monitoring, logging, and observability concepts.
- Strong problem-solving skills and the ability to communicate technical concepts clearly.

Preferred Technical And Professional Experience
- Experience with PyTorch, TensorFlow, or Hugging Face Transformers.
- Exposure to hardware acceleration technologies such as GPUs or AI accelerators.
- Familiarity with model optimization techniques (quantization, pruning, knowledge distillation).
- Knowledge of inference frameworks such as ONNX Runtime, TensorRT, or TorchServe.
- Experience with observability platforms including Prometheus, Grafana, ELK, or Splunk.
- Understanding of distributed tracing (OpenTelemetry, Jaeger).
- Working knowledge of Docker, Kubernetes, and CI/CD pipelines.
- Exposure to IBM Z, z/OS, or enterprise computing environments (beneficial but not required).
- Experience working in environments with high requirements for reliability, security, and performance.