This is a full-time role, based in Galway, Ireland, for a Senior Deployment Engineer specializing in AI inference solutions. The role covers deploying AI-driven systems, troubleshooting technical issues, providing advanced technical support, integrating new technologies, and monitoring and optimizing the performance of deployed systems. The Senior Deployment Engineer will work closely with cross-functional teams to ensure the successful implementation and operation of AI-powered healthcare solutions.
Responsibilities
* Architect and drive the technical vision, long‑term roadmap, and hands-on implementation of the Inference Platform, spanning large‑scale internal deployments and enterprise on‑prem installations.
* Design and lead the development of distributed inference subsystems, including high‑throughput request routing, autoscaling algorithms, load balancing, and hardware-aware resource orchestration.
* Establish and enforce operational excellence standards, ensuring the platform meets SLOs such as >99.9% availability, predictable latency, and efficient hardware utilization across heterogeneous clusters.
* Build, mentor, and scale a high‑performance engineering organization, setting technical direction, enforcing engineering best practices, and accelerating execution velocity.
* Evolve the platform into a production‑grade, enterprise‑deployable system by collaborating with product, operations, and customer engineering teams to deliver robust, supportable on‑prem solutions.
Skills & Qualifications
* Technical Leadership:
6+ years building large‑scale distributed systems, including 3+ years leading ML infrastructure or inference platform teams; expert-level coding, design review, and architectural decision-making skills.
* Inference Performance:
Demonstrated ability to scale LLM inference workloads, optimizing P99 latency (<100ms), throughput, batching strategies, KV‑cache efficiency, memory/IO pipelines, and end‑to‑end resource utilization.
* ML Systems Expertise:
Deep understanding of distributed inference/training architectures for modern LLMs, including tensor/sequence parallelism, scheduling, and model partitioning; familiarity with major cloud ecosystems (AWS/GCP/Azure).
* Frameworks & Tooling:
Hands-on experience with model‑serving frameworks (vLLM, TensorRT‑LLM, Triton, etc.) and ML stacks such as PyTorch, Hugging Face, and managed ML platforms like SageMaker.
* Infrastructure Engineering:
Strong background with cluster orchestration (Kubernetes/EKS, Slurm), large‑scale compute environments, service mesh patterns, and low‑latency networking.
* Reliability & Observability:
Proficiency with monitoring, tracing, and reliability engineering using Prometheus, Grafana, and incident management workflows (on‑call, RCA, post‑mortems).
* Leadership & Cross‑Functional Execution:
Proven ability to recruit and develop engineering talent, drive architectural alignment, and partner with product and customer teams to deliver mission‑critical systems.
Preferred Skills
* Experience deploying and operating ML systems in on‑prem or private cloud environments.
* Background in edge inference, streaming inference pipelines, multi‑region architectures, or AI security/privacy.
* Direct experience supporting enterprise customers through deployment, integration, and productionization phases.