Overview
We are seeking an AI Platform Engineer to build and scale the infrastructure that powers our production AI services. You will take cutting-edge models, from automatic speech recognition (ASR) to large language models (LLMs), and serve them through highly available, developer-friendly APIs. You will bridge the R&D team, who train the models, and the applications that consume them. This includes developing robust APIs, deploying and optimizing models on Triton Inference Server (or similar frameworks), and ensuring scalable, real-time inference.
Responsibilities
* API Development: Design, build, and maintain production-ready APIs for speech, language, and other AI models. Provide SDKs and documentation to enable easy developer adoption.
* Model Deployment: Deploy models (ASR, LLM, and others) using Triton Inference Server or similar systems. Optimize inference pipelines for low-latency, high-throughput workloads.
* Scalability & Reliability: Architect infrastructure for handling large-scale, concurrent inference requests. Implement monitoring, logging, and auto-scaling for deployed services.
* Collaboration: Work with research teams to productionize new models. Partner with application teams to deliver AI functionality seamlessly through APIs.
* DevOps & Infrastructure: Automate CI/CD pipelines for models and APIs. Manage GPU-based infrastructure in cloud or hybrid environments.
Requirements
* Core Skills - Strong programming experience in Python (FastAPI, Flask) and/or Go or Node.js for API services. Hands-on experience with model deployment using Triton Inference Server, TorchServe, or similar. Familiarity with ASR frameworks and with LLM frameworks (e.g. Hugging Face Transformers, TensorRT-LLM, vLLM).
* Infrastructure - Experience with Docker, Kubernetes, and managing GPU-accelerated workloads. Deep knowledge of real-time inference serving over REST, gRPC, WebSockets, and other streaming interfaces. Cloud experience (AWS, GCP, or Azure).
* Bonus - Experience with model optimization (quantization, distillation, TensorRT, ONNX). Exposure to MLOps tools for deployment and monitoring.
Employment details
* Seniority level: Not Applicable
* Employment type: Full-time
* Job function: Engineering and Information Technology
* Industries: IT System Custom Software Development