As a leader in AI infrastructure, we are seeking an experienced AI Platform Engineer to spearhead the development and maintenance of robust APIs for speech, language, and other AI models. The ideal candidate will bridge the gap between research teams and the applications that consume these models.
This role involves designing and building production-ready APIs, providing SDKs and documentation, deploying models using Triton Inference Server or similar systems, optimizing inference pipelines, and architecting infrastructure for large-scale concurrent requests. Additionally, you will work with research teams to productionize new models and partner with application teams to deliver AI functionality through APIs.
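To give a flavor of the inference-optimization work this role involves, here is a minimal sketch of dynamic micro-batching, a common technique for serving large-scale concurrent requests: individual requests are queued and flushed to the model in batches so each GPU forward pass serves many callers. The class name `MicroBatcher` and its parameters are illustrative, not part of any specific framework (Triton provides this capability natively as "dynamic batching").

```python
import queue
import threading
import time

class MicroBatcher:
    """Illustrative sketch: groups concurrent requests into batches
    so the model runs one batched forward pass instead of many."""

    def __init__(self, max_batch=8, max_wait_s=0.01):
        self.max_batch = max_batch      # flush when this many requests arrive
        self.max_wait_s = max_wait_s    # ...or after this much waiting
        self.q = queue.Queue()

    def submit(self, item):
        # Each request carries an Event and a result slot so the
        # calling thread can block until its answer is ready.
        done = threading.Event()
        slot = {}
        self.q.put((item, done, slot))
        done.wait()
        return slot["result"]

    def run(self, model_fn):
        # Worker loop: collect up to max_batch items (or until the
        # deadline passes), run one batched call, fan results back out.
        while True:
            batch = [self.q.get()]  # block until the first request arrives
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.q.get(timeout=remaining))
                except queue.Empty:
                    break
            inputs = [item for item, _, _ in batch]
            outputs = model_fn(inputs)  # one batched inference call
            for (_, done, slot), out in zip(batch, outputs):
                slot["result"] = out
                done.set()
```

In production this trade-off (latency added by `max_wait_s` versus throughput gained by batching) is exactly the kind of tuning the role calls for.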
In this position, you will have the opportunity to work with cutting-edge models and technologies, including speech recognition (ASR), large language models (LLMs), and other AI models. You will also be responsible for managing GPU-based infrastructure in cloud or hybrid environments, automating CI/CD pipelines, and implementing monitoring, logging, and auto-scaling for deployed services.
The successful candidate should have strong programming experience in Python (FastAPI, Flask) and/or Go or Node.js for API services, as well as hands-on experience deploying models with Triton Inference Server, TorchServe, or similar tools. Familiarity with ASR and LLM frameworks, Docker, and Kubernetes, along with cloud experience (AWS, GCP, or Azure), is essential.