Red Hat is seeking a Principal Software Engineer to work on cutting‑edge AI/ML applications and agent systems, leveraging modern inference platforms to build production‑ready prototypes. You will contribute to upstream communities such as vLLM, TGI (Text Generation Inference), PyTorch, and OpenVINO while building innovative applications that demonstrate the capabilities of next‑generation AI/ML systems. The ideal candidate is energized by working concurrently on a wide variety of projects, both independently and as part of a team.
What you will do:
Build high‑quality, high‑performing AI/ML applications and agent systems using modern inference platforms for multi‑modal and distributed model serving
Apply and optimize inference techniques including KV cache management, model quantization, and distributed serving to production workloads
Contribute to upstream inference runtime communities such as vLLM, TGI, PyTorch, OpenVINO, and related projects
Build multi‑modal AI applications integrating vision, language, and other modalities
Provide technical leadership and coordination across multiple stakeholders and engineering teams
Apply a growth mindset by staying current with rapid advancements in AI/ML inference technologies
Benchmark and analyze inference performance at scale, driving data‑driven optimization decisions
Publicize innovations through blogs, presentations, conferences, and other technical venues
What you will bring:
Bachelor's degree in Computer Science, Engineering, or equivalent experience
5+ years of experience in AI/ML engineering with a focus on production inference systems
Deep expertise in PyTorch and modern deep learning frameworks
Hands‑on experience with inference runtime optimization (model serving, batching, KV cache management)
Advanced programming skills in Python and C++
Proven ability to contribute to and lead open source projects
Strong self‑motivation and organizational skills
Ability to work concurrently on multiple projects, independently and within a team environment
Excellent English written and verbal communication skills
Collaborative attitude and willingness to share ideas openly
The following are considered a plus:
Experience with vLLM, TGI, or similar inference runtimes
Contributions to PyTorch, OpenVINO, or other inference frameworks
Experience with distributed model serving and GPU optimization
Familiarity with Kubernetes and cloud‑native AI/ML deployments
Knowledge of model quantization techniques (GPTQ, AWQ, FP8, etc.)
Experience with CUDA, Triton, or other GPU programming frameworks
Experience with diffusion models and diffusion transformers
Experience building AI agents and agentic systems
Equal Opportunity Policy (EEO)
Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.