Principal Linux Engineer

Dublin

Engage People Recruitment

Linux engineer

Posted: 23 March

Offer description

My client is building next-generation AI and high-performance computing platforms that power advanced machine learning, data science, and large-scale compute workloads. They operate high-density GPU clusters and are looking for a Principal Linux Engineer to lead the design, optimization, and reliability of their GPU-based infrastructure.

As a Principal Linux Engineer specializing in GPU systems, you will architect, deploy, and operate high-performance Linux environments optimized for GPU workloads, including AI/ML training, inference, simulation, and data processing. You will work closely with ML engineers, platform teams, and DevOps to ensure performance, scalability, and reliability across the compute infrastructure. This is a hands‑on technical leadership role requiring deep Linux expertise and strong experience managing GPU-based systems at scale.

Key Responsibilities

Architect and maintain enterprise‑grade Linux systems (RHEL, Rocky, Ubuntu, or equivalent)
Tune the kernel and optimize performance for HPC and GPU workloads
Develop automation for provisioning and lifecycle management
Troubleshoot complex OS‑level, hardware, and performance issues

GPU Infrastructure & Performance

Deploy and manage NVIDIA GPU infrastructure (A100, H100, or equivalent)
Install, configure, and maintain NVIDIA drivers, CUDA, NCCL, and related libraries
Optimize multi‑GPU and multi‑node performance
Monitor GPU utilization, thermals, and power efficiency
Diagnose PCIe, NVLink, NUMA, and memory bottlenecks
Manage large‑scale compute clusters (on‑prem or cloud)
Integrate GPUs into Kubernetes environments (GPU operator, device plugins)

Automation & Infrastructure as Code

Build infrastructure using Terraform, Ansible, or similar
Develop CI/CD workflows for system configuration
Automate GPU fleet provisioning and configuration management

Reliability & Observability

Establish SLOs and capacity planning models
Lead incident response for infrastructure outages
Conduct root cause analysis and implement preventive measures

Security & Compliance

Harden Linux systems using security best practices
Implement access controls, patch management, and vulnerability remediation
Support SOC 2 / ISO 27001 / FedRAMP initiatives (if applicable)

Required Qualifications

7+ years of Linux systems engineering experience
3+ years managing GPU infrastructure in production environments
Deep knowledge of:
  Linux internals (kernel, memory management, networking stack)
  NVIDIA driver stack, CUDA, and GPU troubleshooting
  High‑performance storage (NVMe, parallel file systems)
  Networking (10/25/40/100GbE, InfiniBand preferred)
Experience with:
  Kubernetes with GPU workloads
  Infrastructure as Code (Terraform, Ansible)
  Python or Bash scripting
Strong debugging and performance analysis skills
Experience operating in large‑scale production environments
