Senior Site Reliability Engineer - Kubernetes
Base pay range: Direct message the job poster from Solas IT Recruitment
Remote role with a cutting‑edge, expanding organisation
80-90K plus 5% bonus, 6% pension, healthcare for you and your family, and life cover
Key Responsibilities:
Collaborate with engineering teams to enhance service quality through robust testing, performance tuning, and fault identification.
Develop automated solutions to maintain systems and services, ensuring smooth project execution by working closely with internal engineering teams.
Oversee system performance by implementing continuous monitoring and balancing feature development with system reliability, adhering to established service level objectives.
Contribute to the formulation of practices, technologies, and procedures to maintain Security, Compliance, and Availability requirements across system landscapes.
Manage, plan, and execute system upgrades to ensure minimal downtime and optimal system availability.
Required Skills & Qualifications:
Kubernetes: Extensive expertise in managing, deploying, and troubleshooting production Kubernetes clusters, with experience in container orchestration. Familiarity with Amazon EKS is an advantage.
Automation & Configuration Management: Proficiency with Ansible, Helm, and Kustomize for automating infrastructure provisioning and deployment. Skilled at managing Kubernetes manifests and ensuring streamlined application releases across different environments.
Monitoring Tools: Hands‑on experience with systems like Prometheus and Grafana to monitor system health, identify issues, and optimize performance.
Cloud Infrastructure (AWS): Strong knowledge of AWS services such as EC2, S3, IAM, VPC, and associated tools for managing scalable cloud infrastructure.
Infrastructure as Code (IaC): Experience with Terraform for provisioning and maintaining cloud resources, ensuring repeatability and version control in cloud deployments.
Messaging & Queuing Systems: Familiarity with message brokers such as RabbitMQ, Kafka, or managed services like Amazon MQ, with experience in ensuring reliable, efficient communication between distributed systems.
Database Expertise: Strong background in managing cloud‑based MySQL databases, particularly with Amazon RDS, focusing on high availability, security, and performance.
Networking & Security: Solid understanding of network security and design to ensure system protection, compliance, and industry‑standard audit readiness.
High Availability Systems: Demonstrated experience in maintaining critical system uptime through fault tolerance, disaster recovery, and proactive monitoring to minimize downtime.
Collaboration & Cross‑functional Teamwork: Proven ability to work effectively across multiple teams, departments, and stakeholders to execute project plans efficiently.
Problem Solving & Optimization: Strong problem‑solving skills with a proactive approach to identifying bottlenecks, system issues, and opportunities for performance improvements.
Seniority Level
Mid‑Senior level
Employment Type
Full‑time
Job Function
Information Technology
Industry
Information Services