Job Title: Senior DevOps Engineer
Serving as a Reliability Expert
Our organization is seeking an accomplished DevOps professional who excels in delivering high-quality services and ensuring the smooth operation of complex systems.
About This Role
* Maintaining and improving production applications under real-world load and user traffic, keeping them fast, efficient, and stable
* Managing database operations, including backup strategies, failover procedures, capacity planning, and performance tuning, to keep services running without interruption
* Operating messaging systems with an emphasis on throughput, reliability, and message delivery guarantees
* Administering NoSQL databases, including monitoring, scaling, data consistency, and disaster recovery, to protect against data loss and downtime
* Resolving production incidents and implementing long-term fixes that prevent recurrence and reduce service disruption
* Monitoring system health and proactively addressing performance bottlenecks before they affect users
* Implementing robust backup and disaster recovery procedures and testing them under realistic conditions to ensure business continuity
Key Requirements
* A minimum of 3 years of hands-on experience supporting web applications or distributed systems in production
* Detailed knowledge of production database administration (PostgreSQL, MySQL, or similar), including backup strategies, failover procedures, and capacity planning
* Experience operating messaging systems (Kafka, RabbitMQ, Redis, or similar) in production, with a focus on throughput, reliability, and message delivery guarantees
* Expertise in managing NoSQL databases (MongoDB, Cassandra, DynamoDB, or similar), including scaling, data consistency, and disaster recovery, to prevent data loss and downtime
* Proven ability to troubleshoot production incidents under pressure, resolving issues quickly and minimizing service disruption
* Strong understanding of setting up and managing monitoring and alerting (Prometheus, Grafana, ELK stack, or similar), including configuring tools to detect and respond to system anomalies
* Proficiency in Infrastructure as Code tools (Terraform, CloudFormation, Ansible) for ongoing operations, not just initial setup, to keep systems consistent
Location
Dublin, Ireland
Benefits
* Work From Home
This is a day-rate contract position. The successful candidate must be based in Ireland.