Detailed Job Description:
Role: Site Reliability Engineer (SRE)
Location: Dublin, Ireland (Hybrid)
Client: Fulcrum Digital
Employment Type: Permanent
Notice Period: Immediate to 2 weeks
Role Overview:
We are looking for a Site Reliability Engineer (SRE) to join our Cyber Security Service Biz Ops team. In this role, you will act as a production readiness steward, ensuring our platforms are reliable, secure, and resilient. You will partner closely with development teams to design, build, implement, and support technology services while driving operational excellence through automation, monitoring, and proactive risk management.
As part of the Biz Ops team, you'll play a key role in leading DevOps transformation, aligning product priorities with operational needs, and continuously improving customer experience.
Key Responsibilities
* Ensure production readiness by defining and meeting operational criteria for availability, capacity, performance, monitoring, self-healing, and deployment automation.
* Partner with development and product teams to design and support secure, reliable, and scalable services.
* Lead and participate in incident triage, root cause analysis, and long-term remediation to minimize business impact.
* Drive automation for deployments, operations, and monitoring to reduce manual intervention.
* Implement and manage observability practices (monitoring, logging, alerting, tracing) to maintain high service availability.
* Proactively manage production and change activities to minimize disruption to customers.
* Collaborate with security, risk, and compliance teams to ensure secure operations across environments.
* Provide continuous feedback to development teams to improve system design and customer experience.
* Advocate for and contribute to DevOps and SRE culture across the organization.
Skills & Qualifications
* Proven experience as a Site Reliability Engineer, Biz Ops Engineer, or DevOps Engineer.
* Strong knowledge of SRE principles and standard engineering practices.
* Experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions, ArgoCD, etc.).
* Hands-on expertise with cloud platforms (AWS, Azure, GCP).
* Proficiency in containerization & orchestration (Docker, Kubernetes).
* Strong skills in monitoring, observability, and logging tools (Prometheus, Grafana, Splunk, ELK, Datadog, AppDynamics, etc.).
* Scripting and automation skills (Python, Bash, Go, or similar).
* Knowledge of risk management, compliance, and security best practices (ISO 27001, SOC 2, PCI DSS, NIST, GDPR, etc.).
* Strong problem-solving, incident management, and communication skills.
* Experience working in Agile/Scrum environments.