Site Reliability Engineer (Automation and DevOps)
Location: Dublin, Ireland
Key Responsibilities
·Plan, manage, and oversee all aspects of a production environment
·Define strategies for application performance monitoring and optimisation in a production environment
·Respond to incidents
·Improve the platform based on feedback and measure the reduction of incidents over time
·Support deployment of code into multiple lower environments
·Support current processes with an emphasis on automating everything as soon as possible
·Design, develop and standardise a monitoring and alerting mechanism for the supported applications
·Take a holistic approach to problem-solving by connecting the dots during a production event across the various technology stacks that make up the platform, to optimise mean time to recover
·Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement
·Analyse ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns
·Support services before they go live through activities such as system design consulting, capacity planning and launch reviews
·Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices
·Maintain services once they are live by measuring and monitoring availability, latency and overall system health
·Scale systems sustainably through mechanisms like automation and evolving systems by pushing for changes that improve reliability and velocity
·Work with a global team spread across tech hubs in multiple geographies and time zones
·Ability to share knowledge and explain processes and procedures to others
·Share knowledge with and mentor junior team members
·Ability to perform on-call duties on a rotational basis
·Occasional off-hours work required
Skills Required
Must have:
·Linux
·Mainframe
·Shell scripting
·ITIL / ITSM
·Application troubleshooting
·SQL
·Any monitoring tool (Splunk / Dynatrace preferred)
·Jenkins - CI/CD
·Groovy scripting / YAML (basic)
·Git (basic) / Bitbucket (basic)
Good to have:
·Ansible / Chef
·Event framework architecture