Site Reliability Engineer (Automation and DevOps)

Location: Dublin, Ireland

Key Responsibilities

Plan, manage, and oversee all aspects of a production environment
Define strategies for application performance monitoring and optimization in a production environment
Respond to incidents
Improve platform based on feedback and measure the reduction of incidents over time
Support deployment of code into multiple lower environments
Support current processes with an emphasis on automating everything as soon as possible
Design, develop, and standardize a monitoring and alerting mechanism for the supported applications
Take a holistic approach to problem-solving by connecting the dots during a production event through the various technology stacks that make up the platform, to optimize mean time to recovery
Engage in and improve the whole lifecycle of services - from inception and design, through deployment, operation, and refinement
Analyze ITSM activities of the platform and provide feedback to development teams on operational gaps or resiliency concerns
Support services before they go live through activities such as system design consulting, capacity planning, and launch reviews
Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
Scale systems sustainably through mechanisms like automation and evolving systems by pushing for changes that improve reliability and velocity
Work with a global team spread across tech hubs in multiple geographies and time zones
Share knowledge and explain processes and procedures to others
Mentor junior resources
Perform on-call duties on a rotational basis
Occasional off-hours work required

Skills Required

Shell scripting
Application troubleshooting
Experience with monitoring tools (Splunk/Dynatrace preferred)