Synopsis:
As a member of our infrastructure team, you will be responsible for the reliability and performance of our production environment. The role covers application performance monitoring and optimisation, incident response, and oversight of the environment to keep operations running smoothly.
Key Responsibilities:
* Develop and manage strategies for application performance monitoring and optimisation.
* Oversee all aspects of the production environment to ensure seamless operations.
* Respond to incidents and improve the platform based on feedback.
* Support deployment of code into multiple lower environments.
* Design, develop, and standardise a monitoring and alerting mechanism for supported applications.
* Take a holistic approach to problem-solving, correlating signals across the technology stack during a production event to minimise recovery time.
* Engage in and improve the entire lifecycle of services, from inception and design through deployment, operation, and refinement.
* Analyse ITSM activities of the platform and provide feedback to development teams on operational gaps or resiliency concerns.
* Work with a global team across multiple geographies and time zones.
Required Skills:
* Shell scripting
* Application troubleshooting
* Experience with monitoring tools (Splunk/Dynatrace preferred)