Production Operations Expert
This contract role spans operational readiness, site reliability engineering (SRE), DevOps, and IT service management.
* Operational Readiness: Architect and lead application health, performance, and capacity planning.
* Collaborate with development teams on launch reviews and monitoring strategies.
* Drive zero-downtime deployment frameworks.
Skill areas include:
* Site Reliability Engineering (SRE): Ensure scalability and resilience of applications.
* Conduct blameless post-mortems and optimize incident response.
* Automate alerts and establish Service Level Objectives (SLOs) with development teams.
The ideal candidate will possess:
* Bachelor's degree in Computer Science or a related technical field.
* Minimum of 5 years of production support experience.
* Strong experience with Linux systems.
* Familiarity with distributed systems and incident management.
* Excellent communication skills and a problem-solving mindset.
Desirable skills include:
* Experience with Chef and Jenkins.
* Java development experience.
* Scripting and automation expertise.
* Knowledge of enterprise monitoring tools such as Splunk.