About the Role
Are you a creative SRE who is looking for more opportunities to improve reliability and who enjoys building solutions that reduce toil and manual effort?
With constant attention to our customers (both internal and external), you will deliver quickly on a wide range of daily tasks: environment provisioning, performance monitoring, environment troubleshooting, ad-hoc requests, and automation efforts, all while keeping the work you perform transparent.
This role requires a good understanding of Linux systems in a production environment.
You will be part of a team that supports customers running in both public and private cloud environments.
About You
We would love to hear from you if you have been part of an Operations-to-SRE transformation in a previous role, like trying new techniques and approaches on sophisticated problems, love learning new technologies, and are a natural collaborator and an extraordinary teammate who brings out the best in everyone around you.
You understand that the availability of the Workday Service is paramount, and you are able to support a daytime-only shift pattern that includes some weekends, plan changes carefully, write detailed runbooks, share knowledge with colleagues, and work effectively as part of a team.
You respond promptly to high-impact issues and can drive an incident through to resolution.
When work is manual and frequently repeated, you look for a way to automate it.
Above all, you deliver.
Basic Qualifications
2+ years of experience running and maintaining a large-scale 24x7 production environment, preferably across multiple data centers and multiple cloud providers such as AWS and GCP.
BS or MS degree in Computer Science, Engineering, or a related technical field, or equivalent work experience.
Other Qualifications
Ability to troubleshoot problems involving tools such as Docker, Python, Golang, HTTPd, MySQL, Git, Java web applications, etc.
Proven expertise with Linux and debugging fundamentals, and a solid understanding of how to isolate issues quickly.
Experience with a range of toolsets, for example: Chef, Jenkins, OSSEC, Splunk, ELK, Ansible, AWX, JIRA, Confluence, Prometheus, Grafana, Artifactory, Kubernetes.
Strong understanding of enterprise-level practices, including documentation, runbooks, root cause analysis, capacity trending, bug fixes, and scripting.
Experience writing and deploying code in a production environment.