Cloud Reliability Engineer
This role requires a self-driven engineer to scale our client's public cloud presence.
You'll work with platform teams to ensure reliable runtimes for business-critical workloads.
The ideal candidate has a background in either software or systems engineering and a desire to learn the other, or has previous SRE experience.
-----------------------------------
The Expertise We're Looking For
* Bachelor's degree in a technology-related field; Master's degree a plus
* 4+ years of deploying/supporting distributed multi-tiered systems at scale
* Experience with AWS and Azure; certifications a plus
* Knowledge of container orchestration, preferably Kubernetes
* Ability to work collaboratively with diverse individuals and groups
* 8+ years of professional experience
* Experience enabling/managing cloud services and optimizations
-----------------------------------
Key Skills and Qualifications
* Experience with observability and resiliency setups
* Understanding of networking, virtualization, storage, containers, serverless
* Linux systems administration
* Automation with scripting languages (Python, Shell)
* Infrastructure as code tools (IAM, ARM, Terraform, Chef)
* Modern monitoring tools (DataDog, Prometheus, Splunk)
* CI/CD tools, especially Jenkins
-----------------------------------
What You Will Deliver
* Define and execute cloud reliability and observability strategy
* Reduce toil and increase efficiency through integrated data
* Troubleshoot hardware, software, network, application, and cloud issues
* Provide peer code reviews and foster a learning environment