We’re looking for a Senior Software Engineer to help design, build, and scale the next generation of our distributed systems. As part of a high-impact engineering team, you will work on improving the performance, reliability, and automation of our cloud-native platforms and services. This is a great opportunity to solve complex engineering challenges while contributing to mission-critical systems at scale.
Responsibilities
Design, build, and maintain scalable backend systems and cloud-based services.
Write clean, testable, and efficient code following engineering best practices.
Develop automation and tooling to reduce manual effort and improve system reliability.
Enhance observability through monitoring, logging, and distributed tracing.
Support integration of AI-driven automation and observability platforms.
Work closely with product and infrastructure teams in an Agile setting to ship features and improvements iteratively.
Contribute to CI/CD pipeline improvements to streamline delivery and deployment.
Collaborate with cross-functional teams to define and track SLIs/SLOs and support service reliability.
Participate in incident response, root cause analysis, and long-term remediation.
Required Skills
5+ years of experience in software engineering using Python, Go, or Java to build scalable, production-grade systems.
Strong understanding of software design principles such as modularity, separation of concerns, fault tolerance, and scalability.
Experience writing and maintaining test suites, including unit, integration, and end-to-end tests.
Proficiency with Git and collaborative version control workflows.
Familiarity with CI/CD systems and deployment automation best practices.
Experience with API development using RESTful or gRPC standards.
Experience working within Agile development processes such as Scrum or Kanban.
Experience conducting code reviews and a solid commitment to clean, maintainable code.
Infrastructure & Systems Skills
Hands-on experience operating and scaling distributed systems in a cloud environment (AWS, GCP, or Azure).
Familiarity with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi).
Experience with Docker and Kubernetes in production environments.
Solid understanding of Linux/Unix systems and strong troubleshooting skills.
Familiarity with monitoring and observability tools such as Prometheus, Grafana, Datadog, or Splunk.
Understanding of key Internet protocols (TCP/IP, DNS, HTTP/HTTPS, TLS).
Exposure to SRE practices including SLIs, SLOs, incident response, and postmortems.
Desired Skills
Experience working in compliance-sensitive, global, or multi-tenant environments.
Exposure to chaos engineering, fault injection, or performance/load testing tools.
Familiarity with AI/ML-driven observability or automation systems.
A data-driven mindset for identifying systemic issues and driving reliability improvements.
Strong written and verbal communication skills, with an emphasis on documentation and collaboration.