Description
We're looking for a Senior Software Engineer to help design, build, and scale the next generation of our distributed systems. As part of a high-impact engineering team, you will work on improving the performance, reliability, and automation of our cloud-native platforms and services. This is a great opportunity to solve complex engineering challenges while contributing to mission-critical systems at scale.
Responsibilities
* Design, build, and maintain scalable backend systems and cloud-based services.
* Write clean, testable, and efficient code following engineering best practices.
* Develop automation and tooling to reduce manual effort and improve system reliability.
* Enhance observability through monitoring, logging, and distributed tracing.
* Support integration of AI-driven automation and observability platforms.
* Work closely with product and infrastructure teams in an Agile environment to ship features and improvements iteratively.
* Contribute to CI/CD pipeline improvements to streamline delivery and deployment.
* Collaborate with cross-functional teams to define and track SLIs/SLOs and support service reliability.
* Participate in incident response, root cause analysis, and long-term remediation.
Required Skills
* 5+ years of experience in software engineering using Python, Go, or Java to build scalable, production-grade systems.
* Demonstrated experience in developing and deploying production-grade software applications or services.
* Strong understanding of software design principles such as modularity, separation of concerns, fault tolerance, and scalability.
* Experience with writing and maintaining test suites, including unit, integration, and end-to-end tests.
* Proficiency with Git and collaborative version control workflows.
* Familiarity with CI/CD systems and deployment automation best practices.
* Experience with API development using RESTful or gRPC standards.
* Participation in Agile development processes such as Scrum or Kanban.
* Experience conducting code reviews and a solid commitment to clean, maintainable code.
Infrastructure & Systems Skills
* Hands-on experience operating and scaling distributed systems in a cloud environment (AWS, GCP, or Azure).
* Strong experience with AWS services such as EC2, VPC, IAM, S3, and EKS, or their GCP equivalents.
* Expertise in Kubernetes and modern container orchestration.
* Familiarity with infrastructure-as-code tools (e.g., Terraform, CloudFormation, Pulumi).
* Experience with Docker and Kubernetes in production environments.
* Solid understanding of Linux/Unix systems and strong troubleshooting skills.
* Familiarity with monitoring and observability tools such as Prometheus, Grafana, Datadog, or Splunk.
* Understanding of key Internet protocols (TCP/IP, DNS, HTTP/S, TLS).
* Exposure to SRE practices including SLIs, SLOs, incident response, and postmortems.
Desired Skills
* Experience working in compliance-sensitive, global, or multi-tenant environments.
* Exposure to chaos engineering, fault injection, or performance/load testing tools.
* Experience with AI/ML platforms, agents, or intelligent observability systems.
* A data-driven mindset for identifying systemic issues and driving reliability improvements.
* Strong written and verbal communication skills, with an emphasis on documentation and collaboration.