Job Overview

We are seeking an experienced Technical Architect to lead our Observability and Site Reliability Engineering (SRE) initiatives. The role focuses on designing and delivering end-to-end observability architecture for cloud-native and hybrid environments.

The successful candidate will have a strong background in microservices, containers, and distributed systems, as well as experience with Splunk Observability Cloud, OpenTelemetry, Prometheus, and Grafana.

In this role, you will work closely with cross-functional teams to drive a reliability-first mindset and supporting tooling, unify legacy and modern telemetry stacks, and establish a scalable blueprint for production excellence.
Key Responsibilities:

* Design and deliver end-to-end observability architecture for cloud-native and hybrid environments
* Lead the definition of SLIs, SLOs, and error budgets aligned with business KPIs and DORA metrics
* Architect telemetry pipelines using the OpenTelemetry Collector and Splunk Observability Cloud
* Guide teams on instrumentation approaches across languages such as Java, Go, Python, and .NET
Requirements:

* 15+ years in IT, including 5 years in Observability/SRE architecture roles
* Proven experience designing architecture for microservices, containers (Docker, Kubernetes), and distributed systems
* Strong hands-on expertise with Splunk Observability Cloud, OpenTelemetry, Prometheus, and Grafana
* Ability to build and present clear architecture diagrams and solution roadmaps
* Working knowledge of cloud environments (AWS, Azure, GCP) and container orchestration (Kubernetes/OpenShift)
Benefits:

* Opportunity to work with a leading technology company
* Competitive salary and benefits package
* Ongoing training and professional development opportunities