Overview
Technical Architect – Observability & Site Reliability Engineering (SRE) role at NTT DATA North America, based in Dublin, Ireland.
This position leads the design, strategy, and implementation of Observability and SRE frameworks for enterprise-scale, microservices-based applications.
The ideal candidate brings deep technical knowledge of the Splunk Observability Stack and open-source tools (OpenTelemetry, Prometheus, Grafana, Jaeger) and can define and execute architecture strategies for complex distributed systems.
The role requires hands-on ability to create architecture blueprints, lead technical teams, and work with stakeholders to embed observability and reliability practices across the SDLC.
Key Responsibilities
Architecture & Blueprinting: design and deliver end-to-end observability architecture (Metrics, Logs, Traces, Events) for cloud-native and hybrid environments.
Create technical architecture diagrams, data flow maps, and integration blueprints using tools such as Lucidchart or Visio.
Lead the definition of SLIs, SLOs, and Error Budgets aligned with business KPIs and DORA metrics.
Toolchain Strategy & Implementation: architect telemetry pipelines using OpenTelemetry Collector and Splunk Observability Cloud (SignalFx, APM, RUM, Log Observer).
Define tool adoption strategy and integration roadmap for OSS tools (Prometheus, Loki, Grafana, Jaeger) and Splunk-based stacks.
Guide teams on instrumentation approaches (auto/manual) across languages like Java, Go, Python, .NET, etc.
Reliability Engineering Enablement: lead adoption of SRE principles including incident management frameworks, resiliency testing, and runbook automation.
Collaborate with DevOps to integrate observability into CI/CD pipelines (e.g., Jenkins, ArgoCD, GitHub Actions).
Define health checks, golden signals, and SPoG (Single Pane of Glass) dashboards.
Bring exposure to AIOps, ML-based anomaly detection, or business observability.
Stakeholder Management & Governance: serve as a technical liaison between client leadership, SREs, developers, and infrastructure teams.
Run workshops and assessments, and evangelize an observability-first culture.
Provide guidance on data retention, access control, cost optimization, and compliance (especially with Splunk ingestion policies).
Performance & Optimization: monitor and fine-tune observability data flows to prevent alert fatigue and keep alerts actionable.
Implement root cause analysis practices using telemetry correlation across metrics, logs, and traces.
Lead efforts to build self-healing systems using automated playbooks and AIOps integrations where applicable.
Required Skills & Qualifications
15+ years in IT, with 5+ years in Observability/SRE architecture roles
Proven experience designing architecture for microservices, containers (Docker, Kubernetes), and distributed systems
Strong hands-on expertise with:
Splunk Observability Cloud (SignalFx, Log Observer, APM)
OpenTelemetry (SDKs + Collector)
Prometheus + Grafana
Jaeger / Zipkin for distributed tracing
CI/CD tools: Jenkins, GitHub Actions, ArgoCD
Ability to build and present clear architecture diagrams and solution roadmaps
Working knowledge of cloud environments (AWS, Azure, GCP) and container orchestration (K8s/OpenShift)
Familiarity with SRE and DevOps best practices (error budgets, release engineering, chaos testing)
Nice to Have
Splunk certifications: Core Consultant, Observability Specialist, Admin
Knowledge of ITIL and modern incident management tooling (PagerDuty, Opsgenie)
Experience in banking or regulated enterprise environments
Soft Skills
Strong leadership and cross-functional collaboration
Ability to work in ambiguous, fast-paced environments
Excellent documentation and communication skills
Passion for mentoring teams and building best practices at scale
Why This Role Matters
The client is on a journey to mature its Observability and SRE ecosystem, and this role will be critical in unifying legacy and modern telemetry stacks, driving a reliability-first mindset and tooling, and establishing a scalable blueprint for production excellence.
About NTT DATA
NTT DATA is a $30 billion global innovator of business and technology services.
We serve 75% of the Fortune Global 100 and are committed to helping clients innovate, optimize and transform for long-term success.
We are a Global Top Employer with diverse experts in more than 50 countries and a robust partner ecosystem.
Our services include consulting, data and AI, industry solutions, and the development, implementation and management of applications, infrastructure and connectivity.
NTT DATA is part of NTT Group, which invests over $3.6 billion annually in R&D to help organizations move into the digital future.
Visit us at
Note: Wherever possible, we hire locally to NTT DATA offices or client sites.
In-office attendance may be required based on business needs.
NTT DATA recruiters will never ask for payment or banking information and will only use and email addresses.
If you are requested to provide payment or disclose banking information, please submit a contact us form at
NTT DATA endeavors to make accessible to all users.
If you need assistance completing the application process or have accessibility requests, please contact us at This contact information is for accommodation requests only and cannot be used to inquire about application status.
EEO policy and pay transparency information are available via the linked resources.