Own and evolve the reliability, visibility, and operational correctness of our platform spanning cloud services and physical hardware.
About the Role
We're looking for an Infrastructure & Observability Engineer to own and evolve the reliability, visibility, and operational correctness of our platform.
This role is primarily focused on infrastructure, monitoring, logging, alerting, and production operations, with some backend involvement where necessary.
Our systems span cloud services and physical hardware deployed in the field.
You'll be responsible for making sure we can see what's happening across that entire stack, detect issues early, and operate the platform with confidence as it scales.
This is a high-agency role with significant responsibility over production systems.
What You'll Do
Own and evolve our infrastructure on AWS, with services deployed via Kubernetes
Design and maintain observability systems: metrics, logs, traces, dashboards, and alerting
Improve reliability, uptime, and debuggability of production systems
Build tooling and automation to support deployments, rollbacks, and incident response
Work closely with backend engineers to ensure services are observable and operable by default
Support and debug real-time systems, including WebSocket-based device-to-cloud communication
Occasionally contribute to backend services (TypeScript or Rust) when required to improve operability
Reason about failures that span cloud infrastructure, networking, backend services, and deployed hardware
We're Hoping You
Have strong experience operating production infrastructure and distributed systems
Are comfortable with AWS, containers, and Kubernetes
Have hands-on experience with monitoring, logging, and alerting systems (e.g. OpenTelemetry)
Understand networking concepts and long-lived connections (e.g. WebSockets)
Can work methodically during incidents and use post-mortems to improve systems afterward
Are comfortable working close to hardware-backed systems, even if your background is primarily cloud-focused
Nice to Have
Experience with Rust or other systems languages
Exposure to embedded Linux, Yocto, or device-level software
Experience operating device fleets or IoT-style platforms
Familiarity with PostgreSQL performance and operational tuning
Light frontend or internal tooling experience
Why Join Induct
High-agency role with significant responsibility over production systems
Work across the full stack from cloud to physical hardware
Be part of a talented engineering team that values reliability and quality