Senior site reliability engineer

Posted: 12 December

Offer description

About this Opportunity

Are you detail-oriented with experience in software, appliances, or system engineering? Are you passionate about designing and operating cloud-native production services at scale? Do you enjoy working with open-source projects and distributed system design, focusing on operational stability and performance?

The EWS Site Reliability Engineering (SRE) team offers an excellent opportunity in a fast-paced, innovative, and highly collaborative technical environment. We invite you to explore our Ericsson Web Services (EWS) offering to learn more:

We are looking for an open-minded, self-driven team member to join a group of top-tier SREs in Shared Development Clouds. With a solution-oriented attitude and an engineering-focused culture, we are responsible for architecting, designing, implementing, deploying, and operating full-stack cloud-native infrastructure and platforms—from hardware to microservices—to enable Ericsson's cloud-native development environment and 5G business.

What you will do

· Integrate open-source projects within the CNCF landscape.

· Design, implement, test, and operate high-quality, failure-resistant private clouds using Kubernetes and the cloud-native ecosystem, ensuring reliability and performance at scale.

· Automate and build advanced CI/CD platforms and software supply chains.

· Develop monitoring, logging, alerting, and proactive issue response.

· Build data center networks, Kubernetes CNI solutions, and distributed storage systems such as Ceph and Kubernetes CSIs.

· Engage in scaling, performance tuning, systematic problem-solving, cloud-native security, and Kubernetes hardening.

What you will bring

· A degree in Electrical Engineering, Computer Science, Software Engineering, Telecommunications, or a related technical field.

· A commitment to continuous learning and a passion for open-source technology.

· A knack for creating tools to automate routine tasks; organized and meticulous.

· Ability to thrive in a team setting at a startup pace, with an open mind and self-drive.

· Flexibility to manage on-call duties.

· Expertise in Linux (RHEL, SLES, Ubuntu); Linux kernel knowledge is a plus.

· Proficiency in at least one programming language (e.g., Go, Python, C/C++).

· Experience in research, software development, platform development, hardware or appliances, IT, service operations, and cloud operations.

· Skills in Linux system administration and network administration.

· Knowledge of data center infrastructure, replication, scaling, and performance tuning.

· Familiarity with metrics, monitoring, and integrating open-source tools.

· Experience with CI/CD tooling and release engineering.

· Experience with public clouds (Azure, AWS, GCP) and comfort with Go, Python, Bash scripting, etc.

· Experience with tools and practices such as Git, GitLab, Docker, Rancher, Jenkins, ELK, Redis, Spinnaker, GitOps, Kubeflow, and Ceph.

· Knowledge of Infrastructure as Code tools like Ansible and Terraform.

· Deep knowledge of cloud-native and Kubernetes ecosystems.

· Experience with eBPF and familiarity with Kubeflow/TensorFlow.

· Understanding of cloud technologies: compute, storage, network, database, and security.
