About this Opportunity
Are you detail-oriented with experience in software, appliances, or system engineering? Are you passionate about designing and operating cloud-native production services at scale? Do you enjoy working with open-source projects and distributed system design, focusing on operational stability and performance?
The EWS Site Reliability Engineering (SRE) team offers an excellent opportunity in a fast-paced, innovative, and highly collaborative technical environment. We invite you to explore our Ericsson Web Services (EWS) offering to learn more.
We are looking for an open-minded, self-driven team member to join a group of top-tier SREs in Shared Development Clouds. With a solution-oriented attitude and an engineering-focused culture, we are responsible for architecting, designing, implementing, deploying, and operating full-stack cloud-native infrastructure and platforms—from hardware to microservices—to enable Ericsson's cloud-native development environment and 5G business.
What you will do
· Integrate open-source projects within the CNCF landscape:
· Design, implement, test, and operate high-quality, failure-resistant private clouds using Kubernetes and the cloud-native ecosystem, ensuring reliability and performance at scale.
· Automate and build advanced CI/CD platforms and software supply chains.
· Develop monitoring, logging, alerting, and proactive issue responses.
· Build data center networks, Kubernetes CNI solutions, and distributed storage systems such as Ceph and Kubernetes CSI drivers.
· Engage in scaling, performance tuning, systematic problem-solving, cloud-native security, and Kubernetes hardening.
What you will bring
· A degree in Electrical Engineering, Computer Science, Software Engineering, Telecommunications, or a related technical field.
· A commitment to continuous learning and a passion for open-source technology.
· An organized, meticulous approach and a knack for creating tools to automate routine tasks.
· Ability to thrive in a team setting at a startup pace, with an open mind and self-drive.
· Flexibility to manage on-call duties.
· Expertise in Linux (RHEL, SLES, Ubuntu); Linux kernel knowledge is a plus.
· Proficiency in at least one programming language (e.g., Go, Python, C/C++).
· Experience in research, software development, platform development, hardware or appliances, IT, service operations, and cloud operations.
· Skills in Linux system administration and network administration.
· Knowledge of data center infrastructure, replication, scaling, and performance tuning.
· Familiarity with metrics, monitoring, and integrating open-source tools.
· Experience with CI/CD tooling and release engineering.
· Experience with public clouds (Azure, AWS, GCP) and comfort scripting in Go, Python, Bash, etc.
· Experience with tools and practices such as Git, GitLab, Docker, Rancher, Jenkins, ELK, Redis, Spinnaker, GitOps, Kubeflow, and Ceph.
· Knowledge of Infrastructure as Code tools like Ansible and Terraform.
· Deep knowledge of cloud-native and Kubernetes ecosystems.
· Experience with eBPF and familiarity with Kubeflow/TensorFlow.
· Understanding of cloud technologies: compute, storage, network, database, and security.