Senior / Director: Site Reliability Engineering (m/f/d)
Huawei's Munich Research Center is responsible for advanced technology research, architectural development, design and strategic engineering of our products.
Our cloud platform is gaining momentum and already operates at planet scale. Huawei Cloud is one of the largest and fastest-growing cloud platforms in the world. It has a strong presence, with over 40 availability zones across 23 geographical regions on 4 continents, in locations including Germany, Hong Kong, South Africa, and Brazil.
Our team builds data-driven automation capabilities to support cloud operations at global scale. We use data, information processing, machine learning, and generative AI to improve service reliability, automation, and operational efficiency. Our research applies AI-driven capabilities primarily to planet-scale cloud operations across a variety of products, software, and systems.
Join us as a Senior / Director: Site Reliability Engineering (m/f/d)
To drive automation and reliability, we are seeking an SRE to join the Ultra-scale AIOps Lab in Munich. This team is entrusted with developing key innovative solutions for Huawei Cloud. You will take a systematic approach to solving operational problems, dissect how large-scale, complex systems work, and take great satisfaction in making continuous improvements.
Your mission
- Work with engineering teams in designing and developing systems that are highly reliable at a global scale.
- Develop tooling to improve teams’ efficiency and workflows. Develop new solutions to automate repetitive, manual and risky tasks.
- Draw on your knowledge of cloud infrastructures to develop new solutions to identify and fix application, middleware, network and service-level issues.
- Observe large-scale running systems, and identify and prioritize innovative ways to improve performance, reduce cost, and enhance the experience for millions of users.
Your areas of expertise
- BSc, MSc or PhD degree in Computer Science, Information Technology, or equivalent field
- 10+ years of experience in Software Engineering, Site Reliability Engineering, or DevOps
- Deep understanding of SRE technologies, platforms and tools, SLAs, incident resolution, and automation
- Hands-on experience in managing operations of large-scale, internet-centric production environments for application or infrastructure services serving tens of millions of end users.
- Hands-on experience with cloud-based technologies and tools, especially in monitoring and operations, such as Datadog, Prometheus, Splunk, Elasticsearch, Grafana, or in-house solutions
- Understanding of cloud infrastructure, e.g., load balancers (LB), middleware, messaging, VMs, operating systems, and network routing protocols.
- Troubleshooting skills that span applications, systems, OS and networking.
- Experience in at least one of the following languages: Python, Java, C, or Go
By applying to this position, you agree to our RECRUITMENT PRIVACY STATEMENT. You can read the full recruitment privacy statement via the link below.
http://career.huawei.com/reccampportal/portal/hrd/weu_rec_all.html
Your rewards for working here
- Our culture is characterized by innovative power and team spirit as well as the intensive exchange of knowledge and experience within our global network.
- We offer healthy meals ranging from traditional Chinese to western delicacies in our famous company canteen.
- To support your ongoing development, we offer a broad range of training opportunities, including many online and face-to-face programs and language courses in German and Mandarin.
- Our diverse and welcoming environment is shaped by different backgrounds and around 40 nationalities.
- Self-responsible work in a competent, motivated and constantly growing team.
Please send your application and CV (incl. cover letter and reference letters) in English.
Huawei is a leading global information and communications technology (ICT) solutions provider. Driven by a commitment to sound operations, ongoing innovation, and open collaboration, we have established a competitive ICT portfolio of end-to-end solutions in telecom and enterprise networks, devices, and cloud technology and services. Our ICT solutions, products, and services are used in more than 170 countries and regions, serving over one-third of the world's population. With 197,000 employees, Huawei is committed to developing the future information society and building a Better Connected World.
- Department: Intelligent Cloud Technologies Laboratory
- Location: Munich
About Huawei Research Center Germany & Austria
Huawei's vision is to enrich life through communication. We are a fast-growing, leading global information and communications technology (ICT) solutions provider.
Huawei is active in more than 170 countries and has over 197,000 employees, of whom more than 80,000 are engaged in research and development (R&D). With us, you have the opportunity to work in a dynamic, multinational environment with more than 150 nationalities worldwide.