


Senior Site Reliability Engineer


Smart Summary
Join Microsoft as a Senior Site Reliability Engineer and work with a highly talented engineering team to deliver software improvements for Azure Cosmos DB. As a Senior SRE, you will ensure service stability, performance, and reliability through software development and system design. This is a full-time hybrid opportunity located in Bangalore Urban, Karnataka, India.


Azure Cosmos DB is Microsoft’s next-generation globally distributed, massively scalable, multi-model cloud database service. It is designed to enable developers to build planet-scale applications, and it is one of the fastest-growing Azure services. Joining the Azure Cosmos DB team is an opportunity to work with highly talented engineers who operate like a startup, and to deliver on our next set of big challenges.

As a Senior Site Reliability Engineer, you will identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design to ensure that services and systems are highly stable and performant and meet the expectations of our customers. You will work closely with other engineering teams and provide a holistic view of our cloud service.

Responsibilities:


  • Identify opportunities and drive the design and implementation of end-to-end telemetry, alerting, self-healing and automation capabilities to improve service health, manageability, and reliability. 
  • Participate in on-call rotations and own, triage, investigate and resolve service issues with an emphasis on broad communications, learning & teaching throughout the process. 
  • Interact with customers and support representatives, and communicate on a deeply technical level with product engineering and product management teams to evolve services.
  • Own availability, performance, and supportability targets for the service. 
  • Author functional and technical documentation and remain current on relevant technologies and procedures. 


Knowledge, experience and skills required: 

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent industry experience.
  • 6+ years of experience writing tools, automation and scripts (PowerShell, Python, or similar), programming (C++, C#, or equivalent), and making enhancements in subcomponents within and around services/products to deliver and manage software in production. An understanding of distributed systems and networking is preferred.
  • 6+ years of troubleshooting/debugging experience: telemetry-based analysis (KQL or equivalent preferred) and troubleshooting skills across network, hardware, and distributed service layers, with a demonstrated ability to debug, fix, and optimize code.
  • Good communication skills, both verbal and written.



Job Posted: 10 months ago

Experience Level: 3-7 years

Location: Bangalore Urban, Karnataka, India





Related Jobs

SAP

Senior Linux Site Reliability Engineer (Pacemaker)


Bangalore Urban, Karnataka, India

Posted: 8 months ago

Seeking a Senior Linux Site Reliability Engineer with expertise in Pacemaker. Troubleshoot complex Pacemaker software and Linux OS/infrastructure issues. Develop automation for stability and reliability. Standardize and simplify server operations using DevOps. Requires 7-12 years of related experience with an advanced technical background in Linux-based server operating systems. Strong knowledge of Linux HA clusters, networking, and IT security. Experience with script programming and server automation tools. Fluency in English and the ability to work in global teams.

Criteo

Senior Site Reliability Engineer


Barcelona, Barcelona, Spain


Posted: 9 months ago

What You'll Do: The Sr. SRE acts as an expert in operations on GNU/Linux systems and cloud providers, as well as in automation tools and practices. The main responsibilities are creating, supporting, and improving the infrastructure.

Key Responsibilities:

  • Set up and maintain projects using Infrastructure as Code (IaC) principles.
  • Investigate issues.
  • Provide application developers with the ability to deploy and update applications in the production environment.
  • Support integration with services such as log collection, metric collection, and monitoring.
  • Participate in the development of configuration management, deployment and monitoring of infrastructure, and automation of processes.
  • Participate in the architecture design of new software components or their parts.
  • Explore and apply modern technologies and practices where practical.
  • Consider the cost-effectiveness of the production infrastructure.
  • Work with other teams to ensure that commonly used technical components created by Iponweb integrate well into the production infrastructure.
  • Maintain up-to-date documentation on processes and code utilised by the team.

Who You Are:

  • You have good Linux and Unix shell knowledge, particularly of Ubuntu/Debian-based Linux systems.
  • You have experience with cloud providers (AWS, GCP).
  • Programming skills are required, including familiarity with languages such as Python, Go, or similar.
  • You have a good understanding of TCP/IP networking principles.
  • You have experience with monitoring and metric systems such as Zabbix, Prometheus, Graphite, and similar.
  • You are able to set priorities and take responsibility.
  • You prefer to solve production problems through automation rather than manual operations.
  • You have experience with container technologies such as Kubernetes and Docker, and with the tools used in conjunction with them.
  • You have experience with modern configuration management tools (Puppet, Ansible).
  • You have experience managing databases such as MongoDB and PostgreSQL.
  • Decent communication skills.
  • Decent English skills.