Senior Site Reliability Engineer

Overview

Azure Cosmos DB is Microsoft’s next-generation globally distributed, massively scalable, multi-model cloud database service. It is designed to enable developers to build planet-scale applications. Azure Cosmos DB is one of the fastest-growing Azure services. Joining the Azure Cosmos DB team is a fantastic opportunity to work with highly talented engineers operating like a startup, and to deliver on our next set of big challenges.

As a Senior Site Reliability Engineer, you will identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design to ensure that services and systems are highly stable, performant, and meet the expectations of our customers. You will work closely with other engineering teams and provide a holistic view of our cloud service.

Responsibilities

Identify opportunities and drive the design and implementation of end-to-end telemetry, alerting, self-healing and automation capabilities to improve service health, manageability, and reliability.
Participate in on-call rotations; own, triage, investigate, and resolve service issues, with an emphasis on broad communication, learning, and teaching throughout the process.
Interact with customers and support representatives, and communicate on a deeply technical level with product engineering and product management teams to evolve services.
Own availability, performance, and supportability targets for the service.
Author functional and technical documentation and remain current on relevant technologies and procedures.

Qualifications

Knowledge, experience and skills required:

Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent industry experience.
6+ years of experience writing tools, automation and scripting (PowerShell, Python, or similar), programming (C++, C#, or equivalent), and making enhancements to subcomponents within and around services/products to deliver and manage software in production. Experience that aids understanding of distributed systems and networking is preferred.
6+ years of troubleshooting/debugging experience: telemetry-based analysis (KQL or equivalent preferred), troubleshooting skills across network, hardware, and distributed service layers, with demonstrated ability to debug, fix, and optimize code.
Good communication skills, both verbal and written.

Company

Microsoft

Job Posted

2 years ago

WorkMode

Hybrid

Experience Level

3-7 years

Locations

Bangalore Urban, Karnataka, India

Qualification

Bachelor

