Lead Site Reliability Engineer
What the job is like
- Establish an SRE site and help build an effective, inclusive SRE team.
- Provide technical leadership for the local team and work closely with partner team technical leads and cloud leadership.
- Provide guidance to other team members on managing the availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions.
- Manage execution of project priorities, deadlines, and deliverables.
- Lead incident management during incidents.
- Drive MTTR in line with the incident SLA.
- Ensure 100% alert coverage across application, infrastructure, security, flows, etc.
Qualifications:
- 6-10 years of experience in distributed systems, storage systems, or databases; algorithms and data structures; and/or Unix/Linux systems internals (e.g., filesystems, system calls) and administration.
- Experience designing, analyzing, and troubleshooting large-scale distributed systems.
- Database experience with MySQL or PostgreSQL.
- Hands-on experience operating Kubernetes (k8s) and at least one cloud platform.
- Excellent communication skills, a sense of ownership, and a systematic approach to problem solving.