Director of Site Reliability Engineering
About the Role
You will lead an experienced Site Reliability Engineering team responsible for making production and testing environments reliable, observable, and easy to operate. You will define SRE processes, set quarterly objectives, track engineering KPIs, participate in sprint planning, mentor engineers, manage on-call rotations, troubleshoot incidents, and contribute hands-on code and infrastructure work.
Requirements
- 3+ years of experience working as a Site Reliability Engineer
- 3+ years of experience managing an SRE team
- Experience collaborating with development teams across design, development, CI, testing, and production
- Experience defining, measuring, and driving improvements in KPIs
- Experience assisting teams with root cause analysis and postmortems
- Experience designing and building infrastructure for large distributed systems
- Experience maintaining highly available infrastructure
- Experience troubleshooting complex technical problems
- Experience with configuration management or IaC tools such as Terraform, Ansible, or Puppet
- Experience building and maintaining infrastructure using Kubernetes
- Highly autonomous with the ability to find clarity in ambiguity
- Excellent communication skills for remote collaboration
Responsibilities
- Establish a clear vision and mandate for the SRE team
- Define the SRE team's quarterly OKRs
- Define collaboration processes between SREs and development teams across the software lifecycle
- Define career growth paths and coach and mentor individual contributors
- Define and track engineering metrics and hold teams accountable for KPIs
- Coordinate priorities with other teams and organizational areas
- Participate in sprint planning and track progress
- Design and build reliable, easy-to-use systems and infrastructure
- Monitor and troubleshoot production systems
- Define and participate in 24/7 on-call rotations
- Mediate technical discussions and review pull requests
- Perform hands-on code fixes and troubleshooting when needed
- Collaborate with external partners and advise on integrations
Benefits
- Lumen-denominated grants
- Competitive health, dental, and vision coverage, with most plans covered at 100% for employee and dependents
- Flexible time off plus 15 company holidays, including a company-wide holiday break
- Up to 12 weeks of paid parental leave for birthing and non-birthing parents, and up to 14 weeks of paid pregnancy leave
- Gym reimbursement
- Life and AD&D insurance
- Short- and long-term disability
- 401(k) with 4% match
- Health and Dependent Care FSAs
- Commuter benefits with employer contribution
- Health Savings Account with monthly employer contribution
- Family building benefits
- Wellbeing benefits (One Medical, Rightway, Headspace)
- Learning and development budget
- Daily lunch and office snacks
- Company retreats
