Senior Site Reliability Engineer
About the Role
You will support and operate monitoring services across the infrastructure, deploy and maintain externally facing services, and improve the reliability and observability of internal infrastructure. You will provide engineers with reliable release pipelines that enable fast deployments, work on-call shifts with a focus on preventing incidents, and tune monitoring and alerting to surface symptoms rather than outages. You will also debug production issues across services and levels of the stack.
Requirements
- 5+ years of relevant professional experience
- Software engineering or operations background with SRE or closely related experience
- Experience with system architecture and cloud implementation (for example, AWS)
- Experience with CI/CD pipelines and software delivery
- Experience with distributed systems and container orchestration
- Experience building or maintaining Kubernetes clusters
- Ability to read and write code and automate tasks with scripts and tools
- Strong communication skills and ability to give and receive feedback
- Excitement for blockchain and Web 3.0 technologies
- Experience with information security and DevSecOps
- Experience working remotely in a distributed team
Responsibilities
- Support monitoring services across the infrastructure
- Deploy and maintain externally facing services
- Improve reliability and observability of internal infrastructure
- Provide and maintain reliable release pipelines
- Prevent incidents during on-call shifts
- Configure monitoring and alerting to signal symptoms, not outages
- Debug production issues across services and levels of the stack
