Principal DevOps Engineer
About the Role
You will own the reliability roadmap and lead the architectural evolution of production systems toward higher availability and predictable failure recovery. This means redesigning infrastructure for resilience, introducing redundancy and chaos engineering, and elevating observability so that risk is detected before it becomes an incident. You will also improve runbooks, incident response protocols, and post-mortem practices, mentor and uplift other DevOps engineers, and partner with protocol, client, and backend teams to bake reliability into designs.
Requirements
- 10+ years of experience operating distributed systems at scale
- Deep system design skills for fault tolerance and trade-off analysis
- Battle-tested intuition for failure modes under load and network partition
- Fluency in Kubernetes, Terraform, and cloud infrastructure
- Proven experience leading reliability transformations
- Ability to lead through influence
Responsibilities
- Own the reliability roadmap
- Redesign infrastructure for resilience
- Elevate observability from system health to risk awareness
- Strengthen operational rigour, runbooks, and incident response
- Mentor and uplift the DevOps team
- Partner across engineering to bake reliability into design
Benefits
- Token compensation
- Fully covered premium health insurance for you and your family
- Monthly wellness budget
- London HQ with gym access and daily meals
- All tools and tech provided
- Visa sponsorship and relocation support
