Senior Site Reliability Engineer

About the Role

You will take full ownership of reliability, observability, and incident response for a data-heavy SaaS platform. You will work across the stack, from AWS infrastructure to Rust microservices and TypeScript services, to measure and improve availability, latency, and performance. You will define SLAs and SLOs, expand metrics, logs, and traces, write playbooks, improve response workflows, and run blameless postmortems. You will also optimize CI/CD, autoscaling, and networking, trace complex issues through code and systems, and automate recovery to reduce mean time to recovery (MTTR).

Requirements

  • 5+ years of experience in Site Reliability, DevOps, or Infrastructure Engineering roles
  • Deep understanding of distributed systems and debugging at the network, application, and database layers
  • Hands-on experience with AWS and container orchestration such as Kubernetes or ECS
  • Experience with Infrastructure as Code tools such as Pulumi
  • Comfortable tracing through Rust and TypeScript code to diagnose complex performance or reliability issues
  • Experience with or willingness to learn Cassandra and ClickHouse in production
  • Strong collaboration and communication skills
  • Systematic and analytical approach to building reliable systems at scale
  • Interest in crypto, finance, or large-scale data systems

Responsibilities

  • Own reliability end-to-end and improve service availability and performance
  • Enhance observability by expanding and refining metrics, logs, and traces
  • Lead incident management and define playbooks and response workflows
  • Strengthen infrastructure by optimising AWS configurations, CI/CD, autoscaling, and networking
  • Collaborate across product and engineering to embed reliability in designs
  • Drive continuous improvement and automate recovery to reduce MTTR
  • Champion SRE best practices, including capacity planning, runbooks, and resilience testing

Benefits

  • 100% remote (UK only)
  • Opportunities to visit Paris or London hubs
  • Full benefits package

Senior Site Reliability Engineer at Cryptio | JobStash