Senior Staff Site Reliability Engineer

About the Role

You will manage compute infrastructure and the applications that depend on it, with a focus on reliability, automation, and observability. You will operate and maintain workload orchestration platforms (for example, Kubernetes or Nomad), implement Infrastructure as Code and GitOps workflows, and automate deployments and operational runbooks. You will investigate production incidents, debug distributed systems, and write clear postmortems and documentation. The role calls for frequent, clear communication with distributed teams, proposing more than one solution to a problem, and driving initiatives that improve operational resilience. You should be comfortable programming in at least one language and able to reason about software design and distributed systems. Experience running blockchains, validators, or remote signers/multisig is a plus.

Requirements

  • Experience working in distributed teams
  • Strong verbal and written English communication
  • Experience with Infrastructure as Code
  • Experience with GitOps
  • Experience with workload orchestration such as Kubernetes or Nomad
  • Ability to reason about software design and distributed systems
  • Proficiency in at least one programming language
  • Nice to have: experience running blockchains, validators, or remote signers

Responsibilities

  • Manage compute infrastructure and supporting applications
  • Operate and maintain workload orchestration using Kubernetes or Nomad
  • Implement and maintain Infrastructure as Code and GitOps workflows
  • Automate deployments and operational processes
  • Troubleshoot and debug distributed systems and production incidents
  • Document architectures, runbooks, and postmortems
  • Communicate clearly with distributed teams and propose solutions

Benefits

  • Fully remote work from anywhere
  • Flexible work schedule
  • Generous vacation policy

Senior Staff Site Reliability Engineer at xLabs | JobStash