Senior Site Reliability Engineer, Compliance

About the Role

You will ensure the reliability, security, and scalability of production systems. You will define and maintain SLOs and SLAs, lead incident response and post-mortems, and implement disaster recovery and release strategies. You will build and maintain infrastructure as code, perform day-to-day operational tasks, and create runbooks and documentation. You will audit and manage security controls to meet compliance standards and collaborate with other teams to provision test environments, conduct performance tests, and improve operational metrics.

Requirements

  • 3 to 5 years of experience managing software deployments and instrumentation in production with defined SLAs and SLOs
  • Strong knowledge of software delivery and DevOps principles
  • Experience with cloud platforms such as AWS, Cloudflare, or GCP
  • Experience with infrastructure-as-code tools such as Terraform or CloudFormation
  • Strong programming and scripting skills, preferably Python, Go, or Ruby
  • Experience supporting on-call rotations for 24x7 services
  • Ability to take substantial features from concept to shipping as a sole contributor
  • Strong problem-solving, critical thinking, and written communication skills
  • Bachelor’s degree in Computer Science, Information Security, or a related field, or professional certifications (e.g., Certified DevOps Professional, AWS/GCP architect certifications)

Responsibilities

  • Review system architecture and software components
  • Ensure SLOs and SLAs are met
  • Monitor operational metrics and lead improvement plans
  • Develop and maintain infrastructure as code
  • Manage and audit security controls to meet compliance standards
  • Implement and maintain compliance best practices
  • Lead strategic release planning including canary and blue-green deployments
  • Provision test environments and conduct performance tests
  • Lead incident response and conduct post-mortems
  • Develop and implement disaster recovery plans and fault injection simulations
  • Perform access onboarding and offboarding
  • Manage configuration and patching
  • Plan capacity to meet peak demand while optimizing cost
  • Create and maintain runbooks and technical documentation
  • Provide feedback and coaching to junior staff

Benefits

  • Remote work flexibility with optional office space in Malaysia and Singapore
  • Flexible working hours
  • Life and hospitalization insurance coverage
  • Hospitalization coverage for dependents
  • Virtual share options
  • Bonus (terms and conditions apply)
  • Monthly parking allowance (RM 150 or SGD 100)
  • Monthly meal allowance (RM 600 or SGD 400)
  • Annual learning allowance (USD 500, claim basis)
  • Social activity allowance (claim basis)
  • Annual company offsite

Senior Site Reliability Engineer, Compliance at CoinGecko | JobStash