Staff Platform Engineer, Observability
About the Role
You will design and run the systems behind metrics, logs, tracing, and performance debugging for product teams. You will architect pipelines that handle massive throughput, granular trace data, and millisecond-level introspection. You will lead the migration from Datadog to cost-efficient internal tools, partner with infra, product, and performance teams to support deep debugging workflows, and own reliability, scaling, and cost efficiency across the observability system. You will keep systems simple, fast, and easy for engineers to reason about.
Requirements
- 8+ years of programming experience and a deep understanding of observability systems
- Experience with ClickHouse, Loki, Elasticsearch, Prometheus, or Grafana
- Ability to design systems that handle massive data volumes and sustained read/write throughput
- Strong systems thinking and intuition for performance, storage, tradeoffs, and reliability
- Experience running bare metal at scale (nice to have)
- Familiarity with Solana or blockchain performance needs (nice to have)
Responsibilities
- Architect a new observability stack across metrics, logs, tracing, and alerting
- Build high-throughput observability pipelines optimized for fast, low-latency querying
- Lead the migration from Datadog to cost-efficient, performance-focused internal tools
- Partner with infrastructure, product, and performance teams to support deep debugging workflows
- Own reliability, scaling, and cost efficiency across the entire observability system
- Keep systems simple, fast, and easy for engineers to reason about
Benefits
- Remote-first, flexible work (fully distributed)
- Market-leading salary
- Meaningful equity
- Generous vacation
- Wellness budgets
- Support for learning and travel
