Senior Data Engineer
About the Role
You will own the entire data platform from inception and deliver working historical and real-time Tardis pipelines. You will spin up the cloud environment, implement CI/CD and infrastructure as code, and ensure pipelines are idempotent, retryable, and provide exactly-once semantics. You will implement observability, provide self-service access to the MVP database via BI tools and internal APIs, and ingest market and on-chain data sources. You will also develop specialized timeseries datasets, plan for large data growth, establish data quality and governance, produce documentation and runbooks, and mentor junior data staff.
Requirements
- Proven track record of rapidly delivering production data platforms, including shipping MVPs within 60 days
- Experience building Tardis historical and real-time pipelines or equivalent crypto market data feeds
- Expertise in large-scale reliable ETL and ELT for financial or market data
- Fluent in provisioning environments with Terraform and experienced with AWS or GCP serverless technologies
- Expert Python and SQL skills
- Experience with time-series databases such as TimescaleDB or ClickHouse (see the schema sketch after this list)
- Advanced knowledge of WebSocket clients, message queues, and low-latency streaming
- Experience with GitOps, automated testing, deployment, and observability practices
- Significant understanding of stablecoins, lending protocols, and opportunity-surface concepts, or ability to ramp up quickly
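For the time-series requirement above, a minimal sketch of what a trades store could look like, assuming TimescaleDB accessed through psycopg2; the table name, columns, chunk interval, and connection string are illustrative assumptions, not part of this role's actual stack.

    # Minimal sketch, assuming TimescaleDB + psycopg2; names and settings are illustrative.
    import psycopg2

    DDL = """
    CREATE TABLE IF NOT EXISTS trades (
        exchange  TEXT        NOT NULL,
        symbol    TEXT        NOT NULL,
        trade_id  TEXT        NOT NULL,
        ts        TIMESTAMPTZ NOT NULL,
        price     NUMERIC     NOT NULL,
        amount    NUMERIC     NOT NULL,
        side      TEXT,
        PRIMARY KEY (exchange, symbol, trade_id, ts)
    );
    -- Daily chunks keep inserts and time-range queries manageable as volume grows 10x+.
    SELECT create_hypertable('trades', 'ts',
                             chunk_time_interval => INTERVAL '1 day',
                             if_not_exists => TRUE);
    """

    with psycopg2.connect("postgresql://localhost/marketdata") as conn:
        with conn.cursor() as cur:
            cur.execute(DDL)

Time-based chunking is one way the partitioning and schema-evolution expectations listed under Responsibilities could be met.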
Responsibilities
- Spin up the cloud environment and deliver historical backfill pipelines from Tardis into a queryable database (see the backfill sketch after this list)
- Deliver a real-time Tardis WebSocket pipeline that is normalized, cached, replayable, and queryable by Day 60 (see the streaming sketch after this list)
- Ensure pipelines are idempotent and retryable and implement exactly-once semantics, supported by CI/CD, Terraform, automated testing, and secrets management
- Implement observability with structured logs, metrics, dashboards, and alerting (see the logging sketch after this list)
- Provide self-service access to the MVP database via Tableau, Metabase, and internal REST APIs
- Develop specialized timeseries data, including backing-asset and opportunity-surface timeseries
- Ingest data from Kaiko, CoinAPI, and on-chain sources via TheGraph and Dune
- Plan for 10x+ data growth through schema evolution, partitioning, and performance tuning
- Establish enterprise-grade governance, including a data quality framework, RBAC, audit logs, and a semantic layer
- Create architecture documentation, runbooks, and a data dictionary
- Onboard and mentor junior data staff
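Backfill sketch: a minimal example of an idempotent historical load, assuming trade history has already been exported from Tardis as CSV files and written into the trades table sketched under Requirements; the file paths, column names, and connection string are illustrative assumptions. The ON CONFLICT target makes re-running a failed or partial backfill safe, which is the retryable/idempotent behaviour described above.

    # Minimal sketch of an idempotent backfill load; file layout and names are assumptions.
    import csv
    import glob
    import psycopg2
    from psycopg2.extras import execute_values

    UPSERT = """
    INSERT INTO trades (exchange, symbol, trade_id, ts, price, amount, side)
    VALUES %s
    ON CONFLICT (exchange, symbol, trade_id, ts) DO NOTHING;  -- re-runs become no-ops
    """

    def load_file(cur, path):
        # Read one exported CSV and batch-insert its rows.
        with open(path, newline="") as f:
            rows = [
                (r["exchange"], r["symbol"], r["id"], r["timestamp"],
                 r["price"], r["amount"], r["side"])
                for r in csv.DictReader(f)
            ]
        execute_values(cur, UPSERT, rows, page_size=10_000)

    with psycopg2.connect("postgresql://localhost/marketdata") as conn:
        for path in sorted(glob.glob("backfill/trades/*.csv")):
            with conn.cursor() as cur:
                load_file(cur, path)
            conn.commit()  # commit per file so a retry resumes cleanly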
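Streaming sketch: a minimal reconnecting WebSocket consumer that normalizes messages onto the same trade schema, assuming the Python websockets library; the endpoint URL, subscribe message, and payload field names are placeholders rather than the actual Tardis real-time protocol.

    # Minimal sketch of a real-time consumer; the feed URL and message shape are placeholders.
    import asyncio
    import json
    import websockets

    FEED_URL = "wss://example.invalid/normalized-trades"  # placeholder endpoint

    def normalize(msg: dict) -> dict:
        # Map the raw payload onto the trades schema used by the backfill path.
        return {
            "exchange": msg["exchange"],
            "symbol": msg["symbol"],
            "trade_id": msg["id"],
            "ts": msg["timestamp"],
            "price": float(msg["price"]),
            "amount": float(msg["amount"]),
            "side": msg.get("side"),
        }

    async def consume(queue: asyncio.Queue) -> None:
        # Reconnect loop: resubscribe after a dropped connection so the stream is retryable.
        while True:
            try:
                async with websockets.connect(FEED_URL) as ws:
                    await ws.send(json.dumps({"op": "subscribe", "channel": "trades"}))
                    async for raw in ws:
                        await queue.put(normalize(json.loads(raw)))
            except (websockets.ConnectionClosed, OSError):
                await asyncio.sleep(1)  # brief back-off before reconnecting

    async def main() -> None:
        queue: asyncio.Queue = asyncio.Queue(maxsize=10_000)
        consumer = asyncio.create_task(consume(queue))
        while True:
            event = await queue.get()  # downstream: batch, cache, and write to the database
            print(event)

    if __name__ == "__main__":
        asyncio.run(main())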
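Logging sketch: a minimal example of structured JSON logs plus a scrapeable metrics endpoint, assuming the prometheus_client library; the logger name, metric name, and port are illustrative assumptions, and dashboards and alerting would sit on top of whatever scrapes this endpoint.

    # Minimal sketch of structured logs + metrics; names and ports are illustrative.
    import json
    import logging
    import time
    from prometheus_client import Counter, start_http_server

    class JsonFormatter(logging.Formatter):
        def format(self, record: logging.LogRecord) -> str:
            # Emit each log record as a single JSON line for easy ingestion.
            return json.dumps({
                "ts": self.formatTime(record),
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            })

    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=logging.INFO, handlers=[handler])
    log = logging.getLogger("pipeline")

    ROWS_WRITTEN = Counter("pipeline_rows_written_total", "Rows written to the trades table")

    if __name__ == "__main__":
        start_http_server(9100)      # exposes /metrics for the monitoring stack
        ROWS_WRITTEN.inc(42)
        log.info("batch committed")  # appears as one structured JSON line
        time.sleep(60)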
Benefits
- Flexible remote-friendly work environment
Skills
Data Quality, Kaiko, The Graph, Partitioning, Vendor API, Semantic Layer, Timeseries, Schema Evolution, Audit Trail, Dune, REST API, Serverless, Tableau, RBAC, WebSocket, Metabase, GitOps, Terraform, Testing, Market Data, Observability, Performance Tuning, AWS, CI/CD, GCP, Streaming, ClickHouse, SQL, Python, Tardis, TimescaleDB, Message Queue, ELT, Data Governance, Audit Logs, CoinAPI, ETL
