Senior Software Engineer - Data Lakehouse Infrastructure
About the Role
You will design, implement, and scale core components of a petabyte-scale data lakehouse on GCP. You will build and optimize distributed query engines (Trino, Spark, Snowflake), implement metadata management with open table formats (Iceberg, Hudi), and develop robust ETL/ELT and streaming/batch pipelines using Airflow, Spark, and GCP-native tools. You will optimize query performance and data models, create observability and governance workflows, automate operational tasks, and collaborate with data scientists, backend engineers, and product managers to deliver reliable analytical data platforms.
Requirements
- 5+ years of experience in data or software engineering focused on distributed data systems
- Proven experience building and scaling data platforms on GCP
- Strong experience with query engines such as Trino, Presto, Spark, or Snowflake
- Experience with table formats such as Apache Hudi, Apache Iceberg, or Delta Lake
- Proficiency in Python and strong SQL or SparkSQL skills
- Hands-on experience with Airflow and building streaming and batch pipelines using GCP-native services
- Experience with BigQuery, GCS, Dataproc, Kafka, or similar data infrastructure
- Experience working at petabyte scale and optimizing analytical workloads
Responsibilities
- Architect and scale a high-performance data lakehouse on GCP
- Design, build, and optimize distributed query engines such as Trino, Spark, and Snowflake
- Implement metadata management using open table formats and compatible catalogs
- Develop and orchestrate ETL/ELT and streaming/batch pipelines with Airflow, Spark, and GCP services
- Optimize query performance and data modeling for analytical workloads
- Automate operational tasks and build self-serve scaling tools
- Create observability dashboards and governance workflows
- Collaborate with data scientists, backend engineers, and product managers
Benefits
- Remote-first work
- Paid time off
- Paid holidays
- Parental leave
- Equity plan eligibility
- Offsites and regional meetups
