Senior Software Engineer - Data Lakehouse Infrastructure

About the Role

You will design, implement, and scale core components of a petabyte-scale data lakehouse on GCP. You will build and optimize distributed query engines (Trino, Spark, Snowflake), implement metadata management with open table formats (Iceberg, Hudi), and develop robust ETL/ELT and streaming/batch pipelines using Airflow, Spark, and GCP-native tools. You will optimize query performance and data models, create observability and governance workflows, automate operational tasks, and collaborate with data scientists, backend engineers, and product managers to deliver reliable analytical data platforms.

Requirements

  • 5+ years of experience in data or software engineering focused on distributed data systems
  • Proven experience building and scaling data platforms on GCP
  • Strong experience with query engines such as Trino, Presto, Spark, or Snowflake
  • Experience with table formats such as Apache Hudi, Iceberg, or Delta Lake
  • Proficiency in Python and strong SQL or SparkSQL skills
  • Hands-on experience with Airflow and building streaming and batch pipelines using GCP-native services
  • Experience with BigQuery, GCS, Dataproc, Kafka, or similar data infrastructure
  • Experience working at petabyte scale and optimizing analytical workloads

Responsibilities

  • Architect and scale a high-performance data lakehouse on GCP
  • Design, build, and optimize distributed query engines such as Trino, Spark, and Snowflake
  • Implement metadata management using open table formats and compatible catalogs
  • Develop and orchestrate ETL/ELT and streaming/batch pipelines with Airflow, Spark, and GCP services
  • Optimize query performance and data modeling for analytical workloads
  • Automate operational tasks and build self-serve scaling tools
  • Create observability dashboards and governance workflows
  • Collaborate with data scientists, backend engineers, and product managers

Benefits

  • Remote-first work
  • Paid time off
  • Paid holidays
  • Parental leave
  • Equity plan eligibility
  • Offsites and regional meetups

Senior Software Engineer - Data Lakehouse Infrastructure at TRM Labs | JobStash