Principal Systems Engineer - IaaS/PaaS
About the Role
You will design and evolve core compute, storage, and orchestration systems for large-scale data and ML workloads, and implement secure, reliable, and scalable Kubernetes platforms, including Operators, CI/CD pipelines, and IAM. Working with ML, product, and infrastructure teams, you will enable efficient data pipelines, feature stores, and training workflows across heterogeneous hardware. You will also maintain platform reliability through observability, automation, and proactive performance optimization; define standards for deployment, validation, and operational readiness; and lead vendor evaluation and integration for key technologies.
Requirements
- 10+ years of software, systems, or data engineering experience, plus 3+ years in technical leadership or management
- Expertise in distributed systems, data streaming (Kafka, Flink, Spark), and ML orchestration (Airflow, Kubeflow, MLflow)
- Proficiency in Go and Python; hands-on experience with Kubernetes, Docker, and Infrastructure-as-Code tools (Terraform, Ansible)
- Experience with observability stacks (Prometheus, Grafana, ELK/Fluent Bit) and platform security
- Experience delivering data migrations, hybrid cloud architectures, and large-scale CI/CD automation
- Familiarity with modern data warehouse and lakehouse technologies (Snowflake, Iceberg, Delta Lake) and vector databases (pgvector, Milvus, LanceDB)
- Track record of successful client delivery in cloud, media, industrial ML, or hardware integration projects
- Excellent communication, cross-team collaboration, and mentoring skills
- Preferred: Background in HPC, ML infrastructure, or sovereign/regulated environments
- Preferred: Familiarity with energy-aware computing, modular data centers, or ESG-driven infrastructure design
- Preferred: Experience collaborating with European and global engineering partners
Responsibilities
- Design and evolve core compute, storage, and orchestration systems for large-scale data and ML workloads
- Implement secure, reliable, and scalable Kubernetes platforms, including Operators, CI/CD pipelines, and IAM systems
- Collaborate with ML, product, and infrastructure teams to enable efficient data pipelines, feature stores, and training workflows on heterogeneous hardware
- Maintain platform reliability through observability, automation, and proactive performance optimization
- Define standards for deployment, validation, and operational readiness across environments
- Lead vendor evaluation and integration for technologies such as Kafka, Snowflake, MLflow, and Trino
- Foster open-source contribution, innovation, and continuous learning
