Principal Systems Engineer - IaaS/Hardware

About the Role

You will lead the design and evolution of IaaS platforms across hybrid and distributed environments. You will design and build infrastructure automation using Terraform, Pulumi, and Ansible, and develop reusable modules and providers. You will define and enforce provisioning, networking, and observability standards and integrate platforms such as OpenShift, Kubernetes, MAAS, and Ceph. You will establish GitOps and CI/CD practices with ArgoCD, Helm, GitHub Actions, and Azure DevOps; drive capacity planning, HA/DR, and monitoring using Prometheus, Grafana, and Loki; and partner with security to embed OIDC, SAML, IAM, and secret management best practices. You will collaborate with data and ML teams to align infrastructure with data pipelines, MLflow, and Airflow, and mentor engineers through documentation, workshops, and enablement.

Requirements

  • 10+ years of experience designing and operating large-scale infrastructure systems across on-prem and cloud environments
  • Proven expertise in Infrastructure as Code (Terraform, Pulumi, Ansible), with experience authoring reusable modules and providers
  • Deep understanding of hybrid and private cloud platforms (OpenShift, Juju, MAAS, OpenStack, VMware, Proxmox)
  • Strong background in storage (Ceph, TrueNAS, S3, NFS) and networking (VLAN, VXLAN, SDN) for high-availability architectures
  • Demonstrated experience building GitOps-based deployment pipelines and maintaining production-grade Kubernetes environments
  • Familiarity with data and ML infrastructure integration, such as MLflow, Airflow, Databricks, or Spark
  • Strong proficiency in Python, Go, and Bash for automation and platform tooling
  • Excellent cross-functional leadership, communication, and mentorship skills

Responsibilities

  • Architect and evolve the IaaS platform across hybrid environments
  • Design, build, and maintain infrastructure automation frameworks using Terraform, Pulumi, and Ansible
  • Develop reusable providers, modules, and infrastructure standards
  • Define and enforce engineering standards for provisioning, networking, and observability
  • Evaluate and integrate core technologies, including OpenShift, Kubernetes, MAAS, and Ceph
  • Drive multi-tenant PaaS initiatives and private cloud modernization
  • Collaborate with Data, ML, and Platform teams to align architecture with workloads
  • Establish GitOps and CI/CD frameworks for consistent infrastructure delivery
  • Lead capacity planning and HA/DR strategy, and design monitoring and alerting
  • Embed zero-trust, OIDC, SAML, IAM, and secret management practices into the infrastructure lifecycle
  • Mentor engineers and contribute to technical enablement, documentation, and workshops

Principal Systems Engineer - IaaS/Hardware at Marathon Digital Holdings | JobStash