Senior MLOps Engineer – LLMOps
About the Role
You will build and scale infrastructure for large language models and agentic systems: creating CI/CD workflows for model training, evaluation, and deployment; automating model versioning and approval processes; and deploying scalable serving and observability stacks. You will monitor cost, latency, and performance, run offline and online evaluations, implement regression tests and human-in-the-loop workflows, and enable researchers with sandboxes, dashboards, and reproducible environments. You will also continuously evaluate and integrate state-of-the-art LLM tooling while enforcing reliability, compliance, and governance.
Requirements
- Ability to write high-quality, maintainable software, primarily in Python
- Strong background in containerization and orchestration such as Docker and Kubernetes
- Experience with infrastructure-as-code tools such as Terraform and with CI/CD deployment pipelines
- Experience with monitoring and logging frameworks such as Datadog, Prometheus, and OpenTelemetry
- Knowledge of MLOps best practices, including model versioning, rollback strategies, and drift detection
- Experience with scalable model- and agent-serving infrastructure such as vLLM, Triton, and BentoML
- Experience deploying and maintaining LLM and agentic workflows in production, including cost, latency, and performance monitoring
- Demonstrated ownership, pragmatic engineering, and measurable delivery
Responsibilities
- Build reusable CI/CD workflows for model training, evaluation, and deployment
- Automate model versioning, approval workflows, and compliance checks
- Build modular and scalable AI infrastructure, including vector databases, feature stores, and model registries
- Integrate observability tooling and implement monitoring and logging
- Embed AI models and agents into real-time applications and workflows
- Evaluate and integrate state-of-the-art AI tools and libraries
- Drive AI reliability, governance, and uptime
- Improve AI and ML model performance
- Ensure data accuracy, consistency, and reliability for training and inference
- Deploy infrastructure for offline and online evaluation, including regression testing and cost monitoring
- Implement human-in-the-loop workflows
- Provide sandboxes, dashboards, and reproducible environments for researchers
Benefits
- Paid time off (PTO)
- Holidays
- Parental leave
- Equity plan participation
- Remote-first work arrangement
Skills
Multi-Agent Architecture, Release Management, Data Pipeline Automation, Inference Optimization, OpenTelemetry, Vector Database, Feature Store, Experiment Tracking, LangChain, GitHub Actions, Terraform, Regression Testing, Observability, CI/CD, MLOps, Prometheus, Datadog, LLM, Langfuse, Model Versioning, Model Registry, LlamaIndex, vLLM, MLflow, BentoML, Triton, Drift Detection, Cost Monitoring, Human-in-the-Loop
