Senior ML Engineer – ML/Inference

Marathon Digital Holdings | Remote (anywhere) | Senior level

About the Role

You will lead the deployment, optimization, and lifecycle management of ML models for production inference and agentic systems. You will design and build scalable inference pipelines, implement model serving infrastructure for low latency and high throughput, and tune models using quantization, pruning, and distillation. You will develop and maintain RAG systems and vector database configurations, create CI/CD workflows for model updates, monitor model performance in production, detect drift, and improve reliability and explainability. You will benchmark and evaluate large language and multimodal models and integrate model APIs and agentic frameworks into customer-facing systems.

Requirements

5+ years of experience in applied ML or ML infrastructure engineering
Expertise in model serving and inference optimization (e.g., TensorRT, ONNX, vLLM, Triton, DeepSpeed)
Proficiency in Python and experience building APIs and pipelines with FastAPI, PyTorch, and Hugging Face tooling
Experience designing and tuning RAG systems and working with vector databases (Milvus, Weaviate, LanceDB, pgvector)
Solid foundation in MLOps practices, including versioning (MLflow, DVC), orchestration (Airflow, Kubeflow), and monitoring (Prometheus, Grafana, Sentry)
Familiarity with distributed compute systems (Kubernetes, Ray, Slurm) and cloud ML stacks (AWS SageMaker, GCP Vertex AI, Azure ML)
Understanding of prompt engineering, agentic frameworks, and LLM evaluation
Experience with quantization, pruning, and distillation techniques
Strong collaboration, documentation, and cross-functional communication skills
Preferred: background in HPC, ML infrastructure, or regulated or energy-aware environments

Responsibilities

Own the end-to-end lifecycle of ML models from training artifacts to production inference
Design, build, and maintain scalable inference pipelines using orchestration frameworks
Implement and optimize model serving infrastructure for latency, throughput, and cost
Develop and tune Retrieval-Augmented Generation (RAG) systems and vector database configurations
Integrate model APIs and agentic workflows into customer-facing systems
Evaluate, benchmark, and optimize large language and multimodal models
Design CI/CD workflows for ML systems that ensure reproducibility and observability
Develop internal tools for dataset management, feature stores, and evaluation pipelines
Monitor production model performance, detect drift, and drive reliability improvements
Explore and integrate emerging agentic and orchestration frameworks
