Lead Software Engineer – ML & Agentic Workloads
About the Role
You will design, build, and scale systems that power agentic and intelligent workloads. You will lead production ML integrations from model selection and evaluation to deployment pipelines, implement guardrails for content safety and hallucination control, and create prompt lifecycle pipelines with versioning and CI/CD. You will build and optimize retrieval-augmented generation systems, configure vector databases and retrievers, define observability and evaluation metrics, and collaborate with other teams to design scalable APIs and services. You will mentor engineers and drive best practices for secure AI development and privacy-preserving data handling.
Requirements
- 8+ years of professional software engineering experience, including 3+ years in ML application development or AI platform engineering
- Proficiency in Python and ML toolchains such as PyTorch and Hugging Face
- Experience with model evaluation, fine-tuning, and deployment across cloud and on-prem environments
- Hands-on experience with RAG architectures and vector databases (Weaviate, Milvus, pgvector, LanceDB, FAISS)
- Deep understanding of prompt design, orchestration, and versioning with CI/CD and automated testing
- Familiarity with agentic systems and visual-builder interfaces
- Knowledge of guardrail techniques including rule-based filters and policy evaluators
- Experience deploying ML systems on Kubernetes and serverless environments with observability tooling
- Solid understanding of API design, microservice architecture, and data pipeline integration
- Excellent communication and leadership skills
Responsibilities
- Lead architecture and development of agentic platforms
- Evaluate and deploy foundation and open-source models
- Design and maintain prompt lifecycle pipelines with version control and CI/CD
- Build and optimize retrieval-augmented generation systems
- Implement guardrail frameworks for content safety and hallucination control
- Integrate and extend agentic frameworks and visual orchestration tools
- Design scalable APIs and services for model-driven applications
- Define observability and evaluation metrics for model performance
- Drive secure AI development and privacy-preserving data handling
- Mentor engineers across ML, backend, and platform domains
