We are looking for a hands-on MLOps Engineer to own the full lifecycle of AI model deployment and infrastructure, from serving LLMs at scale to managing cloud and on-prem Kubernetes environments.
WHAT YOU WILL WORK ON:
Model deployment
- Host and serve LLMs using vLLM (must-have)
- Deploy ASR, transcription, and streaming models in production
- Optimize throughput and GPU utilization, and serve multiple models concurrently
- Benchmark and scale model serving pipelines
Containerization & orchestration
- Manage Kubernetes clusters — EKS and on-prem with kubeadm
- Work with Docker, Helm, and Flux CD for deployments
- Implement auto-scaling with KEDA, KServe, and Knative
- Handle network security and load balancing in K8s
Infrastructure & cloud
- Manage AWS infrastructure — S3, EKS, load balancers
- Use Terraform for infrastructure-as-code
- Configure GPU nodes: NVIDIA drivers, Fabric Manager, and exposing GPUs to containers
- Handle Linux administration and certificate management
Data & messaging
- Build async architectures using Kafka
- Work with Redis and SQL databases
Monitoring & CI/CD
- Monitor production systems with Datadog
- Maintain pipelines and services in Python
- Build CI/CD workflows using GitHub Actions
- Use Flyte to orchestrate ML workflows, including fine-tuning jobs
- Profile and benchmark model performance
This is a hybrid role. A strong ownership mindset is required, because you will be hands-on across the full stack.