Why We Chose Nomad Over Kubernetes
The Question Everyone Asks
When people hear we run production agent workloads without Kubernetes, the reaction is usually some version of “why?” Kubernetes is the default. It’s where the ecosystem is. It’s what everyone knows.
That’s exactly the problem.
Agent Architectures Are Already Complex
An agent-native product has a lot of moving parts before you add orchestration. You’ve got MCP servers exposing tool interfaces, orchestration services coordinating agent workflows, monitoring for agent behavior, guardrail services validating outputs, and the actual business logic underneath all of it.
Kubernetes adds its own layer of complexity on top: custom resource definitions, operators, service meshes, ingress controllers, RBAC policies, and a control plane that needs its own care and feeding. For a team of 18 engineers, that’s a significant operational tax.
What Nomad Gives Us
Nomad does one thing well: it schedules workloads. That’s it. No opinions about service discovery (we use Consul), no opinions about secrets management (we use Vault), no opinions about networking (we handle that ourselves).
job "mcp-invoice-server" {
  type = "service"

  group "mcp" {
    count = 3

    scaling {
      min = 1
      max = 10

      policy {
        check "queue_depth" {
          source = "prometheus"
          query  = "avg(mcp_tool_call_queue_depth)"

          # Scale toward an average queue depth of 10 per instance
          # (the target value here is illustrative).
          strategy "target-value" {
            target = 10
          }
        }
      }
    }

    task "server" {
      driver = "docker"

      config {
        image = "registry.kodecraft.dev/mcp-invoice:latest"
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}
This job definition is the entire deployment configuration for one of our MCP servers. It auto-scales based on tool call queue depth—the metric that actually matters for agent workloads. Try doing this in Kubernetes and count how many YAML files you end up with.
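For comparison, here is a rough sketch of just the autoscaling piece in Kubernetes. The names are illustrative, and even this single manifest assumes a separate Deployment, a Service, and a Prometheus Adapter configured to expose `mcp_tool_call_queue_depth` through the metrics API, each of which is its own YAML file:

```yaml
# HorizontalPodAutoscaler sketch (illustrative names).
# Assumes: a Deployment named mcp-invoice-server, plus a Prometheus
# Adapter mapping mcp_tool_call_queue_depth into the external metrics
# API -- both defined elsewhere.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcp-invoice-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcp-invoice-server
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: mcp_tool_call_queue_depth
        target:
          type: AverageValue
          averageValue: "10"
```

And that is before the ingress, RBAC, and ServiceMonitor objects that a production deployment typically carries alongside it.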
The Operational Difference
Nomad is a single binary. The same binary runs in server mode or client mode, depending on configuration. The entire cluster state fits in memory. When something goes wrong—and things always go wrong—the debugging surface is small enough that one engineer can hold the whole system in their head.
We’ve had incidents where an agent triggered a cascade of tool calls that overwhelmed an MCP server. With Nomad, the response was straightforward: check the allocation logs, see the resource limits being hit, adjust the scaling policy, redeploy. The whole cycle took minutes, not hours.
With Kubernetes, the same incident would involve checking pod logs across namespaces, verifying HPA configurations, checking if the cluster autoscaler was responsive, and potentially debugging scheduler decisions across multiple node pools.
What We Give Up
Kubernetes has a larger ecosystem. If you want a managed service mesh, a GitOps controller, or a policy engine, there’s a Kubernetes operator for it. Nomad’s ecosystem is smaller.
We accept this trade-off because the things we need—service discovery, secrets management, scheduling, and scaling—are all handled well by the HashiCorp stack. We don’t need an operator marketplace. We need infrastructure that stays out of the way while agents do their work.
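Concretely, wiring the job above into the rest of the HashiCorp stack is a couple of stanzas inside the task. This is a sketch: the Vault policy name, the secret path, and the named port "http" are all assumptions, not values from our actual configuration:

```hcl
task "server" {
  driver = "docker"

  # Register in Consul for service discovery; other services can then
  # resolve this as mcp-invoice.service.consul. Assumes the group's
  # network stanza defines a port labeled "http".
  service {
    name = "mcp-invoice"
    port = "http"
  }

  # Grant the task a Vault token scoped to an illustrative policy.
  vault {
    policies = ["mcp-invoice"]
  }

  # Render secrets from Vault into environment variables at startup.
  template {
    destination = "secrets/env"
    env         = true
    data        = <<EOF
{{ with secret "secret/data/mcp-invoice" }}
API_KEY={{ .Data.data.api_key }}
{{ end }}
EOF
  }
}
```

Service discovery and secrets delivery stay declarative in the same file as the job itself, without a separate controller or operator in the loop.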
The Decision Framework
If you’re a large organization with dedicated platform teams, Kubernetes might be the right choice. If you’re a small team building agent-native products and you want to spend your time on the product rather than the platform, Nomad deserves serious consideration.
We chose Nomad because agent architectures already have enough moving parts. The orchestrator shouldn’t be one of them.