Kubernetes is the most powerful platform for running cloud-native applications — and one of the easiest platforms to waste enormous amounts of money on. The flexibility that makes K8s great (any workload, any scale, any configuration) also makes it trivially easy to provision vastly more compute than your workloads actually need. Industry analysis from CNCF and Kubecost consistently shows that the average Kubernetes cluster has 40–60% of its allocated resources sitting idle — provisioned but unused.
Figma saved $2.1M/year by right-sizing their node pools. Datadog reduced their EKS bill by 38% through aggressive pod resource optimization. These are not outliers — they're the expected outcome of systematic K8s cost optimization in over-provisioned environments. This guide shows you exactly how to get there.
Why Kubernetes Waste Happens: The Over-Provisioning Problem
In Kubernetes, every container declares its resource requests (what the scheduler uses to place the pod on a node) and limits (the maximum the container can use). The node must have enough capacity to satisfy the sum of all pod requests on it — regardless of whether pods actually use those resources.
The classic over-provisioning pattern: a developer sets requests: cpu: 1000m, memory: 2Gi and limits: cpu: 2000m, memory: 4Gi without measuring actual usage. The application uses 100m CPU and 200Mi memory at peak. The pod takes up 10x more node capacity than it needs. Multiply this across 50 services and 200 pods and you're paying for 5–10x the compute you actually consume.
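In manifest form, that anti-pattern looks like this (numbers taken from the example above):

```yaml
# Container spec fragment: requests set by guesswork, ~10x actual peak usage
resources:
  requests:
    cpu: 1000m     # measured peak: ~100m
    memory: 2Gi    # measured peak: ~200Mi
  limits:
    cpu: 2000m
    memory: 4Gi
```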
The fundamental K8s cost equation: You pay for node compute capacity (EC2 instances), not pod utilization. If your nodes are 30% utilized on average, you're paying 3x what you need to pay. Every optimization strategy below targets either increasing utilization (packing more pods per node) or reducing node size (matching capacity to actual workload needs).
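To make the equation concrete, here is a back-of-the-envelope sketch (every number is an illustrative assumption, not billing data):

```shell
# Cost paid vs. capacity actually used at a given average utilization.
NODE_COUNT=20
NODE_HOURLY=0.384        # e.g. m5.2xlarge on-demand, us-east-1
HOURS_PER_MONTH=720
UTILIZATION=30           # average node utilization, percent

monthly_cost=$(awk -v n="$NODE_COUNT" -v h="$NODE_HOURLY" -v m="$HOURS_PER_MONTH" \
  'BEGIN { printf "%.0f", n * h * m }')
effective_cost=$(awk -v c="$monthly_cost" -v u="$UTILIZATION" \
  'BEGIN { printf "%.0f", c * u / 100 }')
waste=$((monthly_cost - effective_cost))

echo "You pay:  \$${monthly_cost}/month"
echo "You use:  \$${effective_cost}/month of that capacity"
echo "Waste:    \$${waste}/month"
```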
Step 1: Measure Actual Resource Utilization
You can't rightsize what you can't measure. Start with Metrics Server and kubectl top:
# Install Metrics Server if not present
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Check actual CPU and memory usage per pod
kubectl top pods --all-namespaces --sort-by=cpu
NAMESPACE    NAME                            CPU(cores)   MEMORY(bytes)
production   api-deployment-7d9b4-xkp2n      45m          187Mi
production   worker-deployment-6c8f3-jkl9    12m          94Mi
staging      api-deployment-5c7a1-pqr8       8m           156Mi
staging      worker-deployment-9d2e4-mno3    3m           78Mi
# Compare actual usage to requests
kubectl get pods --all-namespaces -o json | \
  jq '.items[] | {
    name: .metadata.name,
    namespace: .metadata.namespace,
    cpu_request: .spec.containers[].resources.requests.cpu,
    mem_request: .spec.containers[].resources.requests.memory
  }'
For systematic analysis, Kubecost (free community edition) gives you a cluster-wide view of request vs actual utilization, broken down by namespace, deployment, and label. Install it with:
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken="your-token" \
  --set prometheus.nodeExporter.tolerations[0].operator="Exists"
Step 2: Implement Vertical Pod Autoscaler (VPA)
The Vertical Pod Autoscaler automatically analyzes pod resource usage over time and recommends (or automatically sets) optimal requests values. In "Recommendation" mode, VPA shows you what it would set without changing anything — a safe way to see the opportunity before committing.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: api-deployment
  updatePolicy:
    updateMode: "Off"  # "Off" = recommendations only, no auto-update
    # Change to "Auto" once you trust the recommendations
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 50m
          memory: 128Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
After 24–72 hours, check VPA recommendations:
kubectl describe vpa api-vpa -n production
# Output shows:
#   Container Recommendations:
#     Container Name:  api
#     Lower Bound:
#       Cpu:     45m
#       Memory:  192Mi
#     Target:
#       Cpu:     125m   ← VPA recommendation
#       Memory:  256Mi  ← vs your current 2Gi request
#     Upper Bound:
#       Cpu:     500m
#       Memory:  512Mi
In the example above, VPA recommends 256Mi memory vs your current 2Gi request — an 87% reduction in memory allocation for this pod. Across a cluster, VPA recommendations typically reduce total requested resources by 30–60%, directly enabling you to run the same workloads on fewer or smaller nodes.
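Once you trust the Target values, fold them back into the Deployment manifest. A sketch using the hypothetical recommendations above, with headroom added (the exact margins are a judgment call):

```yaml
# Container spec fragment; values derived from the VPA Target above
resources:
  requests:
    cpu: 150m       # VPA target 125m plus ~20% headroom
    memory: 320Mi   # VPA target 256Mi plus headroom
  limits:
    memory: 512Mi   # VPA upper bound
    # CPU limit omitted: many teams skip it to avoid throttling
```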
Step 3: Configure Horizontal Pod Autoscaler (HPA)
HPA scales the number of pod replicas based on CPU, memory, or custom metrics. The most common K8s cost mistake is running a static replica count (e.g., replicas: 10) that's sized for peak traffic 24/7. If your peak is 3 hours/day and you're running 10 replicas around the clock, you're paying for 10 replicas worth of capacity 21 hours/day unnecessarily.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-deployment
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60  # Target 60% CPU utilization per pod
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
HPA + VPA compatibility: HPA and VPA conflict when both react to the same CPU or memory signal: VPA resizes requests while HPA scales replicas based on utilization relative to those same requests, so each controller's changes invalidate the other's input. The safe pattern is to use HPA for replica scaling (horizontal) and VPA in "Off" mode (recommendation only) for right-sizing guidance. If you want both acting automatically, drive scaling from custom or external metrics such as requests per second (KEDA makes this straightforward and manages the underlying HPA for you), leaving CPU and memory requests to VPA.
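As a sketch of the KEDA route (the Prometheus address, metric name, and threshold below are assumptions to adapt to your environment), a ScaledObject takes the place of your HPA manifest:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-scaler
  namespace: production
spec:
  scaleTargetRef:
    name: api-deployment   # Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 20
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring:9090   # assumed Prometheus location
        query: sum(rate(http_requests_total{app="api"}[2m]))
        threshold: "100"   # target requests/sec per replica
```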
Step 4: Karpenter for Node Group Optimization
Karpenter (AWS's next-generation node autoscaler, since donated to the CNCF, with an Azure provider also available) replaces the Cluster Autoscaler with a smarter approach: instead of scaling predefined node groups, Karpenter provisions exactly the right instance type for each batch of pending pods. This flexibility dramatically reduces the gap between allocated and used capacity.
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]  # Allow Graviton for ~20% savings
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # Karpenter prefers Spot when both are allowed
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]  # Compute, memory, general purpose
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
    # Note: in v1beta1, consolidateAfter may only be set with WhenEmpty
The consolidationPolicy: WhenUnderutilized setting is key — Karpenter continuously evaluates whether pods can be bin-packed more efficiently and terminates underutilized nodes, replacing them with fewer, better-utilized nodes. This automatic consolidation typically saves an additional 10–20% beyond what manual rightsizing achieves.
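Because consolidation voluntarily drains nodes, Karpenter honors PodDisruptionBudgets while moving pods. Pair every production Deployment with a PDB so consolidation can never take too many replicas down at once (the `app: api` label below is an assumed example):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: 2          # never drain below 2 ready replicas
  selector:
    matchLabels:
      app: api             # must match your Deployment's pod labels
```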
Step 5: Address Unused PVCs and Idle Namespaces
Persistent Volume Claims (PVCs) in Kubernetes are backed by EBS volumes (on EKS) or equivalent cloud storage. PVCs created for StatefulSets, databases, and jobs often persist after the workload is deleted. Finding and cleaning them up:
# PVCs in a non-Bound phase (Pending, Lost) are immediate cleanup candidates
kubectl get pvc --all-namespaces | grep -v Bound

# Bound PVCs that no pod references (orphaned after their workload was deleted)
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[].spec.volumes[]? | .persistentVolumeClaim.claimName // empty' | \
  sort -u > /tmp/pvcs-in-use.txt
kubectl get pvc --all-namespaces -o json | \
  jq -r '.items[] | "\(.metadata.namespace) \(.metadata.name) \(.spec.resources.requests.storage)"' | \
  while read ns name size; do
    grep -qx "$name" /tmp/pvcs-in-use.txt || echo "ORPHANED: $ns/$name ($size)"
  done
Also check for entire idle namespaces — development or feature-branch namespaces that were never cleaned up:
# Find namespaces with zero running pods (likely idle dev or feature-branch leftovers)
kubectl get ns -o json | jq -r '.items[].metadata.name' | while read ns; do
  case "$ns" in kube-*|default) continue ;; esac  # skip system namespaces
  pod_count=$(kubectl get pods -n "$ns" --no-headers 2>/dev/null | wc -l)
  if [ "$pod_count" -eq 0 ]; then
    echo "IDLE NAMESPACE: $ns"
  fi
done
Real-World Results: What These Optimizations Deliver
Here's a realistic savings scenario for a team running a 20-node EKS cluster on m5.2xlarge instances in us-east-1:
- Current cost: 20 × m5.2xlarge × $0.384/hr = $7.68/hr ≈ $5,530/month (assuming a 720-hour billing month)
- After VPA rightsizing: Workloads fit on 12 nodes → $3,318/month (40% savings)
- After Karpenter + Graviton: Mix of m7g.xlarge and m7g.2xlarge for $2,400/month
- After HPA (no idle replicas off-peak): Further 15% reduction → ~$2,040/month
- Total savings: $3,490/month ($41,880/year) from a cluster that "worked fine" before.
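The arithmetic above as a quick sanity check (hourly rates and percentages are the illustrative figures from the scenario):

```shell
# Reproduce the savings scenario step by step.
HOURS=720   # billing hours per month used throughout

baseline=$(awk -v h="$HOURS" 'BEGIN { printf "%.0f", 20 * 0.384 * h }')      # 20x m5.2xlarge
after_vpa=$(awk -v h="$HOURS" 'BEGIN { printf "%.0f", 12 * 0.384 * h }')     # 12 nodes after rightsizing
after_graviton=2400                                                          # Karpenter + m7g mix (given)
after_hpa=$(awk -v g="$after_graviton" 'BEGIN { printf "%.0f", g * 0.85 }')  # further 15% off-peak cut
monthly_savings=$((baseline - after_hpa))

echo "Baseline:       \$${baseline}/month"
echo "After VPA:      \$${after_vpa}/month"
echo "Final:          \$${after_hpa}/month"
echo "Total savings:  \$${monthly_savings}/month (\$$((monthly_savings * 12))/year)"
```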
Optimize Your Entire AWS + Kubernetes Bill
Hero Savings covers your AWS infrastructure costs — EC2, RDS, S3, networking — with the same rigor you'd apply to K8s optimization. Get a complete picture of where your cloud money is going and exactly how to reduce it.
Start Free AWS Audit →

Frequently Asked Questions
terminationGracePeriodSeconds appropriately for your application; (3) Use pod disruption budgets to ensure minimum availability during Spot interruptions; (4) Avoid stateful workloads (databases, ZooKeeper) on Spot nodes. For stateless API servers, workers, and batch jobs, Spot is safe and delivers 60–80% cost savings.