Kubernetes at Scale

March 15, 2026 · 8 min read

After three years of running 200+ microservices across multiple Kubernetes clusters, I have accumulated a collection of hard-won lessons. Most of them came from production incidents at 2 AM. This post distills the patterns and pitfalls I have encountered, so you can avoid repeating them.

Resource Limits Are Not Optional

The single most impactful thing you can do for cluster stability is enforce resource requests and limits on every pod. Without them, a single misbehaving service can starve the entire node. We learned this the hard way when a memory leak in a logging sidecar took down 14 services on a shared node.

Here is the pattern we settled on:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 400m
    memory: 512Mi

A few rules we follow: requests should reflect actual steady-state usage (check your Prometheus metrics over 7 days), and limits should be 2-4x the request to handle bursts. We use LimitRange objects in every namespace to apply defaults, so no container runs without requests and limits even when its manifest omits them.
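
As a rough sketch of the LimitRange approach, the manifest below fills in requests and limits for any container that does not declare its own. The object name, the namespace, and the specific values are illustrative assumptions, chosen only to follow the 2-4x guideline above.

```yaml
# Hypothetical names: "default-resources" and "team-a" are placeholders.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: team-a
spec:
  limits:
  - type: Container
    # Used as the request when a container declares none.
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    # Used as the limit when a container declares none.
    default:
      cpu: 400m
      memory: 512Mi
```

Defaults are injected when a pod is admitted, so they only affect pods created after the LimitRange exists. Pods that are already running keep whatever they were created with.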

HPA Gotchas That Will Bite You

The Horizontal Pod Autoscaler sounds simple on paper: set a CPU target, and it scales pods up and down. In practice, there are several traps. First, HPA computes CPU utilization as a percentage of each pod's requests, not its limits. If your request is 100m and your target is 80%, HPA will add pods once average CPU usage across pods exceeds 80m. If your request is artificially low, you will scale too aggressively.
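
To make the request-based target concrete, here is a minimal sketch of an autoscaling/v2 HPA with an 80% CPU target. The Deployment name and the replica bounds are placeholders, not values from our clusters.

```yaml
# Hypothetical names: "api-server" and the replica bounds are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        # Percent of the CPU request: with a 100m request, 80 means 80m per pod.
        averageUtilization: 80
```

The controller computes the desired replica count as ceil(currentReplicas × currentValue / targetValue). With 4 pods averaging 120m against an 80m target, that is ceil(4 × 120 / 80) = 6 pods.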

Second, the default stabilization window is 5 minutes for scale-down. During a traffic spike, you will scale up fast but scale down slowly. This is usually what you want, but for batch workloads you should tune behavior.scaleDown.stabilizationWindowSeconds:

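behavior:
  scaleDown:
    stabilizationWindowSeconds: 120  # default is 300 (5 minutes)
    policies:
    - type: Percent
      value: 25          # remove at most 25% of current replicas
      periodSeconds: 60  # within any 60-second period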

Third, if you are using custom metrics from Prometheus via the prometheus-adapter, make sure your PromQL query returns a per-pod value, not an aggregate. I have seen teams accidentally scale on total request count instead of per-pod request count. Adding pods never lowers a cluster-wide total, so the metric stays above target and the HPA keeps scaling until it hits maxReplicas.
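
A per-pod adapter rule might look like the sketch below. The metric name http_requests_total and its namespace and pod labels are assumptions about what your services export; adjust them to match your own series.

```yaml
# prometheus-adapter rule (sketch). Metric and label names are assumptions.
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  # <<.GroupBy>> expands to the pod label when the HPA asks for a Pods
  # metric, so the result is one value per pod, not a cluster-wide sum.
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

An HPA would then reference the resulting http_requests_per_second metric through a metric of type Pods with an AverageValue target, which the controller compares against the average across pods.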

Network Policies: Default Deny First

Most clusters I have audited have zero network policies, meaning every pod can talk to every other pod. This is a ticking time bomb. Start with a default deny policy on every namespace:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}  # an empty selector matches every pod in the namespace
  policyTypes:
  - Ingress
  - Egress

Then explicitly allow the traffic you need, including DNS lookups, which a default deny on egress also blocks. Yes, this is more work upfront. But when a compromised pod tries to move laterally across your cluster, you will be glad the blast radius is contained. We reduced our attack surface by roughly 90% just by implementing namespace-level network policies.
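
As an illustration of the allow rules that sit on top of default deny, the two policies below permit DNS egress for every pod and ingress to one service from one client. The labels app: frontend and k8s-app: kube-dns and the port 8080 are assumptions; check how your DNS pods and workloads are actually labeled.

```yaml
# Allow every pod in the namespace to resolve DNS.
# Assumes DNS pods run in kube-system with the label k8s-app: kube-dns.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
---
# Allow only frontend pods to reach api-server pods on TCP 8080.
# "frontend" and 8080 are hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
```

Network policies are additive, so these rules widen the default deny without replacing it. Note that the frontend pods also need a matching egress rule toward api-server, because the default deny covers both directions.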

Pod Disruption Budgets Save Deployments

If you are running any workload that needs high availability, you need PodDisruptionBudgets. Without them, a node drain during a cluster upgrade can evict every replica of a service at the same time. We set a PDB on every production deployment:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: "50%"
  selector:
    matchLabels:
      app: api-server

Use minAvailable rather than maxUnavailable when you want to guarantee a minimum number of pods are always running. For services with only 2 replicas, set minAvailable: 1 so at least one pod survives any voluntary disruption.
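
For the two-replica case, a sketch of the budget could look like this. The name worker-pdb and the label app: worker are placeholders.

```yaml
# Hypothetical names: "worker-pdb" and "app: worker" are placeholders.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: worker-pdb
spec:
  # With 2 replicas, at most one pod may be evicted at a time.
  minAvailable: 1
  selector:
    matchLabels:
      app: worker
```

During a drain, the eviction API refuses any eviction that would drop the service below the budget, and kubectl drain retries until a replacement pod is ready elsewhere. Be careful with single-replica services: minAvailable: 1 there blocks the drain entirely.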

Monitoring with Prometheus: What Actually Matters

With 200+ services, you can easily drown in metrics. We focus on four golden signals per service:

Latency — p50, p95, p99 of request duration
Traffic — requests per second
Errors — 5xx rate as a percentage of total traffic
Saturation — CPU and memory utilization against limits
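
One way to express these four signals is a small set of Prometheus recording rules, sketched below. Every metric name and label here is an assumption: the HTTP metrics and their service and code labels depend on your instrumentation, and the saturation rule assumes cAdvisor and kube-state-metrics are being scraped.

```yaml
# Prometheus rule file (sketch). Metric and label names are assumptions.
groups:
- name: golden-signals
  rules:
  # Latency: p99 request duration per service (repeat for 0.5 and 0.95).
  - record: service:http_request_duration_seconds:p99
    expr: |
      histogram_quantile(0.99,
        sum by (service, le) (rate(http_request_duration_seconds_bucket[5m])))
  # Traffic: requests per second per service.
  - record: service:http_requests:rate5m
    expr: sum by (service) (rate(http_requests_total[5m]))
  # Errors: fraction of responses with a 5xx status code.
  - record: service:http_requests_errors:ratio_rate5m
    expr: |
      sum by (service) (rate(http_requests_total{code=~"5.."}[5m]))
        /
      sum by (service) (rate(http_requests_total[5m]))
  # Saturation: working-set memory as a fraction of the memory limit.
  - record: namespace_pod_container:memory_working_set:limit_ratio
    expr: |
      sum by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})
        /
      sum by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
```

Recording the ratios once keeps dashboards and alerts cheap, since they read a handful of precomputed series instead of re-aggregating raw ones on every refresh.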

We run Prometheus with Thanos for long-term storage and cross-cluster querying. The key insight is to keep your cardinality under control. A single label with unbounded values (like user_id) can blow up your Prometheus memory. We hold each metric name to a budget of 10,000 unique series using recording rules and alerts, and we also alert on prometheus_tsdb_head_series, which reports the total number of active series across all metrics.
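
A per-metric series budget can be approximated with a rule pair like the sketch below. This is one possible shape under stated assumptions, not our exact rule file: the rule names, the evaluation interval, and the severity label are all illustrative.

```yaml
# Prometheus rule file (sketch). Rule names and interval are assumptions.
groups:
- name: cardinality
  # The count below touches every series, so evaluate it infrequently.
  interval: 5m
  rules:
  # Copy __name__ into a regular label, then count series per metric name.
  # (A recording rule overwrites __name__, so grouping by it directly fails.)
  - record: metric_name:series:count
    expr: |
      count by (metric_name) (
        label_replace({__name__=~".+"}, "metric_name", "$1", "__name__", "(.+)"))
  - alert: MetricCardinalityOverBudget
    expr: metric_name:series:count > 10000
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.metric_name }} has {{ $value }} series (budget: 10,000)"
```

The alert only detects a breach after the fact. If you need a hard stop, the sample_limit setting in a scrape config rejects any scrape that returns more samples than allowed, though it applies per target, not per metric name.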

Lessons from the Trenches

After all these years, a few principles have emerged:

Treat your cluster configuration as code. Every YAML file goes through code review.
Use namespaces as hard boundaries, not just organizational labels. Pair them with ResourceQuotas, LimitRanges, and NetworkPolicies.
Never run stateful workloads on Kubernetes unless you have a dedicated team to manage the operators. StatefulSets are powerful but operationally complex.
Invest in a proper service mesh early. We adopted Linkerd over Istio for its simplicity, and it gave us mTLS, retries, and observability with minimal overhead.
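
To illustrate the namespace-as-boundary principle from the list above, a ResourceQuota caps what one namespace can consume in total. The name, namespace, and numbers below are placeholders; size them from your own usage data.

```yaml
# Hypothetical names and values: "team-a-quota", "team-a", and all amounts.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
```

Once a quota covers CPU and memory, the API server rejects pods in that namespace that do not declare those requests and limits. That is why the quota works best alongside a LimitRange that supplies defaults.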

The goal is not to make Kubernetes simple. The goal is to make it predictable. Every surprise in production is a missing guardrail.

Kubernetes at scale is not about mastering every API object. It is about building layers of safety — resource limits, network policies, PDBs, proper monitoring — so that when something fails (and it will), the blast radius is small and recovery is fast.