Hi there, I’m Nitesh Chauhan 👋

Platform Engineer and SRE with 11+ years of experience building and operating large-scale distributed infrastructure.

Currently leading the platform engineering function at Sportserve, which covers a multi-cluster Kubernetes estate serving 200+ engineers across 15 teams, with observability infrastructure processing 1M+ Prometheus samples/sec and 15M active time series.

Before that, 7+ years across AWS, GCP, and OCI — leading infrastructure for Saudi Arabia’s largest government e-invoicing platform, a data science platform, and enterprise CI/CD at scale.

I care about platform engineering done right: self-service developer tooling, SLO-driven reliability culture, and infrastructure that gets out of engineers’ way.

Here, I write about platform engineering, Kubernetes, automation, and building infrastructure at scale.

From $25K to $5K/Month: Running a Production Elasticsearch Cluster on Kubernetes

1. The Problem

In early 2022, we were running our entire observability stack on Elastic Cloud. It was the obvious choice at the time: managed, hands-off, no operational burden. Except it wasn’t hands-off at all. The cluster was regularly falling over. OOM kills on data nodes, indexing rejections during peak load, CPU spiking to saturation with no clear root cause. We were opening support tickets with Elastic, waiting days for responses, and getting back generic JVM tuning suggestions that changed nothing. Meanwhile the bill sat at $25,000 per month. ...

June 11, 2026