How we monitor N Kubernetes clusters by implementing a centralized monitoring system with Thanos and Prometheus…

Ashish Ranjan
4 min read · Jun 11, 2021

Before going into the architecture details, I would like to explain Thanos: what it is and why we adopted it.

The first time I heard about Thanos, I thought 💭 it would be something related to the Marvel Cinematic Universe 🐶, but after exploring it I found that it’s a CNCF-adopted monitoring tool that gives wings to our Prometheus servers.

In a nutshell, Prometheus stores its metrics on local disk (SSD), but when we integrate the Thanos sidecar 🚗 with the Prometheus cluster, it also ships that data to object storage. We used an S3 bucket for storing the metrics.
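For illustration, here is a minimal sketch of how that shipping works, assuming an objstore.yml pointing at S3 and a Thanos sidecar container running next to Prometheus (the bucket name, region, image tag, and ports below are hypothetical examples, not our exact setup):

```yaml
# objstore.yml — tells the Thanos sidecar which bucket to upload Prometheus TSDB blocks to
type: S3
config:
  bucket: thanos-metrics                      # hypothetical bucket name
  endpoint: s3.ap-south-1.amazonaws.com       # access is usually granted via an IAM role
---
# Thanos sidecar container running in the same pod as Prometheus (abridged)
- name: thanos-sidecar
  image: quay.io/thanos/thanos:v0.20.2
  args:
    - sidecar
    - --tsdb.path=/prometheus                          # volume shared with the Prometheus container
    - --prometheus.url=http://localhost:9090
    - --objstore.config-file=/etc/thanos/objstore.yml
    - --grpc-address=0.0.0.0:10901                     # queried later by Thanos Query
    - --http-address=0.0.0.0:10902
```

In practice this is usually wired up through the Prometheus/Thanos Helm charts rather than written by hand.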

Architecture diagram

After adopting Thanos and a centralized Prometheus setup, we achieved a lot of things:

  1. A centralized place to monitor all of our clusters and servers (the Thanos Query sketch after this list shows how this single view is stitched together).
  2. Data-loss prevention for the Prometheus server (previously we had a common worker group for the Prometheus server, but we sometimes lost metrics when its pods were rescheduled to a new AZ).
  3. Data availability for a longer retention period.
  4. Resource sharing across all of our K8s clusters and other EC2 instances.
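For context on point 1: the piece that gives us this single pane of glass is Thanos Query, which fans out to each cluster’s sidecar and to the store gateway that reads the S3 bucket. A rough sketch of its arguments (the endpoints, image tag, and replica label below are hypothetical):

```yaml
# Thanos Query container (abridged) — merges and deduplicates series from every cluster
- name: thanos-query
  image: quay.io/thanos/thanos:v0.20.2
  args:
    - query
    - --http-address=0.0.0.0:9090
    - --store=thanos-sidecar.cluster-a.internal:10901     # hypothetical per-cluster sidecar endpoints
    - --store=thanos-sidecar.cluster-b.internal:10901
    - --store=thanos-store-gateway.monitoring.svc:10901   # historical blocks from the S3 bucket
    - --query.replica-label=prometheus_replica            # dedupe HA Prometheus replicas
```

Grafana then points at Thanos Query as a single Prometheus-compatible data source.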

Let’s discuss some of the tech stack that will be used in the upcoming paragraphs:

  1. Thanos: It’s a set of components that can be composed into a highly available metric system with unlimited storage capacity, which can be added seamlessly on top of existing Prometheus deployments.
  2. Prometheus: It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.
  3. Kubernetes: It’s a production-grade, open-source container orchestration tool developed by Google that helps you manage containerized/Dockerized applications and supports multiple deployment environments like on-premises, cloud, or virtual machines.
  4. Ec2: Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud.

As we had to monitor N Kubernetes clusters, our first call was how we would export all of the metrics to a centralized place and what types of metrics we required for our monitoring.

After some debugging, we categorized the metrics into the categories below:

  1. Control-plane metrics
  2. Worker node metrics
  3. Kubernetes Object metrics
  4. Application-level metrics
  5. External service metrics

Control plane monitoring:

From the Kubernetes control plane, we get metrics related to the API server, etcd, the controller manager, and the scheduler.
We decided to monitor the control-plane components below:

  1. api_server → through the control-plane static endpoint (for monitoring this resource we need to add extra Prometheus rules that are then used in the Grafana queries; a sketch follows this list)
  2. etcd_cluster → through the control-plane static endpoint.
  3. core_dns → through service endpoints.
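As a rough illustration (not our exact config), the static scrape jobs plus one of the extra recording rules consumed by the Grafana dashboards could look like this; the endpoint addresses, job names, and the rule expression are hypothetical:

```yaml
# prometheus.yml (fragment) — control-plane components scraped through static endpoints
scrape_configs:
  - job_name: kubernetes-apiserver
    scheme: https
    tls_config:
      insecure_skip_verify: true                   # or point at the cluster CA bundle
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    static_configs:
      - targets: ["api.cluster-a.internal:443"]    # hypothetical API server endpoint
  - job_name: etcd
    scheme: https                                  # etcd usually also needs client TLS certs
    static_configs:
      - targets: ["etcd-0.cluster-a.internal:2379"]
---
# prometheus.rules (fragment) — a pre-computed series used by the Grafana query
groups:
  - name: apiserver.rules
    rules:
      - record: apiserver:request_duration_seconds:p99
        expr: |
          histogram_quantile(0.99,
            sum(rate(apiserver_request_duration_seconds_bucket[5m])) by (le, verb))
```

CoreDNS is discovered through its service endpoints, so it doesn’t need a static entry.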

Worker Node Monitoring:

Kubernetes worker nodes mainly deal with the kubelet and kube-proxy: the kubelet runs the pods assigned to the node, and kube-proxy handles the networking side. Apart from these Kube components, we also need data at the container level as well as the node level; that’s why we used node-exporter and cAdvisor.
We decided to monitor the node-level metrics below:

  1. cAdvisor → for container memory/CPU info
  2. Kubelet → for node-level monitoring of Kube jobs
  3. Node exporter → for worker-node info (a sample scrape config for all three follows this list)
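A hedged sketch of the corresponding scrape jobs using Kubernetes service discovery (the node-exporter service name comes from the Helm chart linked in the sources; everything else uses common defaults rather than our exact values):

```yaml
scrape_configs:
  # node-exporter DaemonSet — host-level CPU/memory/disk/network
  - job_name: node-exporter
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: prometheus-node-exporter            # service name installed by the Helm chart
        action: keep

  # kubelet — node-level metrics served on the kubelet's own port
  - job_name: kubelet
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node

  # cAdvisor — per-container CPU/memory, exposed by the kubelet
  - job_name: cadvisor
    scheme: https
    metrics_path: /metrics/cadvisor
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
```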

Kubernetes Object Monitoring:

Under this category we consider all kinds of K8s object-level metrics, such as HPA status, the number of replicas available, the number of pod restarts, the number of deployments, the number of PVs, and other metrics.

We decided to use the kube-state-metrics exporter for this job. Also, instead of exposing these metrics directly to Prometheus, we used Telegraf as a proxy; we’ll discuss why in the application-level monitoring section below.
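For reference, these are the kinds of kube-state-metrics series behind the categories above; the recording-rule names here are made up for illustration, and metric names can differ slightly between kube-state-metrics versions:

```yaml
# Illustrative recording rules over kube-state-metrics (not our actual dashboards)
groups:
  - name: kube-object-examples
    rules:
      - record: namespace:pod_restarts:increase1h
        expr: sum(increase(kube_pod_container_status_restarts_total[1h])) by (namespace)
      - record: deployment:replicas_available
        expr: kube_deployment_status_replicas_available
      - record: hpa:current_replicas
        expr: kube_horizontalpodautoscaler_status_current_replicas   # kube_hpa_* on older versions
      - record: cluster:persistentvolumes:count
        expr: count(kube_persistentvolume_info)
```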

Application Monitoring:
the most important category for us, used to track application latency and performance.

For application monitoring we needed to tune our monitoring architecture, because in EKS each pod uses a secondary IP derived from an ENI (Elastic Network Interface; the number available depends on the machine type). Also, even if you don’t expose a service as NodePort/LoadBalancer, it is still accessible from within the cluster and its associated network (VPC peering/transit gateway) as long as the security group allows it (optionally you can add restrictions on pods using network policies, but that’s a different concept).

That’s why we decided to use a custom solution:

Architecture diagram: Application Monitoring

In our architecture, we’re using Telegraf as a proxy that scrapes all the metrics from the application and the other dependent exporters, and then exposes all of those metrics to our centralized monitoring system (a minimal sketch follows).
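A minimal sketch of that proxy pattern on the central Prometheus side, assuming Telegraf re-exposes everything it scraped on its Prometheus-format output port (9273 is Telegraf’s default for that plugin; the hostname, namespace, and cluster label are hypothetical):

```yaml
# Central prometheus.yml (fragment) — scrape the in-cluster Telegraf proxy instead of every pod
scrape_configs:
  - job_name: cluster-a-telegraf-proxy
    metrics_path: /metrics
    static_configs:
      - targets: ["telegraf.monitoring.cluster-a.internal:9273"]   # reachable over VPC peering
        labels:
          cluster: cluster-a      # lets Grafana/Thanos tell the source clusters apart
```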

External service monitoring:
Under this category of monitoring, we take metrics from external services such as the MSK Kafka cluster, self-hosted servers, and other equivalent services.
We decided to monitor the metrics below (a sample scrape config follows the list):

  1. MSK cluster
  2. Self-hosted MySQL
  3. Self-hosted Redis server
  4. Jenkins server metrics
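These are scraped through their exporters. A rough sketch, assuming MSK open monitoring (JMX exporter on port 11001, node exporter on port 11002) and the usual community exporters for MySQL, Redis, and Jenkins; all hostnames below are hypothetical:

```yaml
scrape_configs:
  - job_name: msk-broker-jmx
    static_configs:
      - targets: ["b-1.msk.internal:11001", "b-2.msk.internal:11001"]
  - job_name: msk-broker-node
    static_configs:
      - targets: ["b-1.msk.internal:11002", "b-2.msk.internal:11002"]
  - job_name: mysql
    static_configs:
      - targets: ["mysql-exporter.internal:9104"]   # mysqld_exporter default port
  - job_name: redis
    static_configs:
      - targets: ["redis-exporter.internal:9121"]   # redis_exporter default port
  - job_name: jenkins
    metrics_path: /prometheus                       # Jenkins Prometheus plugin endpoint
    static_configs:
      - targets: ["jenkins.internal:8080"]
```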

The number of metrics for each cluster:

api_server: 19522
core_dns: 320
cadvisor_metrics: 1763 * number of worker nodes
prom_node_exporter: 1220 * number of worker nodes
kubelet: 1349 * number of worker nodes
kube_state_metrics: 8237
msk_broker_jmx: 13488 * 2
msk_broker_node: 180 * 2
telegraf metrics: ~50k

Currently we’re running 7 worker nodes, but that can go up to a maximum of 40 nodes. Roughly 105k of the series above don’t scale with node count, plus about 4.3k per worker node, so we would be scraping the following number of metrics:
1. ~300k (when we run 40 worker nodes)
2. ~220k (our normal workload when movies are live)
3. ~130k (current workload)

Sources:

  1. https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-node-exporter
  2. https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-state-metrics

Nothing more to see here….

By,
Ashish Ranjan
