Metrics Are Not Enough: Monitoring Apache Kafka


Gwen Shapira (System Architect, Confluent)

Location: Grand Ballroom G

Date: Friday, May 4

Time: 10:00am - 10:50am

Pass Type: All Access, Conference

Format: Conference Session

Track: DevOps

Conference Journey: Software Developer/Engineer

Audience: Intermediate

Vault Recording: TBD


When you run systems in production, you clearly want to make sure they are up and running at all times. But in a distributed system such as Apache Kafka, what does "up and running" even mean?

Experienced Apache Kafka users know what is important to monitor, which alerts are critical, and how to respond to them. They don't just collect metrics; they go the extra mile and use additional tools to validate the availability and performance of both the Kafka cluster and their entire data pipelines.

In this presentation, we'll discuss best practices for monitoring Apache Kafka. We'll look at which metrics are critical to alert on, which are useful for troubleshooting, and which may actually be misleading. We'll review a few "worst practices": common mistakes that you should avoid. We'll then look at what metrics don't tell you, and how to cover those essential gaps.
