Elasticsearch slow log: the cheapest performance tool you're misconfiguring
Elasticsearch’s slow log is the cheapest performance tool you have, and most teams configure it once at install and never touch it again. Out of the box it is disabled entirely (every threshold defaults to -1), and the example thresholds most teams paste in from the docs (10s warn, 5s info) never catch the queries actually hurting your cluster.
What the slow log captures
Two separate logs live on each data node, written per shard:
- Search slow log — query-phase and fetch-phase timing. Logs the source (the query JSON), the took time, plus the index and shard.
- Indexing slow log — per-document indexing time. Logs index name, document id, source (truncated), and took time.
Both are JSON-formatted in modern Elasticsearch and live at logs/<cluster>_index_search_slowlog.json and logs/<cluster>_index_indexing_slowlog.json. Ship them with Filebeat; don’t parse text logs.
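For orientation, a single search slow log entry looks roughly like this (abridged, and the exact field names vary by major version; 8.x nests most of them under elasticsearch.slowlog.*):
{
  "type": "index_search_slowlog",
  "timestamp": "2026-04-12T08:14:02,312Z",
  "level": "WARN",
  "message": "[events-2026-04][2]",
  "took": "1.4s",
  "took_millis": "1400",
  "total_hits": "87 hits",
  "search_type": "QUERY_THEN_FETCH",
  "total_shards": "30",
  "source": "{\"query\":{\"bool\":{...}}}"
}
took_millis is the field to graph and alert on; source is the offending query itself.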
Thresholds that matter
The defaults do nothing for you. Set thresholds explicitly, per index, tuned to the query patterns that matter there, not one blanket value for the whole cluster:
PUT /events-2026-04/_settings
{
"index.search.slowlog.threshold.query.warn": "1s",
"index.search.slowlog.threshold.query.info": "500ms",
"index.search.slowlog.threshold.query.debug": "200ms",
"index.search.slowlog.threshold.fetch.warn": "500ms",
"index.search.slowlog.threshold.fetch.info": "100ms",
"index.indexing.slowlog.threshold.index.warn": "1s",
"index.indexing.slowlog.threshold.index.info": "500ms",
"index.indexing.slowlog.source": "1000"
}
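Settings applied with _settings stop at the current index: the next events-* index created by rollover starts back at the defaults. Bake the thresholds into an index template instead. A sketch using composable templates (7.8+) and a hypothetical template name; since only the single highest-priority matching index template applies, in practice these lines probably belong inside your existing events template:
PUT _index_template/events-slowlog-thresholds
{
  "index_patterns": ["events-*"],
  "priority": 500,
  "template": {
    "settings": {
      "index.search.slowlog.threshold.query.warn": "1s",
      "index.search.slowlog.threshold.fetch.warn": "500ms",
      "index.indexing.slowlog.threshold.index.warn": "1s"
    }
  }
}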
Search slow log vs indexing slow log
Most teams configure search but ignore indexing. The indexing slow log is what surfaces:
- Heavy mappings — a document with 800 fields takes longer to index than one with 12.
- Refresh-interval pressure — high indexing rates against the default 1s refresh produce a stream of tiny segments, and the resulting merge load shows up as slow individual index operations.
- Pipeline overhead — ingest pipelines (especially ones with grok, geoip, or script processors) can dominate per-document cost on hot-tier nodes; the stats call below confirms it.
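For the pipeline case specifically, node ingest stats report cumulative time per pipeline and per processor, which is the quickest way to confirm the suspicion. filter_path just trims the response:
GET _nodes/stats/ingest?filter_path=nodes.*.ingest.pipelines
A pipeline whose time_in_millis grows much faster than its count is burning real per-document time; grok and script processors are the usual offenders.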
3 patterns the slow log will show you
Three incident classes that only the slow log catches cleanly:
1. The deep-pagination scan
A query like from: 9000, size: 100 tells Elasticsearch to materialize and sort 9100 hits on every shard, then throw 9000 of them away. With 30 shards in the index, the coordinating node just handled 273k candidates to return 100 documents. The slow log shows query times jumping past 1s while result counts stay tiny, the tell-tale sign you should be using search_after with a point-in-time (PIT), sketched below.
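The replacement is a point-in-time plus search_after. A sketch, assuming results are sorted on an @timestamp field; the PIT id is abbreviated, and the search_after value is the epoch-millis sort value of the previous page's last hit:
POST /events-2026-04/_pit?keep_alive=1m

POST /_search
{
  "size": 100,
  "pit": { "id": "<id returned by the _pit call>", "keep_alive": "1m" },
  "sort": [ { "@timestamp": "desc" } ],
  "search_after": [ 1776240842312 ]
}
The first page simply omits search_after; every later page passes the sort values of the previous page's last hit. With a PIT, Elasticsearch adds an implicit _shard_doc tiebreaker, so pagination stays stable even while the index takes writes.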
2. The wildcard-prefix attacker
A user types *foo* in a search box and the application passes it straight into a wildcard query. Elasticsearch can’t use the inverted index for a leading wildcard, so it falls back to scanning every term in the field. Slow log times spike to many seconds. Block it at the application layer; no amount of tuning makes a leading wildcard fast on the Elasticsearch side.
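If the application fix will take a while, there is a blunt cluster-wide guardrail: search.allow_expensive_queries (7.7+) makes Elasticsearch reject wildcard queries outright, along with other expensive types such as regexp, fuzzy, and script queries:
PUT _cluster/settings
{
  "persistent": {
    "search.allow_expensive_queries": false
  }
}
Blunt because it applies to every index and every caller; treat it as an incident tourniquet rather than permanent policy.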
3. The scripted-field hotspot
A dashboard adds a script field — say, a tax calculation — to every hit. Now every search runs Painless once per matched document. The slow log shows steadily increasing query times as result sets grow. Move the calculation to ingest time (sketched below) or to a runtime field with a narrower scope.
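The ingest-time version is a small pipeline. A sketch with hypothetical field and pipeline names (price, tax_rate, add-tax):
PUT _ingest/pipeline/add-tax
{
  "description": "compute tax once per document at write time",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "ctx.price_with_tax = ctx.price * (1 + ctx.tax_rate)"
      }
    }
  ]
}

PUT /events-2026-04/_settings
{
  "index.default_pipeline": "add-tax"
}
One script run per document at write time replaces one run per matched document on every search, the right trade whenever a field is read far more often than it is written.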
FAQ
Should I run Elasticsearch slow log in production at the lowest threshold?
Why are some slow log entries truncated?
Can I get aggregated query stats without the slow log?
What about OpenSearch?