
How to Choose MAX vs LAST vs AVG for Metric Rollups

  • April 29, 2026

Emily Ashley

This is a practical walkthrough for choosing rollup functions like MAX, LAST, and AVG when you aggregate gauges from short sample intervals (for example, 2 seconds) into longer windows (for example, 30 seconds) with our Rollup Metrics Function.

We’ll use Cribl Internal Metrics as examples, but the reasoning applies to any gauges in your environment. 
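To make the 2-second to 30-second aggregation concrete, here is a minimal Python sketch of a windowed rollup. It is not the Rollup Metrics Function itself: the `rollup` and `last` helpers and the sample values are hypothetical, chosen only to show how the same samples give different answers under MAX, LAST, and AVG.

```python
from collections import defaultdict
from statistics import mean


def rollup(samples, window_s, fn):
    """Group (timestamp, value) samples into fixed windows and reduce each with fn."""
    buckets = defaultdict(list)
    for ts, value in sorted(samples):
        # Align each sample to the start of the window it falls in.
        buckets[ts - ts % window_s].append(value)
    return {start: fn(values) for start, values in sorted(buckets.items())}


def last(values):
    """Return the final sample in the window."""
    return values[-1]


# Hypothetical gauge sampled every 2 seconds for 60 seconds,
# flat at 10 except for a single spike to 90 at t=14.
samples = [(t, 90 if t == 14 else 10) for t in range(0, 60, 2)]

print(rollup(samples, 30, max))   # {0: 90, 30: 10}
print(rollup(samples, 30, last))  # {0: 10, 30: 10}
print(rollup(samples, 30, mean))  # first window averages about 15.3
```

MAX keeps the spike, LAST drops it entirely, and AVG dilutes it. Which of those is "right" depends on the question you are asking of the gauge.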


1. Start with the question, not the function

Before you touch any rollup settings, ask: what question am I trying to answer with this gauge?

  • “Did this ever reach an interesting state in this window?”
    • Examples: “Did anything go red?” “Did throttling ever turn on?”
  • “What is it right now at the end of the window?”
    • Examples: “What’s the current health?” “What’s CPU right now?”
  • “What was it typically doing over this period?”
    • Examples: “What’s the average CPU over 5 minutes?” “What’s our normal queue size?”

2. Classify the gauge: three useful buckets

Next, decide what kind of gauge you’re dealing with. We’ll anchor in Cribl Internal Metrics and then generalize.

A. Ordinal health gauges

  • health.inputs
  • health.outputs

These usually use a small set of integer states, such as:

  • 0 = green
  • 1 = warning
  • 2 = trouble

They’re stateful gauges: the numeric value is really “which state are we in?” with an ordering. Fractional “in‑between” values like 0.75 health don’t mean much to a human.

General pattern: any gauge where the value is a discrete state (OK/WARN/TROUBLE, etc.) belongs here.

 

B. Binary / boolean gauges

  • throttle.engaged
  • blocked.outputs

These are 0/1 gauges that say whether something is happening or not.

General pattern: toggles, switches, feature flags, “is this on?”

 

C. Continuous gauges

  • system.cpu_perc
  • system.load_avg
  • system.mem_rss
  • pq.queue_size

These are numeric values where “in‑between” points are meaningful (52% CPU really is between 50% and 55%).

General pattern: resource usage, queue depths, latencies, utilization, and similar signals.

 

3. Choosing rollups for each gauge type

Now combine the question you’re asking with the type of gauge you have.

A. Ordinal health gauges – “what state did we reach?”

  • health.inputs
  • health.outputs

They use discrete states (0/1/2). In practice, people look at them to answer:

  • “How healthy was this over time?”
  • “Did we hit warning or trouble at any point in that period?”

A rollup of MAX lines up nicely with that mental model:

  • If health ever went to 2 (trouble) within the window, the rolled‑up value is 2.
  • If it only reached 1 (warning), the rolled‑up value is 1.
  • If it stayed at 0, the rolled‑up value is 0.

That preserves the highest severity reached in each window, which is exactly what most humans expect from a health graph.

General guidance for ordinal health gauges (any system):

  • If you care about “highest severity in the window,” use MAX.
  • Avoid AVG – intermediate values like 0.7 or 1.3 are hard to interpret.
  • Use LAST only when the specific need is “what state did we end in?”

You can extend this to other internal health‑style gauges (for example, a pq.health gauge that uses a similar ordinal scheme).
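As a small illustration of why MAX suits ordinal health gauges, the Python sketch below rolls up one hypothetical 30-second window of 2-second health samples. The sample values are invented; only the 0/1/2 state scheme comes from the discussion above.

```python
from statistics import mean

# State scheme from the text: 0 = green, 1 = warning, 2 = trouble.
STATES = {0: "green", 1: "warning", 2: "trouble"}

# Hypothetical window: 15 samples at a 2-second interval.
window = [0, 0, 0, 1, 1, 2, 2, 1, 0, 0, 0, 0, 0, 0, 0]

worst = max(window)     # highest severity reached in the window
ending = window[-1]     # state at the end of the window
average = mean(window)  # 7/15, roughly 0.47

print(STATES[worst])      # trouble
print(STATES[ending])     # green
print(average in STATES)  # False: the average is not one of the defined states
```

LAST reports green even though the window contained trouble, and AVG produces a number that maps to no state at all. MAX is the only one of the three that answers "did we hit warning or trouble?".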

 

B. Binary / boolean gauges – “ever true” vs “currently true”

For Cribl‑style boolean gauges like throttle.engaged or blocked.outputs, there are two natural ways to read them:

  1. Ever true in the window?
    • Example: “Did throttling happen at all in the last N seconds?”
    • Here, MAX works well: if the gauge was ever 1, the rolled‑up value is 1.
  2. Currently true at the end of the window?
    • Example: “Is throttling on right now?”
    • Here, LAST makes more sense: you want the final observed state.

You can use AVG and read it as “percentage of samples that were true,” but that’s more of a specialized use case and should be intentional.

General guidance for boolean gauges:

  • “Ever happened in this window?” → lean MAX.
  • “Is it happening now?” → lean LAST.
  • “What fraction of time was it true?” → AVG, but only when you explicitly want that.
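The three readings of a boolean gauge can be sketched in a few lines of Python. The window below is hypothetical (15 samples of a 0/1 gauge such as throttle.engaged, taken every 2 seconds); treating the fraction of true samples as a fraction of time assumes the samples are evenly spaced.

```python
# Hypothetical 30-second window of a 0/1 gauge sampled every 2 seconds.
window = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

ever_true = max(window) == 1                # MAX: was it on at any point?
currently_true = window[-1] == 1            # LAST: is it on at the end of the window?
fraction_true = sum(window) / len(window)   # AVG: share of samples that were 1

print(ever_true)       # True
print(currently_true)  # False
print(fraction_true)   # 0.2
```

Here throttling engaged briefly and then cleared, so MAX and LAST disagree. Neither is wrong; they answer different questions.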

C. Continuous gauges – “now, typical, or worst?”

For continuous gauges like system.cpu_perc or pq.queue_size, the “right” rollup depends on what story you want your panel to tell.

Using Cribl Internal Metrics as concrete examples:

  • “What is it now?” views
    • Example: a dashboard tile showing current CPU or queue depth.
    • LAST is a natural choice: you want the most recent value at the end of the window.
  • “What’s typical over time?” views
    • Example: 5‑minute average CPU, usual queue length.
    • AVG is often a good fit: it gives a smoothed sense of “normal.”
  • “Did it ever spike?” views
    • Example: “Did CPU reach 95% or higher in this interval?”
    • MAX answers that directly by showing you the peak.

General guidance for continuous gauges:

  • Use LAST when you care most about the value at the end of the window (“right now”).
  • Use AVG when you want the typical level across the window.
  • Use MAX when you care about peaks and threshold crossings (“did it ever exceed X?”).
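For a continuous gauge, the same window can tell three different stories. The Python sketch below uses invented CPU percentages (15 samples, one per 2 seconds) and a hypothetical 95% alert threshold to show how each rollup behaves around a short spike.

```python
from statistics import mean

# Hypothetical system.cpu_perc samples for one 30-second window.
cpu = [41, 43, 40, 42, 97, 44, 41, 40, 42, 43, 41, 40, 42, 41, 38]
THRESHOLD = 95  # assumed alert threshold, in percent

now = cpu[-1]        # LAST: value at the end of the window
typical = mean(cpu)  # AVG: smoothed level across the window
peak = max(cpu)      # MAX: worst value seen in the window

print(now, typical, peak)    # 38 45 97
print(peak >= THRESHOLD)     # True: the spike is visible
print(typical >= THRESHOLD)  # False: the average hides it
print(now >= THRESHOLD)      # False: so does the final sample
```

If the panel exists to catch threshold crossings, only MAX surfaces the 97% sample. If it exists to show the normal operating level, the 45% average is the more honest number.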

 

4. Putting it all together with a checklist

When you’re configuring rollups for Cribl Internal Metrics gauges—or any other gauges—this quick checklist tends to work well:

  1. What am I really asking about this gauge?
    • “Did it ever reach state X?” → lean MAX.
    • “What is it right now?” → lean LAST.
    • “What’s typical over this period?” → lean AVG (for continuous gauges).
  2. What type of gauge is it?
    • Ordinal health (like health.inputs/outputs)
      • Treat the values as states; MAX is usually a good fit for “highest severity reached.”
    • Boolean (like throttle.engaged)
      • MAX for “ever true,” LAST for “currently true”
      • AVG only for “fraction of time true” use cases
    • Continuous (like CPU, queue size)
      • Choose LAST/AVG/MAX based on whether you care about “now,” “typical,” or “worst.”
  3. Does the rolled‑up gauge tell the story I think it tells?
    • Look at a few real examples (including Cribl Internal Metrics) and do a gut-check that what’s drawn in the graph matches your mental picture of what happened during that time window.

If the story and the numbers line up, you’ve picked a good rollup for that gauge—and the same reasoning will keep working as you add more metrics over time.
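If it helps to keep the checklist next to your configuration, it can be written down as a small lookup table. This Python sketch is only one way to encode the guidance above: the `suggest_rollup` function and the gauge-type and question labels are hypothetical names, not part of any Cribl API.

```python
# (gauge type, question) -> suggested rollup, following the checklist above.
ROLLUP_BY_GAUGE_AND_QUESTION = {
    ("ordinal", "ever"): "MAX",        # highest severity reached
    ("ordinal", "now"): "LAST",        # state we ended in
    ("boolean", "ever"): "MAX",        # ever true in the window
    ("boolean", "now"): "LAST",        # currently true
    ("boolean", "fraction"): "AVG",    # fraction of samples true
    ("continuous", "now"): "LAST",     # value at the end of the window
    ("continuous", "typical"): "AVG",  # typical level
    ("continuous", "worst"): "MAX",    # peaks and threshold crossings
}


def suggest_rollup(gauge_type, question):
    """Return the suggested rollup, or raise if the combination is not covered."""
    try:
        return ROLLUP_BY_GAUGE_AND_QUESTION[(gauge_type, question)]
    except KeyError:
        raise ValueError(
            f"No suggestion for a {gauge_type!r} gauge and the {question!r} question"
        ) from None


print(suggest_rollup("ordinal", "ever"))        # MAX
print(suggest_rollup("continuous", "typical"))  # AVG
```

Combinations the checklist advises against, such as averaging an ordinal health gauge, are deliberately left out of the table, so asking for them raises an error instead of returning a misleading default.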
 

P.S. A quick note on gauges vs counters in Cribl Internal Metrics

Cribl Internal Metrics include both gauges and counters. We’ve focused on gauges, but you may be wondering about counters, which are rolled up with their own logic.

In Cribl Internal Metrics, counters are reported as delta values per reporting interval. In general, when you roll up deltas like these, you can either sum them or convert them to rates. Our Rollup Metrics implementation sums them within each time window.
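The difference between summing deltas and converting them to a rate is simple arithmetic. The Python sketch below uses invented delta values for one 30-second window at a 2-second reporting interval; it illustrates the two options and is not the Rollup Metrics implementation.

```python
# Hypothetical counter deltas: events counted in each 2-second reporting interval.
deltas = [120] * 10 + [60] * 5
INTERVAL_S = 2

window_s = len(deltas) * INTERVAL_S  # 30 seconds covered by these samples

total = sum(deltas)      # summed deltas: events in the whole window
rate = total / window_s  # alternative: average events per second

print(total)  # 1500
print(rate)   # 50.0
```

Summing keeps the rolled-up value a count for the window, so the totals still add up when windows are combined later. A rate is easier to compare across windows of different lengths.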