nanook-expr

The DSL behind every alert rule.

nanook-expr is the small predicate language that powers every [[alerts]] expr = "..." and every collector success = "..." field. Think of it as PromQL crossed with a SQL WHERE clause, kept short.

Reference page. For worked examples see Alerts.

Mental model

An expression is a boolean built out of:

  1. A selector that picks one or more metric values.
  2. A comparison operator against a literal, range, set, or another selector.
  3. Optional boolean glue (&&, ||, !) and arithmetic (+ - * /).

The root must evaluate to a boolean: cpu.usage > 80% is a rule; cpu.usage on its own is not.

Selectors

A selector picks a metric. Full form:

[ AGG ( ] [ COLLECTOR :: ] METRIC [ { LABELS } ] [ ) ] [ [WINDOW] ] [ offset DURATION ]
  • METRIC is a dotted identifier path: cpu.usage, disk.free, http.status.
  • COLLECTOR :: qualifies which collector instance you mean. Skip it when there's only one collector emitting that metric.
  • { LABELS } filters on label values (see Labels below).
  • AGG(...) aggregates across matching series.
  • [WINDOW] evaluates over a sliding window of past values rather than the latest sample.
  • offset DURATION reads the value from the past instead of right now (see Offset below).

Examples

# the simplest thing
cpu.usage

# qualified by collector instance
api::http.latency

# label filter
disk.usage{mount="/"}

# regex on labels
disk.usage{mount=~"^/(var|home)$"}

# avg of a windowed metric
avg(cpu.usage)[5m]

# sum across all cores
sum(cpu.core.usage)

Aggregators

Op                  What
avg                 mean of matching series
sum                 sum of matching series
max                 largest
min                 smallest
p50 (alias median)  50th percentile
p90                 90th percentile
p95                 95th percentile
p99                 99th percentile
stddev              standard deviation (population)
rate                per-second slope across the window
delta               last sample minus first

When a selector matches multiple series (e.g. one per core, one per mount), an aggregator collapses them to one number.

Percentiles use nearest-rank on the live window's samples, sorted on each call.
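As an illustration of the nearest-rank rule, here is a minimal Python sketch. The function name nearest_rank is hypothetical and not part of nanook; the sketch assumes the window is a plain list of numbers and that the rank is the ceiling of pct/100 times the sample count.

```python
import math

def nearest_rank(samples, pct):
    """Return the pct-th percentile (0 < pct <= 100) by nearest-rank."""
    if not samples:
        raise ValueError("empty window")
    ordered = sorted(samples)                    # re-sorted on every call
    rank = math.ceil(pct * len(ordered) / 100)   # 1-based rank
    return ordered[max(rank, 1) - 1]

window = [12, 7, 3, 20, 15]
p50 = nearest_rank(window, 50)
p90 = nearest_rank(window, 90)
```

Nearest-rank always returns a value that is actually in the window, so a p99 over a handful of samples is simply the maximum.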

rate and delta only mean something across time, so they expect a window. Without one they fall back to the mean of the cross-series fan-out, which is rarely what you want, so pair them with [5m] or similar.
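To see why rate and delta need a window, consider this Python sketch. It assumes a window is a time-ordered list of (timestamp_seconds, value) pairs and that "slope" means the endpoint-to-endpoint slope; the engine may compute it differently, and the function names are illustrative only.

```python
def delta(window):
    """Last sample minus first sample."""
    return window[-1][1] - window[0][1]

def rate(window):
    """Per-second slope between the first and last sample (assumed definition)."""
    (t0, v0), (t1, v1) = window[0], window[-1]
    if t1 == t0:
        return 0.0   # a single instant has no slope
    return (v1 - v0) / (t1 - t0)

# a counter sampled over five minutes
window = [(0, 100.0), (60, 130.0), (300, 400.0)]
grew_by = delta(window)
per_second = rate(window)
```

With only the latest sample there is no first/last pair to subtract, which is why an unwindowed rate or delta has nothing meaningful to report.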

Labels

metric{key="val"}     # exact match
metric{key!="val"}    # not equal
metric{key=~"regex"}  # regex match

Multiple labels are AND-ed.
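The AND semantics can be sketched in Python as follows. This is an illustration, not nanook's matcher: it assumes =~ is an unanchored regex search (the selector example earlier anchors explicitly with ^ and $) and that a missing label satisfies !=. The function name matches_labels is hypothetical.

```python
import re

def matches_labels(series_labels, matchers):
    """matchers is a list of (key, op, value); every one must hold."""
    for key, op, value in matchers:
        actual = series_labels.get(key)
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value          # assumption: a missing label counts as "not equal"
        elif op == "=~":
            ok = actual is not None and re.search(value, actual) is not None
        else:
            raise ValueError(f"unknown label operator {op!r}")
        if not ok:
            return False                  # AND: the first failing matcher rejects the series
    return True

root_ext4 = matches_labels({"mount": "/", "fs": "ext4"},
                           [("mount", "=", "/"), ("fs", "!=", "tmpfs")])
```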

Windows

Windows look back over recent samples and run the aggregator on the slice. Without a window, the selector reads the latest value.

avg(cpu.usage)[5m]      # 5-minute moving average
max(http.latency)[30s]  # worst latency in last 30s

Window units: s, m, h, d. A bare number is seconds. Sub-second resolution (ms) is not supported in window brackets.
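The unit rules can be sketched as a small parser in Python. It is an illustration under assumptions: the name parse_window is hypothetical, and accepting compound durations such as 1h30m inside window brackets is a guess based on the duration literal form, not something this page confirms.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_window(text):
    """Convert a window duration such as '5m' or '30' to seconds."""
    text = text.strip()
    if text.isdigit():
        return int(text)                  # a bare number is seconds
    total, pos = 0, 0
    for m in re.finditer(r"(\d+)([a-zA-Z]+)", text):
        if m.start() != pos:
            raise ValueError(f"malformed duration {text!r}")
        unit = m.group(2)
        if unit not in UNIT_SECONDS:      # rejects 'ms' and anything else unknown
            raise ValueError(f"unsupported window unit {unit!r}")
        total += int(m.group(1)) * UNIT_SECONDS[unit]
        pos = m.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"malformed duration {text!r}")
    return total

five_minutes = parse_window("5m")
```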

Offset (compare to past)

Append offset <duration> after a selector to read its value from the past instead of right now. Pair with arithmetic or another selector to compare current vs historical.

# value 5 minutes ago
cpu.usage offset 5m

# spike check: now is 50% above the 5m moving avg from 1h ago
cpu.usage > 1.5 * avg(cpu.usage)[5m] offset 1h

# compare two windowed aggregates from different times
avg(http.latency)[5m] > avg(http.latency)[5m] offset 1d

The offset shifts the read backwards in time. With a [window], it reads the aggregate of the slice ending offset ago. Without a window, it reads the closest sample at-or-before now - offset.
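Both read modes can be sketched in Python. The sketch assumes samples are (timestamp_seconds, value) pairs sorted by time, that the window slice is half-open on the old side, and that the aggregate is a plain mean; the function names are hypothetical. Returning None stands in for the not-enough-history case.

```python
from bisect import bisect_right

def read_at(samples, now, offset):
    """Value of the closest sample at-or-before now - offset, or None."""
    target = now - offset
    timestamps = [ts for ts, _ in samples]
    i = bisect_right(timestamps, target)   # count of samples with ts <= target
    return samples[i - 1][1] if i else None

def window_avg_at(samples, now, offset, window):
    """Mean of the slice of length `window` that ends `offset` ago."""
    end = now - offset
    start = end - window
    values = [v for ts, v in samples if start < ts <= end]
    return sum(values) / len(values) if values else None

samples = [(0, 10.0), (60, 20.0), (120, 30.0), (180, 40.0)]
a_minute_ago = read_at(samples, now=180, offset=60)
avg_ending_a_minute_ago = window_avg_at(samples, now=180, offset=60, window=120)
```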

The buffer must hold at least offset + window worth of history. A rule that uses an offset selector starts with an empty buffer, so readings only become available once enough samples have accumulated. Until then the rule errors with metric_not_found.

offset does not apply to absent or changed predicates.

Predicates

Predicates are boolean function-call forms over a selector. They answer "is the metric there?" or "did the metric move?" rather than thresholding a value, so they don't take a comparison operator on the right.

Predicate        What it answers                                                                                            Example
absent(sel)      true when no live sample matches sel (collector dead, metric never seen, label filter narrows to nothing)  absent(cpu.usage)
changed(sel)[w]  true when sel's value varied at all across the last [w] (flapping detection)                               changed(state.up)[5m]

absent does not take a window. changed requires one (it has no opinion without a time horizon).

Predicates participate in boolean combinators like any other condition:

absent(cpu.usage) || cpu.usage > 90%
changed(state.up)[5m] && uptime.seconds < 300

Rules using these predicates are re-evaluated on the periodic tick even when no fresh sample arrived, so collector death surfaces without an external trigger.
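A rough Python sketch of the two predicates, under stated assumptions: samples are (timestamp_seconds, value) pairs, "live" simply means the list of matching samples is non-empty, and the window is half-open on the old side. The function names mirror the predicates for readability but are not nanook APIs.

```python
def absent(matching_samples):
    """True when the selector matched no live sample at all."""
    return len(matching_samples) == 0

def changed(samples, now, window):
    """True when the value varied at all inside the last `window` seconds."""
    values = [v for ts, v in samples if now - window < ts <= now]
    return any(v != values[0] for v in values[1:])

state_up = [(0, 1), (100, 1), (200, 0), (290, 1)]
flapping = changed(state_up, now=300, window=300)
```

Because absent has to report on samples that never arrived, evaluating it only when a sample comes in would never fire; this is the reason for the periodic tick described above.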

Subqueries: referencing other rules

firing("rule_name") is true when the named rule is currently fired. Use it to compose alerts where one condition depends on another rule's outcome.

# escalate when a base alert is firing AND the metric is climbing
firing("disk_high") && rate(disk.usage)[5m] > 0

# soft-warn when either the cpu rule fires or memory is low
firing("cpu_pegged") || mem.available < 100MB

The argument is the rule's name field. Write it as an unquoted identifier (firing(my-rule)) or a quoted string when the name has special characters (firing("name with spaces")).

The lookup reads a snapshot of firing state taken at the start of each evaluation pass. Rules see the previous pass's state for any rule evaluated within the same pass, so eval order stays independent of dependency edges.
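The snapshot behaviour can be sketched in Python. This is a simplified model, not the engine's code: rules are modelled as callables that receive the snapshot, and run_pass is a hypothetical name.

```python
def run_pass(rules, firing_state):
    """Evaluate every rule against a frozen copy of the previous firing state."""
    snapshot = dict(firing_state)          # taken once, at the start of the pass
    return {name: bool(rule(snapshot)) for name, rule in rules.items()}

rules = {
    # declared before its dependency on purpose
    "escalate": lambda firing: firing["disk_high"],
    "disk_high": lambda firing: True,
}

state = {name: False for name in rules}
first = run_pass(rules, state)    # escalate still sees disk_high as not firing
second = run_pass(rules, first)   # one pass later it observes the change
```

The cost of this design is a one-pass delay for dependent rules; the benefit is that reordering the rules never changes the outcome.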

Unknown names are a hard error at load (nanook::engine::unknown_rule_ref), anchored to the offending firing(...) call. Run nanook check to surface them before the agent starts.

Rules using firing(...) are re-evaluated on the periodic tick so they pick up state changes from upstream rules without needing a fresh sample of their own.

Comparison operators

Numeric:

Op         Meaning
> < >= <=  the usual
== !=      equality

Set / range:

Op       Meaning                   Example
in       left is in the set        http.code in [200, 204, 301]
not in   left is not in the set    iface not in ["lo", "docker0"]
between  left is in [lo, hi]       http.code between [200, 399]
outside  left is outside [lo, hi]  temp.celsius outside [0, 80]
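Read literally, the [lo, hi] notation denotes a closed interval. The Python sketch below assumes both bounds are inclusive for between and that outside is its exact complement; the function names are illustrative.

```python
def op_in(left, items):
    return left in items

def between(left, lo, hi):
    return lo <= left <= hi            # assumption: both ends inclusive

def outside(left, lo, hi):
    return not between(left, lo, hi)   # exact complement of between

ok_status = between(204, 200, 399)
too_hot = outside(81, 0, 80)
```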

String (operates on selector values rendered as strings, useful for status flags):

Op        Meaning
is        exact match: http.status is "false"
not       not equal: name not "loopback"
contains  substring: name contains "error"
like      glob: name like "cpu.*"
matches   regex: name matches "^cpu\\."

Any of these can be negated by prefixing not: name not like "*.debug", path not matches "^/api", status not in [200, 204], cpu not > 50, etc. Double negation cancels.
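The string operators and the not prefix can be approximated in Python as below. Treat this as a sketch: it uses Python's fnmatch for like and an unanchored re.search for matches, and nanook's actual glob and regex dialects may differ. The names STRING_OPS and string_cmp are hypothetical.

```python
import re
from fnmatch import fnmatchcase

STRING_OPS = {
    "is":       lambda left, right: left == right,
    "not":      lambda left, right: left != right,
    "contains": lambda left, right: right in left,
    "like":     lambda left, right: fnmatchcase(left, right),
    "matches":  lambda left, right: re.search(right, left) is not None,
}

def string_cmp(op, left, right, negate=False):
    """Apply a string operator; negate=True models the `not` prefix."""
    result = STRING_OPS[op](left, right)
    return not result if negate else result

is_cpu = string_cmp("like", "cpu.usage", "cpu.*")
keep = string_cmp("like", "app.debug", "*.debug", negate=True)
```

Negating an already negated result returns the original value, which is all that "double negation cancels" amounts to.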

Booleans and arithmetic

Combine predicates:

cpu.usage > 80% && mem.usage > 80%
disk.usage > 90% || disk.free < 1GB
!(http.status is "true")

Arithmetic in either side of a comparison:

mem.used + swap.used > 0.8 * mem.total
http.latency * 1000 > 500

Precedence is the standard one: * / bind tighter than + -, comparisons bind looser than arithmetic, ! binds tighter than && which binds tighter than ||. Parens are always available.
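The precedence chain can be sketched as a small recursive-descent evaluator, one method per level. This is a Python illustration restricted to numeric literals (no selectors, strings, or unit literals), not nanook's parser; all names in it are hypothetical.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\d+\.?\d*|&&|\|\||[<>]=?|==|!=|[-+*/()!])")
CMP = {">": operator.gt, "<": operator.lt, ">=": operator.ge,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}
ADD = {"+": operator.add, "-": operator.sub}
MUL = {"*": operator.mul, "/": operator.truediv}

def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def binary(self, operand, ops):
        # left-associative chain of same-level operators
        left = operand()
        while self.peek() in ops:
            left = ops[self.take()](left, operand())
        return left

    # loosest binding first: || then && then ! then comparison then + - then * /
    def or_(self):
        return self.binary(self.and_, {"||": lambda a, b: a or b})

    def and_(self):
        return self.binary(self.not_, {"&&": lambda a, b: a and b})

    def not_(self):
        if self.peek() == "!":
            self.take()
            return not self.cmp()
        return self.cmp()

    def cmp(self):
        left = self.additive()
        if self.peek() in CMP:
            return CMP[self.take()](left, self.additive())
        return left

    def additive(self):
        return self.binary(self.multiplicative, ADD)

    def multiplicative(self):
        return self.binary(self.unary, MUL)

    def unary(self):
        if self.peek() == "-":
            self.take()
            return -self.primary()
        return self.primary()

    def primary(self):
        tok = self.take()
        if tok == "(":
            value = self.or_()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        return float(tok)

def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.or_()
    if parser.peek() is not None:
        raise SyntaxError("trailing input")
    return result
```

For instance, evaluate("1 < 2 || 1 < 2 && 3 < 2") groups the && first and is therefore true; grouping strictly left to right would make it false.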

Literals

Kind          Example
number        42, 3.14
percent       90% (parses as 0.90)
bytes         1GB, 512MB, 1KiB
duration      5s, 2m, 1h30m
string        "prod", 'prod'
number array  [200, 204, 301]
string array  ["eth0", "wg0"]

Percent and unit literals are typed: cpu.usage > 90% is fine, cpu.usage > 90 is rejected by the type checker because it doesn't know what 90 means.
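A sketch of typed literals in Python, with loud assumptions: the byte multipliers (decimal for KB/MB/GB, binary for KiB/MiB/GiB) are a guess that this page does not confirm, and the idea that each metric carries a declared kind is a hypothetical model of the type checker. parse_literal and check_comparison are illustrative names.

```python
import re

# Assumed multipliers; the page does not say whether GB means 10**9 or 2**30.
BYTE_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9,
              "KiB": 2**10, "MiB": 2**20, "GiB": 2**30}

def parse_literal(text):
    """Return (kind, value) for a number, percent, or bytes literal."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(%|[A-Za-z]+)?", text)
    if not m:
        raise ValueError(f"not a numeric literal: {text!r}")
    number, suffix = float(m.group(1)), m.group(2)
    if suffix is None:
        return ("number", number)
    if suffix == "%":
        return ("percent", number / 100)   # 90% parses as 0.90
    if suffix in BYTE_UNITS:
        return ("bytes", number * BYTE_UNITS[suffix])
    raise ValueError(f"unknown unit {suffix!r}")

def check_comparison(metric_kind, literal_text):
    """Reject a comparison whose literal kind differs from the metric's kind."""
    kind, value = parse_literal(literal_text)
    if kind != metric_kind:
        raise TypeError(f"{metric_kind} metric compared with {kind} literal")
    return value

threshold = check_comparison("percent", "90%")
```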

Filter expressions

A handful of fields (adapter filter = "...", collector filter = "...") take a slightly stripped-down form for filtering metrics by name and labels rather than thresholding. The pseudo-fields are:

  • name · full metric name (e.g. cpu.usage)
  • src · collector name
  • <key> · any label key (e.g. core, mount)

Same operators as above. Quick examples:

# adapter: only cpu and mem metrics
name like "cpu.*" || name like "mem.*"

# adapter: drop noisy debug metrics
name not like "*.debug"

# collector: scope to specific interfaces
iface like "eth*"
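The three filters above can be modelled in Python against a list of metric records. The record layout (name, src, labels) and the rule that a missing label never matches are assumptions made for this sketch, and Python's fnmatch stands in for whatever glob dialect nanook uses.

```python
from fnmatch import fnmatchcase

def field(record, key):
    """Resolve a pseudo-field: name, src, or any label key."""
    if key == "name":
        return record["name"]
    if key == "src":
        return record["src"]
    return record["labels"].get(key)    # None when the label is missing

def like(value, pattern):
    # assumption: a missing label never matches a glob
    return value is not None and fnmatchcase(value, pattern)

def cpu_or_mem(r):   # name like "cpu.*" || name like "mem.*"
    return like(field(r, "name"), "cpu.*") or like(field(r, "name"), "mem.*")

def not_debug(r):    # name not like "*.debug"
    return not like(field(r, "name"), "*.debug")

def eth_only(r):     # iface like "eth*"
    return like(field(r, "iface"), "eth*")

records = [
    {"name": "cpu.usage", "src": "host", "labels": {"core": "0"}},
    {"name": "net.rx", "src": "host", "labels": {"iface": "eth0"}},
    {"name": "app.debug", "src": "api", "labels": {}},
]
kept = [r["name"] for r in records if cpu_or_mem(r)]
```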

Grammar (informal)

expr     := or
or       := and ( "||" and )*
and      := not ( "&&" not )*
not      := "!"? cmp
cmp      := arith ( cmp_op rhs )?
rhs      := arith | string | array
arith    := term ( ( "+" | "-" ) term )*
term     := unary ( ( "*" | "/" ) unary )*
unary    := "-"? primary
primary  := selector | predicate | rule_ref | literal | "(" expr ")"
rule_ref := "firing" "(" ( ident | string ) ")"

selector := ( agg "(" sel_inner ")" | sel_inner ) window? offset?
predicate := ( "absent" | "changed" ) "(" sel_inner ")" window?
offset   := "offset" duration
agg      := "avg" | "sum" | "max" | "min"
          | "p50" | "median" | "p90" | "p95" | "p99"
          | "stddev" | "rate" | "delta"
sel_inner := ( ident "::" )? metric labels?
metric   := ident ( "." ident )*
labels   := "{" label ( "," label )* "}"
label    := ident ( "=" | "!=" | "=~" ) string
window   := "[" duration "]"

cmp_op   := ">" | "<" | ">=" | "<=" | "==" | "!="
          | "in" | "between" | "outside"
          | "contains" | "matches" | "is" | "not" | "like"
          | "not" cmp_op                  # any op may be prefixed with `not`

See also