Configuration

Anatomy of nanook.toml · every section, what it does, what's optional.

nanook.toml is the source of truth. Everything the agent does comes from here.

Skeleton

[log]            # logging level + format
[admin]          # admin server (HTTP + WS + Unix socket)
[state]          # where in-flight state is persisted across restarts
[engine]         # engine tuning (cardinality caps)
[plugins]        # where to look for dylib plugins
[channels.<id>]  # alert delivery destinations
[[collectors]]   # one entry per metric source
[[adapters]]     # one entry per metric sink
[[alerts]]       # rules that fire on metric conditions

You don't need every section. The minimum useful config has one collector and one rule.

`[log]`

[log]
level  = "info"   # debug, info, warn, error
format = "text"   # text, json

`[admin]`

The admin server is what nanook ctl and nanook tui talk to. Disabled by default: set enabled = true to expose it.

[admin]
enabled         = true
addr            = "127.0.0.1:9091"              # HTTP/WS bind addr
socket          = "/run/nanook.sock"            # optional Unix socket
auth            = "required"                    # "required" (default) or "none"
authorized      = [                             # inline ssh-ed25519 lines
  "ssh-ed25519 AAAAC3Nz... alice@laptop",
]
authorized_keys = "/etc/nanook/authorized_keys" # optional path to a key file
cert            = ""                            # reserved, TLS not yet wired
key             = ""                            # reserved, TLS not yet wired

Clients sign every request with an Ed25519 SSH-style key. The server accepts a request only if its signing key appears in authorized (inline) or authorized_keys (file). With auth = "required", the agent refuses to start when no keys are configured; set auth = "none" to opt out on trusted networks.

TLS isn't wired yet: front the agent with a reverse proxy for encryption on the wire, or use the Unix socket on a single host. See Admin server for key generation, headers, and error codes.
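The key-loading and startup rule can be sketched in Rust. This is a minimal illustration, not nanook's code: AuthMode, AuthorizedKey, load_keys and check_startup are hypothetical names, skipping blank lines, # comments and non-ed25519 lines is an assumption, and signature verification is left out because it needs an Ed25519 crate.

```rust
/// Hypothetical mirror of `auth`: "required" (default) or "none".
#[derive(Debug, Clone, Copy, PartialEq)]
enum AuthMode {
    Required,
    Disabled, // auth = "none"
}

/// One parsed `ssh-ed25519 <base64> [comment]` line (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct AuthorizedKey {
    blob: String,
    comment: Option<String>,
}

/// Parse one key line. Assumption: blank lines, `#` comments and
/// non-ed25519 key types are skipped rather than rejected.
fn parse_key_line(line: &str) -> Option<AuthorizedKey> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    let mut parts = line.split_whitespace();
    match (parts.next(), parts.next()) {
        (Some("ssh-ed25519"), Some(blob)) => {
            let rest: Vec<&str> = parts.collect();
            let comment = if rest.is_empty() { None } else { Some(rest.join(" ")) };
            Some(AuthorizedKey { blob: blob.to_string(), comment })
        }
        _ => None,
    }
}

/// Union of the inline `authorized` list and the `authorized_keys` file contents.
fn load_keys(inline: &[&str], file_contents: Option<&str>) -> Vec<AuthorizedKey> {
    inline
        .iter()
        .copied()
        .chain(file_contents.into_iter().flat_map(|s| s.lines()))
        .filter_map(parse_key_line)
        .collect()
}

/// Startup gate: with auth = "required", an empty key set is fatal.
fn check_startup(mode: AuthMode, keys: &[AuthorizedKey]) -> Result<(), String> {
    if mode == AuthMode::Required && keys.is_empty() {
        return Err("auth = \"required\" but no authorized keys are configured".to_string());
    }
    Ok(())
}
```

The point of the sketch is the ordering: keys from both sources are merged first, and only then does the agent decide whether it may start.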

`[state]`

[state]
path = "/var/lib/nanook/state.bin"

Where the agent persists in-flight state across restarts. Empty path (the default) means in-memory only: silences disappear when the process exits.

The file is postcard-encoded, not meant for hand-editing. A missing file at boot is a clean first run; a corrupt or version-incompatible file logs a warning and the agent boots empty.

Today only silences are persisted. Rule cooldowns, hit counters, and plugin state will follow on the same plumbing.
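The boot rules above (empty path, missing file, corrupt file) can be sketched as a single load function. This assumes a hypothetical State type holding only silences, and a decode closure standing in for the real postcard decoder so the sketch needs no external crate. Treating other I/O errors (for example, permission denied) like a corrupt file is an assumption; the page does not say.

```rust
use std::fs;
use std::io::ErrorKind;

/// Hypothetical in-memory state; today only silences are persisted.
#[derive(Debug, Default, PartialEq)]
struct State {
    silences: Vec<String>,
}

/// Boot-time load. `decode` stands in for the postcard decoder.
fn load_state<F>(path: &str, decode: F) -> State
where
    F: Fn(&[u8]) -> Result<State, String>,
{
    if path.is_empty() {
        return State::default(); // in-memory only, nothing survives exit
    }
    match fs::read(path) {
        // Corrupt or version-incompatible contents: warn and boot empty.
        Ok(bytes) => decode(bytes.as_slice()).unwrap_or_else(|e| {
            eprintln!("warn: state file {path} unusable ({e}); booting empty");
            State::default()
        }),
        // A missing file is a clean first run, not an error.
        Err(e) if e.kind() == ErrorKind::NotFound => State::default(),
        // Assumption: other I/O errors are handled like a corrupt file.
        Err(e) => {
            eprintln!("warn: cannot read state file {path} ({e}); booting empty");
            State::default()
        }
    }
}
```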

`[engine]`

Engine-level tuning. Today it covers only the cardinality guard.

[engine.cap]
source = 10000   # max distinct series per source. 0 disables.
metric = 2000    # max distinct series per (source, name). 0 disables.

Both caps are enforced at the ingest boundary in LatestStore::insert. Crossing either cap rejects the new series; existing series keep updating. Each rejection is counted in stats.dropped and exported as the self-metric nanook.engine.dropped, and a rate-limited warning is logged at most once per (source, metric) per minute.

The defaults are intentionally high. Hitting them almost always means a label is unbounded, not that the cap is too tight. See nanook::engine::cardinality_exceeded.
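The cap check can be sketched as follows. This is not LatestStore::insert itself: Store, Caps and SeriesKey are hypothetical, a series is assumed to be identified by (source, metric, labels), and the rate-limited warning is omitted. It shows the two properties described above: updates to known series always pass, and only brand-new series are counted against the caps.

```rust
use std::collections::HashMap;

/// (source, metric name, sorted label pairs): a hypothetical series identity.
type SeriesKey = (String, String, Vec<(String, String)>);

/// Mirrors [engine.cap]; 0 disables a cap.
struct Caps {
    source: usize,
    metric: usize,
}

#[derive(Default)]
struct Store {
    latest: HashMap<SeriesKey, f64>,
    per_source: HashMap<String, usize>,
    per_metric: HashMap<(String, String), usize>,
    dropped: u64, // stands in for stats.dropped / nanook.engine.dropped
}

impl Store {
    /// Returns false when a new series is rejected by either cap.
    fn insert(&mut self, caps: &Caps, key: SeriesKey, value: f64) -> bool {
        if let Some(v) = self.latest.get_mut(&key) {
            *v = value; // existing series always keep updating
            return true;
        }
        let metric_key = (key.0.clone(), key.1.clone());
        let n_source = self.per_source.get(&key.0).copied().unwrap_or(0);
        let n_metric = self.per_metric.get(&metric_key).copied().unwrap_or(0);
        let over = (caps.source != 0 && n_source >= caps.source)
            || (caps.metric != 0 && n_metric >= caps.metric);
        if over {
            self.dropped += 1;
            return false;
        }
        *self.per_source.entry(key.0.clone()).or_insert(0) += 1;
        *self.per_metric.entry(metric_key).or_insert(0) += 1;
        self.latest.insert(key, value);
        true
    }
}
```

With source = 3 and metric = 2, a third host label on the same metric is rejected while the first two hosts keep updating, which is the unbounded-label symptom the defaults are meant to surface.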

`[plugins]`

[plugins]
dirs     = ["/usr/lib/nanook", "./plugins"]
allowed  = [
  "sha256:d3983dbf3dbc30aa4838bb16c38786d03af5faf2643db2a0cdab7444697ff28b",
]
verify   = true   # default true: refuse plugins whose digest isn't in `allowed`
strict   = true   # default true: refuse dirs not owned by us or world-writable

See Plugins for how plugins get discovered and validated.
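What verify and strict check can be sketched as two predicates. This is a hedged illustration: plugin_permitted, dir_trusted and check_dir are hypothetical names, computing the SHA-256 digest is left to a hashing crate, and "owned by us" is assumed to mean owned by the agent's effective uid (passed in, since std has no geteuid).

```rust
/// `verify`: the plugin's digest must appear in `allowed` as "sha256:<hex>".
fn plugin_permitted(verify: bool, allowed: &[&str], sha256_hex: &str) -> bool {
    if !verify {
        return true;
    }
    let want = format!("sha256:{sha256_hex}");
    allowed.iter().any(|a| *a == want)
}

/// `strict`: refuse a dir not owned by the agent's uid, or one that is world-writable.
fn dir_trusted(owner_uid: u32, mode: u32, agent_uid: u32) -> bool {
    let world_writable = (mode & 0o002) != 0;
    owner_uid == agent_uid && !world_writable
}

/// Unix-only helper that reads owner and mode bits from the filesystem.
#[cfg(unix)]
fn check_dir(path: &std::path::Path, agent_uid: u32) -> std::io::Result<bool> {
    use std::os::unix::fs::MetadataExt;
    let md = std::fs::metadata(path)?;
    Ok(dir_trusted(md.uid(), md.mode(), agent_uid))
}
```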

`[channels.<id>]`

A channel is a named delivery target. Rules reference it by id.

[channels.log]
type = "log"

[channels.ops]
type = "slack"
[channels.ops.opts]
url = "${SLACK_WEBHOOK_URL}"

[channels.shell]
type = "exec"
[channels.shell.opts]
cmd = "/usr/local/bin/page-oncall.sh"

Channel types: log, webhook, discord, slack, exec. See Channels for what each one accepts.
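Resolving a rule's channel by id can be sketched like this. ChannelType and resolve are hypothetical; the five variants are the channel types listed above, and rejecting unknown types and ids with an error at load time is an assumption about how the agent reports them.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// The `type` field of a [channels.<id>] table.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ChannelType {
    Log,
    Webhook,
    Discord,
    Slack,
    Exec,
}

impl FromStr for ChannelType {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "log" => Ok(Self::Log),
            "webhook" => Ok(Self::Webhook),
            "discord" => Ok(Self::Discord),
            "slack" => Ok(Self::Slack),
            "exec" => Ok(Self::Exec),
            other => Err(format!("unknown channel type {other:?}")),
        }
    }
}

/// A rule's `channel` must name a defined [channels.<id>] table.
fn resolve(channels: &HashMap<String, ChannelType>, id: &str) -> Result<ChannelType, String> {
    channels
        .get(id)
        .copied()
        .ok_or_else(|| format!("unknown channel id {id:?}"))
}
```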

`[[collectors]]`

A collector is anything that produces metrics. Built-in kinds: cpu, mem, disk, net, http, tcp, dns, process, exec, load, temp, uptime. Plugin kinds are loaded from [plugins].dirs.

[[collectors]]
name     = "cpu"            # your handle for this collector
kind     = "cpu"            # defaults to `name` if omitted
interval = "5s"             # how often to poll
labels   = { env = "prod" } # constant labels merged onto every sample
filter   = ""               # nanook-expr filter; samples that don't match are dropped
plugin   = ""               # explicit plugin crate (rare)
[collectors.opts]
# kind-specific options (see `nanook doc <kind>`)
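Two of the fields above have implicit behavior: kind falls back to name, and labels are stamped onto every sample. A minimal sketch, assuming a hypothetical CollectorCfg holding only those fields; which side wins when a sample already carries the same label key is not specified here, so letting the configured constant win is an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical parsed [[collectors]] entry (only the fields used here).
struct CollectorCfg {
    name: String,
    kind: Option<String>,
    labels: BTreeMap<String, String>,
}

impl CollectorCfg {
    /// `kind` defaults to `name` when omitted.
    fn kind(&self) -> &str {
        self.kind.as_deref().unwrap_or(self.name.as_str())
    }

    /// Merge the constant labels onto one sample's labels.
    /// Assumption: on a key clash the configured constant wins.
    fn stamp(&self, sample: &mut BTreeMap<String, String>) {
        for (k, v) in &self.labels {
            sample.insert(k.clone(), v.clone());
        }
    }
}
```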

Two collectors can share a kind as long as they have different names:

[[collectors]]
name = "api"
kind = "http"
interval = "30s"
[collectors.opts]
url = "https://api.example.com/healthz"

[[collectors]]
name = "homepage"
kind = "http"
interval = "60s"
[collectors.opts]
url = "https://example.com"

Reference each one in expressions as api::http.status and homepage::http.latency. See nanook-expr.
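The name::metric form can be sketched as a split on the first ::. This is not nanook-expr's parser: split_ref, Sample and ref_matches are hypothetical, and letting an unqualified reference such as http.latency match the metric from any collector is an assumption; see nanook-expr for the real rules.

```rust
/// Split `api::http.status` into (Some("api"), "http.status").
fn split_ref(r: &str) -> (Option<&str>, &str) {
    match r.split_once("::") {
        Some((collector, metric)) => (Some(collector), metric),
        None => (None, r),
    }
}

/// Hypothetical sample identity: which collector produced which metric.
struct Sample<'a> {
    collector: &'a str, // the collector's `name`, not its `kind`
    metric: &'a str,
}

fn ref_matches(r: &str, s: &Sample<'_>) -> bool {
    let (collector, metric) = split_ref(r);
    // Assumption: an unqualified reference matches any collector.
    metric == s.metric && collector.map_or(true, |c| c == s.collector)
}
```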

`[[adapters]]`

An adapter exports metrics to an external system. It runs continuously and sees every metric the engine produces.

[[adapters]]
name   = "scrape"             # handle for this adapter
kind   = "prometheus"         # defaults to `name` if omitted
filter = ""                   # nanook-expr filter on metric name / labels
plugin = ""                   # explicit plugin crate (rare)
[adapters.opts]
port = 9090
ttl  = 300

Built-in adapter kinds: prometheus, statsd, dogstatsd, pulse, file, stdout, webhook. See Adapters.

`[[alerts]]`

[[alerts]]
expr     = "cpu.usage > 90%"  # nanook-expr predicate (required)
count    = 3                  # ticks the predicate must hold
channel  = "ops"              # channel id, must exist in [channels.<id>]
action   = "log"              # log | webhook | discord | slack | exec (default log)
target   = ""                 # action-specific target (URL, command, ...)
body     = ""                 # template rendered against the alert event; see Alerts
cooldown = "5m"               # min gap between fires for this rule
escalate = { after = "10m", action = "exec", target = "/usr/local/bin/page" }

See Alerts for everything [[alerts]] accepts and how escalation, cooldown, and silencing interact.
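How count and cooldown combine for one rule can be sketched as a small state machine. RuleState is hypothetical, treating count as consecutive ticks (a false tick resets the streak) is an assumption, and escalation and silencing are left out; the Alerts page is authoritative on how they interact.

```rust
use std::time::{Duration, Instant};

/// Hypothetical per-rule state for `count` and `cooldown`.
struct RuleState {
    count: u32,
    cooldown: Duration,
    streak: u32,
    last_fire: Option<Instant>,
}

impl RuleState {
    fn new(count: u32, cooldown: Duration) -> Self {
        RuleState { count, cooldown, streak: 0, last_fire: None }
    }

    /// Feed one tick's predicate result; returns true if the rule fires now.
    fn tick(&mut self, holds: bool, now: Instant) -> bool {
        if !holds {
            self.streak = 0; // assumption: `count` means consecutive ticks
            return false;
        }
        self.streak = self.streak.saturating_add(1);
        if self.streak < self.count {
            return false;
        }
        if let Some(prev) = self.last_fire {
            if now.duration_since(prev) < self.cooldown {
                return false; // still inside the cooldown window
            }
        }
        self.last_fire = Some(now);
        true
    }
}
```

With count = 3 and cooldown = "5m", the rule fires on the third consecutive true tick and then stays quiet for five minutes even if the predicate keeps holding.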

Environment substitution

Anywhere a string appears, ${VAR} is substituted from the process env. Unset vars are an error at config load. Use a default after a colon for optional vars:

url      = "${SLACK_WEBHOOK_URL}"          # required, fails load if unset
fallback = "${SLACK_WEBHOOK_URL:none}"     # literal `none` if unset
empty    = "${OPTIONAL:}"                  # empty string if unset

The default literal is taken verbatim, whitespace included.

To pass a literal $ (or ${VAR}) through env-subst untouched, for example so a downstream layer such as the shell or nanook-templates can do its own substitution, double the dollar sign:

literal = "$$VAR"          # rewrites to `$VAR`
shell   = "echo $${HOME}"  # rewrites to `echo ${HOME}`, sh expands at exec

$$ collapses to $ once; env-subst never re-scans its own output.
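The substitution rules above fit in one left-to-right pass. This is a sketch, not nanook's implementation: subst is a hypothetical name, the lookup closure replaces direct env access so the rules can be tested, and three behaviors are assumptions: a bare $ not followed by { or $ passes through unchanged, the name ends at the first colon, and an unterminated ${ is an error.

```rust
/// One-pass env substitution: `${VAR}`, `${VAR:default}`, and `$$` -> `$`.
/// Output is never re-scanned, so `$${HOME}` yields the literal `${HOME}`.
fn subst(input: &str, lookup: impl Fn(&str) -> Option<String>) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(i) = rest.find('$') {
        out.push_str(&rest[..i]);
        let after = &rest[i + 1..];
        if let Some(tail) = after.strip_prefix('$') {
            out.push('$'); // `$$` collapses to a single literal `$`
            rest = tail;
        } else if let Some(body) = after.strip_prefix('{') {
            let end = body
                .find('}')
                .ok_or_else(|| format!("unterminated ${{ in {input:?}"))?;
            let spec = &body[..end];
            // Assumption: the name ends at the first colon.
            let (name, default) = match spec.split_once(':') {
                Some((n, d)) => (n, Some(d)),
                None => (spec, None),
            };
            match (lookup(name), default) {
                (Some(v), _) => out.push_str(&v),
                (None, Some(d)) => out.push_str(d), // default taken verbatim
                (None, None) => return Err(format!("env var {name} is not set")),
            }
            rest = &body[end + 1..];
        } else {
            out.push('$'); // assumption: a bare `$` passes through unchanged
            rest = after;
        }
    }
    out.push_str(rest);
    Ok(out)
}
```

Against the real process environment the lookup would be `|k| std::env::var(k).ok()`.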

File includes

Any opts string or alert body can be read from disk by prefixing it with @file:<path>. The file's contents replace the value at config-load time. This works for anything string-shaped: shell scripts, body templates, multi-line URLs, cert blobs.

[channels.shell]
type = "exec"
[channels.shell.opts]
cmd = "@file:./scripts/notify.sh"

[[alerts]]
expr    = "cpu.usage > 90%"
channel = "ops"
body    = "@file:./templates/cpu_alert.tpl"

Paths resolve relative to the agent's working directory, not the location of nanook.toml. Use absolute paths if the agent may be started from a different directory.

The file is read once at load time. Edits to the included file don't propagate until you reload the config.
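Resolving one value can be sketched as follows. resolve_include is hypothetical, and keeping the file contents byte-for-byte (including a trailing newline) is an assumption. Because std::fs resolves relative paths against the process's current directory, the sketch gets the working-directory behavior described above for free.

```rust
use std::fs;

/// Resolve an `@file:<path>` value once; other strings pass through unchanged.
/// Relative paths resolve against the process's current working directory.
fn resolve_include(value: &str) -> std::io::Result<String> {
    match value.strip_prefix("@file:") {
        Some(path) => fs::read_to_string(path), // contents kept verbatim (assumption)
        None => Ok(value.to_string()),
    }
}
```

Since the read happens once, later edits to the file only take effect when the config is loaded again.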

Schema

nanook schema emits a JSON Schema (draft 2020-12) describing the agent config. Pipe it to a file and point your editor at it for autocomplete and inline validation.

nanook schema > nanook.schema.json
