feat(monitoring): declarative monitoring stack with node_exporter and host dashboard #4

Merged
bryan merged 12 commits from feat/declarative-monitoring into main 2026-03-12 04:06:26 +00:00

Summary

Make the existing Prometheus + Grafana monitoring stack on studio actually useful. Previously both services were running via launchd but doing nothing — Prometheus only self-scraped with Homebrew's default config, Grafana had no datasource or dashboards, and TSDB storage was in volatile /tmp.

Changes

  • Declarative prometheus.yml — Generated from Nix attrsets via pkgs.writeText with 3 scrape targets (prometheus, grafana, node_exporter)
  • node_exporter — New launchd agent for macOS host metrics (CPU, memory, disk, network) on port 9100 with firewall rules
  • Grafana provisioning — Auto-provisions Prometheus datasource + "Studio - Host Metrics" dashboard on startup via GF_PATHS_PROVISIONING
  • macOS dashboard — Custom 15-panel Grafana dashboard tailored for macOS node_exporter metrics (avoids broken Linux-only panels from stock dashboards)
  • Persistent TSDB — Storage moved from /tmp/prometheus to ~/.prometheus/data with explicit 30-day retention
  • studio.nix — Added node_exporter to Homebrew brews

Files Changed

| File | Change |
|------|--------|
| modules/services/monitoring.nix | Enhanced: 103→206 lines. Declarative config generation, node_exporter agent, Grafana provisioning |
| modules/services/dashboards/node-exporter-macos.json | New: macOS host metrics dashboard (CPU, load, memory, disk, network, uptime) |
| modules/hosts/studio.nix | Added "node_exporter" to homebrew brews |

Verification

  • nix flake check passes
  • All config generated from Nix (no manual files on studio needed)

Deploy

darwin-rebuild switch --flake '.#studio'

Then verify:

curl -sf localhost:9100/metrics | head -3      # node_exporter
curl -sf localhost:9090/api/v1/targets         # 3 targets UP
open http://localhost:3000                      # Grafana dashboard
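
The "3 targets UP" check can be scripted rather than read by eye. The following is a minimal sketch, assuming the `/api/v1/targets` response is compact JSON in which each active target carries a `"health":"up"` field. The helper names `count_up_targets` and `assert_targets_up` are hypothetical and not part of this PR.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the verify step; not part of the PR itself.

# Read the JSON body of /api/v1/targets on stdin and print how many
# targets report "health":"up". grep is used instead of jq so the check
# has no extra dependencies; it relies on the compact "key":"value" form.
count_up_targets() {
  { grep -o '"health":"up"' || true; } | wc -l | tr -d '[:space:]'
}

# Succeed only if at least $1 targets are up (default 3, matching the
# three scrape jobs listed under Changes).
assert_targets_up() {
  local expected="${1:-3}" actual
  actual="$(count_up_targets)"
  echo "targets up: ${actual}/${expected}"
  [ "$actual" -ge "$expected" ]
}

# Usage, with Prometheus on localhost:9090 as in the verify step:
#   curl -sf localhost:9090/api/v1/targets | assert_targets_up 3
```

The comparison is "at least" rather than "exactly" so the check keeps working once further scrape jobs are added.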
The activation script that downloaded blackbox_exporter to /usr/local/bin/ failed
silently because cp was denied permission. Replace it with pkgs.prometheus-blackbox-exporter
from nixpkgs: the binary lives in the Nix store, so there are no permission issues and the setup is fully declarative.

- Add UNRAID scrape jobs (node-exporter, cAdvisor) to studio host config
- Switch Prometheus from Homebrew to nixpkgs derivation
- Add Promtail syslog listener (UDP 1514, RFC 3164) for UNRAID log ingestion
- Add extraScrapeConfigs option for host-level Prometheus config extension
- Create UNRAID Grafana dashboard (system, network, storage, containers, logs)
- Move dashboards to modules/services/grafana/dashboards/
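
One way to smoke-test the syslog listener from the list above is to hand-build an RFC 3164 line and send it over UDP. This is a sketch under assumptions: the function names, the host name `studio.example`, the tag, and the facility/severity values are all hypothetical, and it presumes `nc` is available on the sending machine.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for testing a syslog listener; not part of the PR.

# PRI = facility * 8 + severity (RFC 3164, section 4.1.1).
rfc3164_pri() {
  local facility="$1" severity="$2"
  echo $(( facility * 8 + severity ))
}

# Build one RFC 3164 line: <PRI>Mmm dd hh:mm:ss HOSTNAME TAG: MESSAGE
# %e pads single-digit days with a space, as the RFC timestamp requires.
rfc3164_line() {
  local facility="$1" severity="$2" host="$3" tag="$4" msg="$5"
  printf '<%s>%s %s %s: %s\n' \
    "$(rfc3164_pri "$facility" "$severity")" \
    "$(LC_ALL=C date '+%b %e %H:%M:%S')" \
    "$host" "$tag" "$msg"
}

# Usage (studio.example is a placeholder for the host running Promtail):
#   rfc3164_line 1 6 unraid smoke-test "hello" | nc -u -w1 studio.example 1514
```

Facility 1 (user) with severity 6 (informational) gives PRI 14, so the test line starts with `<14>`. If the listener is working, the message should then be queryable in Grafana's log view.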

Replace the label_replace short_id workaround with the native name label
in the container CPU and memory panels. Requires cAdvisor v0.56.2+
(ghcr.io/google/cadvisor), which fixes Docker factory registration.

- Add Prometheus alerting rules: ServiceDown, HighCpuUsage, HighMemoryUsage, HighDiskUsage, PrometheusTargetDown
- Add Studio logs Grafana dashboard with per-service filtered panels and error log view
- Add Syncthing as blackbox health probe target
- Fix redirect warnings: use final URLs for Plex (/web/index.html) and UNRAID (/login)
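
The probe targets and the redirect fix can be checked by querying blackbox_exporter directly and reading its `probe_success` and `probe_http_redirects` metrics. The sketch below assumes the exporter listens on its default port 9115 and has an `http_2xx` module configured. Neither is stated in this PR, and the helper names and `plex.example` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for inspecting blackbox_exporter output; not part of the PR.

# Succeed if the /probe output on stdin reports a successful probe.
probe_ok() {
  grep -q '^probe_success 1$'
}

# Print the number of HTTP redirects the probe followed.
# A value of 0 means the configured URL is already the final one.
probe_redirects() {
  awk '$1 == "probe_http_redirects" { print $2 }'
}

# Usage (port, module name, and target host are assumptions):
#   url='http://plex.example:32400/web/index.html'
#   curl -sf "localhost:9115/probe?module=http_2xx&target=${url}" | probe_redirects
```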
bryan force-pushed feat/declarative-monitoring from 57428a6dab to 3ad3beebfb 2026-03-12 04:01:39 +00:00
bryan merged commit edea3e3de2 into main 2026-03-12 04:06:26 +00:00
Reference
bryan/nix-configs!4