bryan/nix-configs


feat(monitoring): declarative monitoring stack with node_exporter and host dashboard #4

Merged

bryan merged 12 commits from feat/declarative-monitoring into main

2026-03-12 04:06:26 +00:00

bryan commented

2026-03-09 01:04:10 +00:00

Owner

Summary

Make the existing Prometheus + Grafana monitoring stack on studio actually useful. Previously both services were running via launchd but doing nothing — Prometheus only self-scraped with Homebrew's default config, Grafana had no datasource or dashboards, and TSDB storage was in volatile /tmp.

Changes

Declarative prometheus.yml — Generated from Nix attrsets via pkgs.writeText with 3 scrape targets (prometheus, grafana, node_exporter)
node_exporter — New launchd agent for macOS host metrics (CPU, memory, disk, network) on port 9100 with firewall rules
Grafana provisioning — Auto-provisions Prometheus datasource + "Studio - Host Metrics" dashboard on startup via GF_PATHS_PROVISIONING
macOS dashboard — Custom 15-panel Grafana dashboard tailored for macOS node_exporter metrics (avoids broken Linux-only panels from stock dashboards)
Persistent TSDB — Storage moved from /tmp/prometheus to ~/.prometheus/data with explicit 30-day retention
studio.nix — Added node_exporter to Homebrew brews
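
A minimal sketch of how the declarative `prometheus.yml`, the node_exporter agent, and the persistent TSDB flags could fit together in a nix-darwin module. It is not the actual contents of `monitoring.nix`: the home directory, the Homebrew prefix, the agent names, and the scrape interval are assumptions. The ports come from the verification commands in this description.

```nix
# Hypothetical excerpt of a nix-darwin module; paths and agent names are illustrative.
{ pkgs, ... }:

let
  home = "/Users/bryan"; # assumed home directory

  # Scrape configuration expressed as a plain Nix attrset.
  prometheusConfig = {
    global.scrape_interval = "15s"; # assumed interval
    scrape_configs = [
      { job_name = "prometheus";    static_configs = [ { targets = [ "localhost:9090" ]; } ]; }
      { job_name = "grafana";       static_configs = [ { targets = [ "localhost:3000" ]; } ]; }
      { job_name = "node_exporter"; static_configs = [ { targets = [ "localhost:9100" ]; } ]; }
    ];
  };

  # JSON is a subset of YAML, so Prometheus can read the serialized attrset directly.
  prometheusYml = pkgs.writeText "prometheus.yml" (builtins.toJSON prometheusConfig);
in
{
  # Host metrics exporter on port 9100.
  launchd.user.agents.node-exporter.serviceConfig = {
    ProgramArguments = [
      "/opt/homebrew/bin/node_exporter" # assumed Homebrew prefix (Apple Silicon)
      "--web.listen-address=:9100"
    ];
    RunAtLoad = true;
    KeepAlive = true;
  };

  # Prometheus with the generated config, persistent storage, and 30-day retention.
  launchd.user.agents.prometheus.serviceConfig = {
    ProgramArguments = [
      "/opt/homebrew/bin/prometheus" # assumed Homebrew prefix
      "--config.file=${prometheusYml}"
      "--storage.tsdb.path=${home}/.prometheus/data"
      "--storage.tsdb.retention.time=30d"
    ];
    RunAtLoad = true;
    KeepAlive = true;
  };
}
```

Because the config file lives in the Nix store, any change to the attrset produces a new store path, which changes the agent's arguments and causes launchd to pick up the new file on the next `darwin-rebuild switch`.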

Files Changed

| File | Change |
|------|--------|
| `modules/services/monitoring.nix` | Enhanced: 103→206 lines. Declarative config generation, node_exporter agent, Grafana provisioning |
| `modules/services/dashboards/node-exporter-macos.json` | New: macOS host metrics dashboard (CPU, load, memory, disk, network, uptime) |
| `modules/hosts/studio.nix` | Added `"node_exporter"` to homebrew brews |
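
The Grafana provisioning listed for `monitoring.nix` could look roughly like the sketch below. It assumes the dashboards directory sits next to the module, that the datasource `uid` is `prometheus`, and that a `grafana` launchd agent is defined elsewhere; none of these are confirmed by this PR's text. The file layout (`datasources/` and `dashboards/` under the provisioning root) follows Grafana's provisioning convention.

```nix
# Hypothetical fragment; merges an environment variable into an existing Grafana agent.
{ pkgs, ... }:

let
  # Datasource definition; a fixed uid lets dashboard JSON reference it reliably.
  datasources = pkgs.writeText "prometheus-datasource.yaml" (builtins.toJSON {
    apiVersion = 1;
    datasources = [ {
      name = "Prometheus";
      type = "prometheus";
      access = "proxy";
      url = "http://localhost:9090";
      uid = "prometheus"; # assumed uid
      isDefault = true;
    } ];
  });

  # Dashboard provider that loads every JSON file from the dashboards directory.
  providers = pkgs.writeText "dashboard-providers.yaml" (builtins.toJSON {
    apiVersion = 1;
    providers = [ {
      name = "default";
      type = "file";
      options.path = "${./dashboards}"; # copied into the Nix store at evaluation time
    } ];
  });

  # Assemble the directory layout Grafana expects under its provisioning path.
  provisioning = pkgs.runCommand "grafana-provisioning" { } ''
    mkdir -p $out/datasources $out/dashboards
    cp ${datasources} $out/datasources/prometheus.yaml
    cp ${providers} $out/dashboards/default.yaml
  '';
in
{
  launchd.user.agents.grafana.serviceConfig.EnvironmentVariables = {
    GF_PATHS_PROVISIONING = "${provisioning}";
  };
}
```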

Verification

nix flake check passes
All config generated from Nix (no manual files on studio needed)

Deploy

darwin-rebuild switch --flake '.#studio'

Then verify:

curl -sf localhost:9100/metrics | head -3      # node_exporter
curl -sf localhost:9090/api/v1/targets         # 3 targets UP
open http://localhost:3000                      # Grafana dashboard


bryan added 1 commit

2026-03-09 01:04:10 +00:00

feat(monitoring): declarative monitoring stack with node_exporter and host dashboard 34c724f83c

bryan added 1 commit

2026-03-11 14:29:20 +00:00

fix(monitoring): add datasource uid and remove unused configFile option 3c3c38f303

bryan added 1 commit

2026-03-11 14:34:59 +00:00

docs(monitoring): update AGENTS.md for node_exporter, note Linux-only disk I/O panel 8567664d9e

bryan added 1 commit

2026-03-11 14:48:56 +00:00

fix(monitoring): persist grafana data outside homebrew cellar 2c757a37fc

bryan added 1 commit

2026-03-11 17:01:19 +00:00

install on studio 3b5afb8bcf

bryan added 1 commit

2026-03-11 17:29:14 +00:00

feat(monitoring): add loki, promtail, blackbox_exporter, SMTP alerting, and service health dashboard dec8dcfc7b

bryan added 2 commits

2026-03-12 03:02:49 +00:00

fix(monitoring): use nixpkgs derivation for blackbox_exporter instead of curl download 4ae6204b42

The activation script that downloaded blackbox_exporter to /usr/local/bin/ failed
silently due to permission denied on cp. Replace with pkgs.prometheus-blackbox-exporter
from nixpkgs — binary lives in Nix store, no permission issues, fully declarative.
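
As an illustration of the approach this commit describes, a launchd agent can reference the binary straight from the Nix store. The agent name and the probe module settings below are assumptions; `pkgs.prometheus-blackbox-exporter` is the package named in the commit message.

```nix
# Hypothetical sketch: run blackbox_exporter from nixpkgs instead of a downloaded binary.
{ pkgs, ... }:

let
  # Minimal probe module definition (assumed; the real config may define more modules).
  blackboxConfig = pkgs.writeText "blackbox.yml" (builtins.toJSON {
    modules.http_2xx = {
      prober = "http";
      timeout = "5s";
    };
  });
in
{
  launchd.user.agents.blackbox-exporter.serviceConfig = {
    ProgramArguments = [
      # The binary path resolves inside the Nix store, so nothing is copied to /usr/local/bin.
      "${pkgs.prometheus-blackbox-exporter}/bin/blackbox_exporter"
      "--config.file=${blackboxConfig}"
    ];
    RunAtLoad = true;
    KeepAlive = true;
  };
}
```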

feat(monitoring): add UNRAID NAS monitoring with Prometheus scraping, syslog ingestion, and Grafana dashboard 8a81997897

- Add UNRAID scrape jobs (node-exporter, cAdvisor) to studio host config
- Switch Prometheus from Homebrew to nixpkgs derivation
- Add Promtail syslog listener (UDP 1514, RFC 3164) for UNRAID log ingestion
- Add extraScrapeConfigs option for host-level Prometheus config extension
- Create UNRAID Grafana dashboard (system, network, storage, containers, logs)
- Move dashboards to modules/services/grafana/dashboards/
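
One way an `extraScrapeConfigs` option could be declared and consumed is sketched below. The `custom.monitoring` namespace, the base job list, and the UNRAID hostname in the comment are hypothetical; only the option name and the use of the nixpkgs Prometheus derivation come from the commit message.

```nix
# Hypothetical module sketch; the option namespace is illustrative.
{ config, lib, pkgs, ... }:

let
  cfg = config.custom.monitoring;

  # Jobs every host gets by default (assumed).
  baseScrapeConfigs = [
    { job_name = "prometheus"; static_configs = [ { targets = [ "localhost:9090" ]; } ]; }
  ];

  # Host-level entries are appended to the shared defaults.
  prometheusYml = pkgs.writeText "prometheus.yml" (builtins.toJSON {
    scrape_configs = baseScrapeConfigs ++ cfg.extraScrapeConfigs;
  });
in
{
  options.custom.monitoring.extraScrapeConfigs = lib.mkOption {
    type = lib.types.listOf lib.types.attrs;
    default = [ ];
    description = "Additional Prometheus scrape_configs entries supplied by a host.";
  };

  config.launchd.user.agents.prometheus.serviceConfig.ProgramArguments = [
    "${pkgs.prometheus}/bin/prometheus"
    "--config.file=${prometheusYml}"
  ];

  # A host file could then extend the list, for example (placeholder hostname):
  #   custom.monitoring.extraScrapeConfigs = [
  #     { job_name = "unraid-node-exporter";
  #       static_configs = [ { targets = [ "unraid.example:9100" ]; } ]; }
  #   ];
}
```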

bryan added 1 commit

2026-03-12 03:18:40 +00:00

fix(monitoring): use container name labels now that cAdvisor Docker factory works 84fb9e3539

Replace label_replace short_id workaround with native name label
for container CPU and memory panels. Requires cAdvisor v0.56.2+
(ghcr.io/google/cadvisor) which fixes Docker factory registration.
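
To make the change concrete, the queries below contrast a `label_replace` workaround with queries that group by the native `name` label. They are written as Nix strings only for consistency with the rest of the repository; the dashboard itself is JSON. The exact regex and the `short_id` derivation are assumptions about what the earlier workaround looked like, not a copy of it.

```nix
{
  # Assumed shape of the workaround: derive a 12-character ID from the cgroup path.
  cpuByShortId = ''
    sum by (short_id) (
      label_replace(
        rate(container_cpu_usage_seconds_total{id=~"/docker/.+"}[5m]),
        "short_id", "$1", "id", "/docker/([0-9a-f]{12}).*"
      )
    )
  '';

  # With cAdvisor populating the name label, the panels can group by it directly.
  cpuByName = ''
    sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m]))
  '';

  memoryByName = ''
    sum by (name) (container_memory_working_set_bytes{name!=""})
  '';
}
```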

bryan added 1 commit

2026-03-12 03:31:38 +00:00

feat(monitoring): add alerting rules, studio logs dashboard, and expanded health probes 4adc34b883

- Add Prometheus alerting rules: ServiceDown, HighCpuUsage, HighMemoryUsage, HighDiskUsage, PrometheusTargetDown
- Add Studio logs Grafana dashboard with per-service filtered panels and error log view
- Add Syncthing as blackbox health probe target
- Fix redirect warnings: use final URLs for Plex (/web/index.html) and UNRAID (/login)
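
A rough sketch of how some of these alerting rules might be generated from Nix. The alert names are taken from the list above; the expressions, thresholds, durations, and group name are assumptions and may differ from what the commit actually ships.

```nix
# Hypothetical rule file generator; expressions and thresholds are illustrative.
{ pkgs, ... }:

let
  rules = {
    groups = [ {
      name = "studio"; # assumed group name
      rules = [
        {
          alert = "PrometheusTargetDown";
          expr = "up == 0";
          for = "5m";
          labels.severity = "critical";
          annotations.summary = "Target {{ $labels.instance }} is down";
        }
        {
          # Assumes ServiceDown is driven by blackbox probes.
          alert = "ServiceDown";
          expr = "probe_success == 0";
          for = "2m";
          labels.severity = "critical";
          annotations.summary = "Probe for {{ $labels.instance }} is failing";
        }
        {
          alert = "HighDiskUsage";
          expr = ''(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) > 0.9'';
          for = "10m";
          labels.severity = "warning";
          annotations.summary = "Root filesystem on {{ $labels.instance }} is above 90%";
        }
      ];
    } ];
  };
in
# Serialized as JSON, which Prometheus accepts as YAML; the resulting store path
# would be listed under rule_files in the generated prometheus.yml.
pkgs.writeText "alert-rules.yml" (builtins.toJSON rules)
```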

bryan added 1 commit

2026-03-12 03:36:15 +00:00

fix(monitoring): adjust UNRAID dashboard layout — network/storage side-by-side, containers full-width 174626c74d

bryan added 1 commit

2026-03-12 03:58:28 +00:00

feat(monitoring): declarative Grafana SMTP config via custom.ini with file-based secret 57428a6dab
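
A possible shape for the generated `custom.ini`, assuming the secret is read through Grafana's `$__file{...}` variable expansion. The SMTP host, addresses, and secret path are placeholders, and this may not match how the commit wires the file into the Grafana agent.

```nix
# Hypothetical sketch; host, addresses, and the secret path are placeholders.
{ lib, pkgs, ... }:

pkgs.writeText "custom.ini" (lib.generators.toINI { } {
  smtp = {
    enabled = true;
    host = "smtp.example.com:587";
    user = "alerts@example.com";
    from_address = "alerts@example.com";
    # Grafana reads the file at startup, so the password itself never enters the Nix store.
    password = "$__file{/Users/bryan/.config/grafana/smtp-password}";
  };
})
```

The resulting store path can be handed to Grafana with its `--config` flag. Keeping only the path in the generated file means the repository and the world-readable Nix store contain no credentials.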