project-nomad/docs/homelab/monitoring.md
2026-03-13 17:41:23 +00:00

Project N.O.M.A.D. — Monitoring Architecture

Overview

The Nomad Homelab Edition includes built-in monitoring and integrates with common homelab monitoring tools such as Prometheus, Grafana, and Uptime Kuma.

Monitoring Layers

┌───────────────────────────────────────────────────────────┐
│                      Nomad Dashboard                      │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐      │
│  │ NAS      │ │ Server   │ │Container │ │ Network  │      │
│  │ Health   │ │ Metrics  │ │ Status   │ │ Devices  │      │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘      │
│       │            │            │            │            │
│  ┌────▼────────────▼────────────▼────────────▼─────┐      │
│  │             Nomad App (Aggregator)              │      │
│  └──────────────────────┬──────────────────────────┘      │
└─────────────────────────┼─────────────────────────────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
    ┌─────▼─────┐    ┌────▼────┐   ┌──────▼──────┐
    │  Agent 1  │    │ Agent 2 │   │   Agent N   │
    │ (Server)  │    │  (NAS)  │   │  (Remote)   │
    │  :9100    │    │  :9100  │   │    :9100    │
    └───────────┘    └─────────┘   └─────────────┘
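The aggregator's role in this picture is to hold the latest report from each agent and notice when one stops reporting. The sketch below illustrates that idea in Python; the `Aggregator` class, its report shape, and the 90-second staleness window (three 30-second intervals) are hypothetical and are not the Nomad app's actual code.

```python
import time
from dataclasses import dataclass, field


@dataclass
class NodeState:
    last_seen: float
    metrics: dict = field(default_factory=dict)


class Aggregator:
    """Hypothetical sketch: keeps the most recent report per node and flags silent nodes."""

    def __init__(self, stale_after_s: float = 90.0, clock=time.monotonic):
        self.stale_after_s = stale_after_s
        self.clock = clock  # injectable so the staleness rule can be tested
        self.nodes = {}

    def ingest(self, node: str, metrics: dict) -> None:
        # A newer report from the same node replaces the previous one.
        self.nodes[node] = NodeState(last_seen=self.clock(), metrics=dict(metrics))

    def status(self) -> dict:
        now = self.clock()
        return {
            node: {
                "online": now - state.last_seen <= self.stale_after_s,
                "age_s": round(now - state.last_seen, 1),
                "metrics": state.metrics,
            }
            for node, state in sorted(self.nodes.items())
        }
```

Injecting the clock keeps the staleness rule testable, and a window of several report intervals tolerates an occasional missed report without flapping a node offline.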

Built-in Monitoring

System Resource Monitoring

The Nomad application provides built-in system information through the system controller:

  • CPU: Model, core count, usage
  • Memory: Total, used, free, usage percentage
  • Disk: Mount points, usage, filesystem types
  • Docker: Container status, image versions, resource usage
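For illustration only, similar figures can be gathered on a Linux host with the Python standard library. This is not the system controller's implementation: the helper names are hypothetical, memory is read from `/proc/meminfo` (values in KiB), "used" is computed as `MemTotal - MemAvailable`, and CPU usage is left out because it needs two samples of `/proc/stat`.

```python
import os
import shutil


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value}, converting kB fields to bytes."""
    result = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if not parts:
            continue
        value = int(parts[0])
        # Most fields carry a "kB" unit (really KiB); a few, e.g. HugePages_Total, are plain counts.
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        result[key.strip()] = value
    return result


def memory_summary(meminfo: dict) -> dict:
    total = meminfo["MemTotal"]
    # MemAvailable estimates memory usable without swapping; fall back to MemFree on old kernels.
    free = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    used = total - free
    return {
        "total_bytes": total,
        "used_bytes": used,
        "free_bytes": free,
        "usage_percent": round(100.0 * used / total, 1) if total else 0.0,
    }


def disk_summary(path: str = "/") -> dict:
    usage = shutil.disk_usage(path)  # named tuple (total, used, free) in bytes
    return {
        "mount": path,
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "usage_percent": round(100.0 * usage.used / usage.total, 1) if usage.total else 0.0,
    }


def snapshot() -> dict:
    """Hypothetical one-shot snapshot of CPU count, memory, and root filesystem usage."""
    with open("/proc/meminfo") as f:
        mem = memory_summary(parse_meminfo(f.read()))
    return {"cpu_count": os.cpu_count(), "memory": mem, "disk": [disk_summary("/")]}
```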

Docker Container Monitoring

Nomad monitors its own container stack and any Docker containers on the host via the Docker socket:

  • Container health status
  • Image versions and update availability
  • Resource consumption
  • Log access (via the Dozzle integration in the original install)
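To illustrate what reading container state through the socket involves, the sketch below queries the Docker Engine API directly with the Python standard library. `GET /containers/json?all=1` is a documented Engine API endpoint whose entries include `Names`, `Image`, and `State`; the helper names are hypothetical, and this is not necessarily how Nomad itself talks to Docker.

```python
import http.client
import json
import socket
from collections import Counter


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("localhost", timeout=timeout)  # host only fills the Host header
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(self.timeout)
        self.sock.connect(self.socket_path)


def list_containers(socket_path: str = "/var/run/docker.sock") -> list:
    """Return the Engine API's container list, including stopped containers."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json?all=1")
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"Docker API returned HTTP {resp.status}")
        return json.loads(resp.read())
    finally:
        conn.close()


def summarize(containers: list) -> dict:
    """Count containers by state and map each name to its image and state."""
    states = Counter(c.get("State", "unknown") for c in containers)
    by_name = {
        # Engine API names carry a leading slash, e.g. "/nomad-app".
        c["Names"][0].lstrip("/"): {"image": c.get("Image"), "state": c.get("State")}
        for c in containers
        if c.get("Names")
    }
    return {"total": len(containers), "by_state": dict(states), "containers": by_name}
```

On a host with Docker running, `summarize(list_containers())` gives a per-state count plus a name-to-image map.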

Health Endpoints

Endpoint             Service       Purpose
GET /api/health      nomad-app     Application health
GET /nginx-health    nomad-nginx   Reverse proxy health
GET /health          nomad-agent   Agent health
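A scripted check of these endpoints might look like the following sketch. The base URLs in `ENDPOINTS` are assumptions (only the paths come from the table); an HTTP 200 counts as healthy, and anything else, including DNS failures and refused connections, counts as unhealthy.

```python
import urllib.error
import urllib.request

# Hypothetical base URLs: replace with addresses reachable from where the check runs.
ENDPOINTS = {
    "nomad-app": "http://nomad-app:8080/api/health",
    "nomad-nginx": "http://localhost/nginx-health",
    "nomad-agent": "http://localhost:9100/health",
}


def probe(url: str, timeout: float = 5.0) -> tuple:
    """Return (healthy, detail) for one endpoint; healthy means HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses raise HTTPError
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:  # DNS failure, refused connection, timeout
        return False, str(exc)


def probe_all(endpoints: dict) -> dict:
    """Probe every endpoint and return {service: (healthy, detail)}."""
    return {name: probe(url) for name, url in endpoints.items()}
```

`probe_all(ENDPOINTS)` returns one `(healthy, detail)` tuple per service, which a cron job can turn into an exit code or a notification.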

Agent-Based Monitoring

Architecture

The Nomad monitoring agent runs on each homelab node you want to monitor and makes its metrics available in two ways:

  1. Push model: Agent sends JSON metrics to POST /api/agent/report on the Nomad server
  2. Pull model: Prometheus scrapes the agent's /metrics endpoint
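The push side can be pictured as a small loop that POSTs JSON to `/api/agent/report`. This page does not document the payload schema, so the field names (`node`, `timestamp`, `metrics`), the helper names, and the 30-second default interval are hypothetical; only the endpoint path and the `NOMAD_SERVER_URL` / `NODE_NAME` variables come from this document.

```python
import json
import os
import socket
import time
import urllib.request


def build_report(node: str, metrics: dict) -> dict:
    # Hypothetical payload shape; the real agent's schema may differ.
    return {"node": node, "timestamp": int(time.time()), "metrics": metrics}


def push_report(server_url: str, report: dict, timeout: float = 10.0) -> int:
    """POST the report as JSON to <server_url>/api/agent/report and return the HTTP status."""
    req = urllib.request.Request(
        server_url.rstrip("/") + "/api/agent/report",
        data=json.dumps(report).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def run_forever(collect, interval_s: float = 30.0) -> None:
    """Push one report per interval; collect() must return a dict of metric values."""
    server = os.environ.get("NOMAD_SERVER_URL", "http://nomad-server:8080")
    node = os.environ.get("NODE_NAME", socket.gethostname())
    while True:
        try:
            push_report(server, build_report(node, collect()))
        except OSError as exc:  # URLError and HTTPError subclass OSError; keep looping
            print(f"push failed: {exc}")
        time.sleep(interval_s)
```

Catching errors inside the loop means a server restart only costs a few missed reports instead of stopping the agent.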

Agent Metrics

nomad_agent_cpu_usage_percent        - CPU utilization
nomad_agent_cpu_count                - CPU core count
nomad_agent_memory_total_bytes       - Total RAM
nomad_agent_memory_used_bytes        - Used RAM
nomad_agent_memory_usage_percent     - RAM utilization
nomad_agent_uptime_seconds           - System uptime
nomad_agent_docker_containers        - Docker container count
nomad_agent_docker_container_running - Per-container status

Deployment

Deploy an agent on each node you want to monitor:

docker run -d --name nomad-agent \
  --restart unless-stopped \
  -p 9100:9100 \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -e NOMAD_SERVER_URL=http://nomad-server:8080 \
  -e NODE_NAME=$(hostname) \
  nomad-agent

Prometheus Integration

Full Monitoring Stack

For a complete monitoring setup, add Prometheus and Grafana in a separate Compose file. Both services join the nomad-internal network, which must be defined in your main docker-compose file, so start the two files together:

# docker-compose.monitoring.yml
# Usage: docker compose -f <main compose file> -f docker-compose.monitoring.yml up -d
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: nomad-prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    networks:
      - nomad-internal

  grafana:
    image: grafana/grafana:latest
    container_name: nomad-grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin   # change before exposing Grafana beyond your LAN
    networks:
      - nomad-internal

volumes:
  prometheus-data:
  grafana-data:

Prometheus Configuration

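# monitoring/prometheus.yml
global:
  scrape_interval: 30s
  evaluation_interval: 30s

scrape_configs:
  - job_name: 'nomad-agents'
    static_configs:
      - targets:
          # nomad-agent resolves only if that container joins the nomad-internal network;
          # server2 and nas must resolve via DNS from inside Prometheus (otherwise use IP:port)
          - 'nomad-agent:9100'     # Local agent
          - 'server2:9100'         # Remote server
          - 'nas:9100'             # NAS agent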
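Once Prometheus is running, its built-in `up` series shows whether each target in the `nomad-agents` job was scraped successfully (1) or not (0). The sketch below reads it through the documented instant-query endpoint `/api/v1/query`; the base URL `http://localhost:9090` assumes the port mapping from the compose file above, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def parse_up(response: dict) -> dict:
    """Map each target (instance label) to True if its latest `up` sample is 1."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    return {
        # Instant-vector samples are [timestamp, "value as string"].
        series["metric"].get("instance", "?"): series["value"][1] == "1"
        for series in response["data"]["result"]
    }


def scrape_status(prom_url: str = "http://localhost:9090", job: str = "nomad-agents") -> dict:
    query = f'up{{job="{job}"}}'
    url = f"{prom_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_up(json.load(resp))
```

A target that maps to `False` is listed in the config but is not answering scrapes, which usually points at name resolution or a stopped agent.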

Node Exporter (Optional)

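For deeper host-level metrics, add the Prometheus Node Exporter alongside the Nomad agent. It is published on host port 9101 so it does not clash with the agent's 9100; for Prometheus to collect it, also add 'nomad-node-exporter:9100' as a scrape target in prometheus.yml.

# Add to docker-compose.monitoring.yml
services:
  node-exporter:
    image: prom/node-exporter:latest
    container_name: nomad-node-exporter
    restart: unless-stopped
    ports:
      - "9101:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      # "$$" is Compose's escape for a literal "$" in the regex
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    networks:
      - nomad-internal   # lets Prometheus reach it as nomad-node-exporter:9100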

Observability

Structured Logging

The Nomad application uses structured JSON logging:

{
  "level": "info",
  "timestamp": "2026-03-13T12:00:00.000Z",
  "msg": "HTTP request completed",
  "method": "GET",
  "url": "/api/health",
  "status": 200,
  "duration": "12ms"
}

Set the log level with the LOG_LEVEL environment variable: debug, info, warn, or error.
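Because each line is a JSON object, logs can be filtered with a few lines of code. The sketch below takes any iterable of lines (for example `sys.stdin` fed by `docker compose logs --no-log-prefix nomad-app`, which drops the service-name prefix) and keeps warnings, errors, and slow requests. Field names follow the example above; the 500 ms threshold, the level ordering, and the `<number>ms` duration format are assumptions based on that single example.

```python
import json
import re

LEVELS = {"debug": 10, "info": 20, "warn": 30, "error": 40}
_DURATION = re.compile(r"^(\d+(?:\.\d+)?)ms$")  # assumes durations like "12ms"


def duration_ms(entry: dict):
    """Return the request duration in milliseconds, or None if absent or unparseable."""
    match = _DURATION.match(str(entry.get("duration", "")))
    return float(match.group(1)) if match else None


def interesting(entry: dict, min_level: str = "warn", slow_ms: float = 500.0) -> bool:
    """Keep entries at or above min_level, or requests slower than slow_ms."""
    if LEVELS.get(entry.get("level"), 0) >= LEVELS[min_level]:
        return True
    took = duration_ms(entry)
    return took is not None and took > slow_ms


def filter_lines(lines):
    """Yield parsed log entries worth a look, skipping lines that are not JSON objects."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. blank lines or plain-text startup banners
        if isinstance(entry, dict) and interesting(entry):
            yield entry
```

For example, `for e in filter_lines(sys.stdin): print(json.dumps(e))` turns it into a pipe filter.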

Log Access

# Application logs
docker compose logs -f nomad-app

# Worker logs
docker compose logs -f nomad-worker

# All service logs
docker compose logs -f

# Nginx access logs (on the host; NOMAD_DATA_DIR must be set in your shell)
tail -f "${NOMAD_DATA_DIR}/logs/nginx/access.log"

Alerting Recommendations

For homelab alerting, integrate with:

Tool               Use Case
Uptime Kuma        Service uptime monitoring
Grafana Alerting   Metric-based alerts
Ntfy               Push notifications to phone or desktop (ntfy.sh or self-hosted)
Gotify             Self-hosted push notification server
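As an example of the notification side, ntfy publishes a plain HTTP POST whose body becomes the message, with optional `Title`, `Priority`, and `Tags` headers. The topic URL below (`https://ntfy.sh/nomad-homelab-alerts`) and the `notify` helper are hypothetical; on a self-hosted instance, point the URL at your own server. Topics on the public ntfy.sh server can be read by anyone who knows the name, so pick an unguessable one.

```python
import urllib.request

NTFY_TOPIC_URL = "https://ntfy.sh/nomad-homelab-alerts"  # hypothetical topic


def notify(message: str, title: str = "Nomad alert", priority: str = "default",
           tags: str = "warning", topic_url: str = NTFY_TOPIC_URL) -> int:
    """Publish one ntfy message and return the HTTP status."""
    req = urllib.request.Request(
        topic_url,
        data=message.encode("utf-8"),
        # Header values must be plain ASCII with urllib; keep titles simple.
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Title": title,
            "Priority": priority,
            "Tags": tags,
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

For example, `notify("nomad-app /api/health returned HTTP 503", priority="high")` sends a high-priority alert to the topic.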

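Example Uptime Kuma monitor (Uptime Kuma must share a Docker network with nomad-app to resolve that hostname):

  • Monitor type: HTTP(s)
  • URL: http://nomad-app:8080/api/health
  • Heartbeat interval: 60 seconds
  • Accepted status codes: 200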