mirror of
https://github.com/Crosstalk-Solutions/project-nomad.git
synced 2026-03-31 22:19:25 +02:00
# Project N.O.M.A.D. - Monitoring Architecture

## Overview

The Nomad Homelab Edition includes built-in monitoring capabilities and integrates with standard homelab monitoring stacks.

## Monitoring Layers

```
┌─────────────────────────────────────────────────────────────┐
│                       Nomad Dashboard                       │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐        │
│  │ NAS      │ │ Server   │ │Container │ │ Network  │        │
│  │ Health   │ │ Metrics  │ │ Status   │ │ Devices  │        │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘        │
│       │            │            │            │              │
│  ┌────▼────────────▼────────────▼────────────▼─────┐        │
│  │             Nomad App (Aggregator)              │        │
│  └──────────────────────┬──────────────────────────┘        │
└─────────────────────────┼───────────────────────────────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
    ┌─────▼─────┐    ┌────▼────┐   ┌──────▼──────┐
    │ Agent 1   │    │ Agent 2 │   │   Agent N   │
    │ (Server)  │    │  (NAS)  │   │  (Remote)   │
    │   :9100   │    │  :9100  │   │    :9100    │
    └───────────┘    └─────────┘   └─────────────┘
```

## Built-in Monitoring

### System Resource Monitoring

The Nomad application provides built-in system information through the system controller:

- **CPU**: Model, core count, usage
- **Memory**: Total, used, free, usage percentage
- **Disk**: Mount points, usage, filesystem types
- **Docker**: Container status, image versions, resource usage

### Docker Container Monitoring

Nomad monitors its own container stack and any other Docker containers on the host via the Docker socket:

- Container health status
- Image versions and update availability
- Resource consumption
- Log access (via the Dozzle integration in the original install)

### Health Endpoints

| Endpoint | Service | Purpose |
|----------|---------|---------|
| `GET /api/health` | nomad-app | Application health |
| `GET /nginx-health` | nomad-nginx | Reverse proxy health |
| `GET /health` | nomad-agent | Agent health |

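The endpoints in the table can be polled from a small script. The following is a minimal sketch in Python and is not part of Nomad itself: the names `HEALTH_PATHS`, `health_url`, and `is_healthy` are illustrative, the base URL is whatever address the service is reachable at in your setup, and the sketch assumes that an HTTP 200 response means healthy without inspecting the response body.

```python
import urllib.error
import urllib.request

# Paths taken from the health endpoint table; the dict name is illustrative.
HEALTH_PATHS = {
    "nomad-app": "/api/health",
    "nomad-nginx": "/nginx-health",
    "nomad-agent": "/health",
}


def health_url(base_url, service):
    """Join a service's base URL with its health path."""
    return base_url.rstrip("/") + HEALTH_PATHS[service]


def is_healthy(base_url, service, timeout=5.0,
               opener=urllib.request.urlopen):
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with opener(health_url(base_url, service), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses.
        return False
```

For example, `is_healthy("http://nomad-app:8080", "nomad-app")` checks the application endpoint. The `opener` parameter exists only so the function can be exercised without a network.
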
## Agent-Based Monitoring

### Architecture

The Nomad monitoring agent runs on remote homelab nodes and reports metrics in two ways:

1. **Push model**: The agent sends JSON metrics to `POST /api/agent/report` on the Nomad server
2. **Pull model**: Prometheus scrapes the agent's `/metrics` endpoint

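To make the push model concrete, here is a hedged sketch of how a report request could be assembled in Python. Only the `POST /api/agent/report` path comes from this document; the payload shape (`node` and `metrics` keys) and the function name `build_report_request` are assumptions for illustration, and the real agent's schema may differ.

```python
import json
import urllib.request


def build_report_request(server_url, node_name, metrics):
    """Build (but do not send) a POST request for the push model."""
    # Hypothetical payload shape: the real agent's schema may differ.
    payload = {"node": node_name, "metrics": metrics}
    return urllib.request.Request(
        url=server_url.rstrip("/") + "/api/agent/report",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is then a matter of `urllib.request.urlopen(req, timeout=5)`. Building and sending are kept separate so the request can be inspected or logged before it leaves the node.
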
### Agent Metrics

```
nomad_agent_cpu_usage_percent        - CPU utilization
nomad_agent_cpu_count                - CPU core count
nomad_agent_memory_total_bytes       - Total RAM
nomad_agent_memory_used_bytes        - Used RAM
nomad_agent_memory_usage_percent     - RAM utilization
nomad_agent_uptime_seconds           - System uptime
nomad_agent_docker_containers        - Docker container count
nomad_agent_docker_container_running - Per-container status
```

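Since Prometheus scrapes these metrics, the agent presumably serves them in the Prometheus text exposition format. The sketch below parses such output into a dictionary. The sample text is invented for illustration, the `name` label on the per-container metric is an assumption, and the parser assumes samples carry no trailing timestamp.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumes no trailing timestamp, so the value is the last field.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


# Hypothetical scrape output; the label name "name" is an assumption.
SAMPLE = """\
# HELP nomad_agent_cpu_usage_percent CPU utilization
# TYPE nomad_agent_cpu_usage_percent gauge
nomad_agent_cpu_usage_percent 12.5
nomad_agent_memory_total_bytes 1.6e+10
nomad_agent_docker_container_running{name="nomad-app"} 1
"""

metrics = parse_metrics(SAMPLE)
print(metrics["nomad_agent_cpu_usage_percent"])  # 12.5
```

Labelled series keep their label set as part of the dictionary key, which is enough for quick checks. For anything beyond that, a dedicated parser from a Prometheus client library is the safer choice.
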
### Deployment

Deploy an agent on each node you want to monitor:

```bash
# Run the agent with read-only access to the Docker socket and host /proc and /sys
docker run -d --name nomad-agent \
  --restart unless-stopped \
  -p 9100:9100 \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc:/host/proc:ro \
  -v /sys:/host/sys:ro \
  -e NOMAD_SERVER_URL=http://nomad-server:8080 \
  -e NODE_NAME="$(hostname)" \
  nomad-agent
```

## Prometheus Integration

### Full Monitoring Stack

For a complete monitoring setup, add Prometheus and Grafana to your Docker Compose configuration:

```yaml
# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: nomad-prometheus
    restart: unless-stopped
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus-data:/prometheus
    networks:
      - nomad-internal

  grafana:
    image: grafana/grafana:latest
    container_name: nomad-grafana
    restart: unless-stopped
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      # Default credentials; change before exposing Grafana beyond the homelab
      - GF_SECURITY_ADMIN_PASSWORD=admin
    networks:
      - nomad-internal

volumes:
  prometheus-data:
  grafana-data:
```

### Prometheus Configuration

```yaml
# monitoring/prometheus.yml
global:
  scrape_interval: 30s
  evaluation_interval: 30s

scrape_configs:
  - job_name: 'nomad-agents'
    static_configs:
      - targets:
          - 'nomad-agent:9100'  # Local agent
          - 'server2:9100'      # Remote server
          - 'nas:9100'          # NAS agent
```

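Once Prometheus is scraping the `nomad-agents` job, its HTTP API can report which targets are unreachable through the built-in `up` series. The sketch below builds an instant-query URL for `/api/v1/query` and extracts the failing instances from the JSON response. The function names are illustrative, and the sketch assumes Prometheus is reachable at an address such as `http://nomad-prometheus:9090`.

```python
import json
import urllib.parse


def up_query_url(prometheus_url, job="nomad-agents"):
    """Build an instant-query URL for the `up` series of one scrape job."""
    query = urllib.parse.urlencode({"query": 'up{job="%s"}' % job})
    return prometheus_url.rstrip("/") + "/api/v1/query?" + query


def down_instances(response_body):
    """Return instances whose last scrape failed (up == 0)."""
    result = json.loads(response_body)["data"]["result"]
    # Each vector sample carries its value as [timestamp, "string"].
    return sorted(
        sample["metric"]["instance"]
        for sample in result
        if float(sample["value"][1]) == 0.0
    )
```

Fetching `up_query_url(...)` with any HTTP client and passing the response body to `down_instances` yields a list such as `["nas:9100"]` when the NAS agent is down.
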
### Node Exporter (Optional)

For deeper host-level metrics, add the Prometheus Node Exporter alongside the Nomad agent:

```yaml
  node-exporter:
    image: prom/node-exporter:latest
    container_name: nomad-node-exporter
    restart: unless-stopped
    ports:
      - "9101:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
```

## Observability

### Structured Logging

The Nomad application uses structured JSON logging:

```json
{
  "level": "info",
  "timestamp": "2026-03-13T12:00:00.000Z",
  "msg": "HTTP request completed",
  "method": "GET",
  "url": "/api/health",
  "status": 200,
  "duration": "12ms"
}
```

Configure the log level via the `LOG_LEVEL` environment variable: `debug`, `info`, `warn`, or `error`.

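Because each log entry is a JSON object, logs can be filtered with a few lines of Python. The sketch below is illustrative rather than part of Nomad: it assumes one JSON object per line, uses the field names from the example entry, and orders the four levels with numeric ranks chosen here for convenience.

```python
import json

# Numeric ranks are an illustrative choice, not values defined by Nomad.
LEVELS = {"debug": 10, "info": 20, "warn": 30, "error": 40}


def filter_logs(lines, min_level="warn"):
    """Yield parsed JSON log records at or above min_level."""
    threshold = LEVELS[min_level]
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as startup banners
        if not isinstance(record, dict):
            continue
        if LEVELS.get(record.get("level"), 0) >= threshold:
            yield record


def duration_ms(record):
    """Convert a duration string such as "12ms" to a float."""
    value = record["duration"]
    return float(value[:-2] if value.endswith("ms") else value)
```

Feeding the function lines from a log stream (with any container-name prefix stripped) and `min_level="error"` leaves only the error records. `duration_ms` turns the `duration` field into a number so that slow requests can be sorted or thresholded.
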
### Log Access

```bash
# Application logs
docker compose logs -f nomad-app

# Worker logs
docker compose logs -f nomad-worker

# All service logs
docker compose logs -f

# Nginx access logs (on host)
tail -f "${NOMAD_DATA_DIR}/logs/nginx/access.log"
```

### Alerting Recommendations

For homelab alerting, integrate with:

| Tool | Use Case |
|------|----------|
| **Uptime Kuma** | Service uptime monitoring |
| **Grafana Alerting** | Metric-based alerts |
| **Ntfy** | Push notifications |
| **Gotify** | Self-hosted notifications |

Example Uptime Kuma monitor:

- **URL**: `http://nomad-app:8080/api/health`
- **Interval**: 60 seconds
- **Expected status**: 200
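
For push notifications, a failed health check can be forwarded to Ntfy, which accepts a plain HTTP POST to a topic URL with optional `Title` and `Priority` headers. The following is a minimal sketch: the topic URL, the default title, and the function name `build_ntfy_request` are placeholders chosen for illustration.

```python
import urllib.request


def build_ntfy_request(topic_url, message,
                       title="Nomad alert", priority="high"):
    """Build (but do not send) an ntfy publish request."""
    return urllib.request.Request(
        url=topic_url,
        data=message.encode("utf-8"),
        headers={"Title": title, "Priority": priority},
        method="POST",
    )
```

A call such as `build_ntfy_request("https://ntfy.example.com/nomad-alerts", "nomad-app health check failed")` returns a request that can be sent with `urllib.request.urlopen(req, timeout=5)`; the host and topic in that URL are hypothetical.
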