# n8n Workflow Comparison

Graph-based workflow similarity comparison using NetworkX and graph edit distance.

## Features

- **Graph Edit Distance**: Uses NetworkX's graph edit distance algorithm for accurate structural comparison
- **Configurable Cost Functions**: Customize costs for different types of edits (node/edge insertion, deletion, substitution)
- **Special Case Handling**: Higher penalties for trigger mismatches, and lower substitution costs for node types in the same similarity group
- **Parameter Comparison**: Deep comparison of node parameters with configurable ignore rules
- **External Configuration**: YAML/JSON config files for easy customization without code changes ([see CONFIGURATION.md](CONFIGURATION.md))
- **Built-in Presets**: Strict, standard, and lenient comparison modes
- **Detailed Output**: Returns a similarity score and the top edit operations needed

## Installation

This module uses `uv` for dependency management. No separate installation step is needed: `uvx` manages dependencies automatically.

### Prerequisites

Install `uv`:

```bash
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```

Install [just](https://github.com/casey/just):

```bash
# macOS via Homebrew
brew install just
# or global install via npm
npm install -g rust-just
# or cross-platform via curl, installing to DEST
curl --proto '=https' --tlsv1.2 -sSf https://just.systems/install.sh | bash -s -- --to DEST
```


## Usage

### CLI Usage

```bash
# Using default (standard) configuration
uvx --from . python -m src.compare_workflows generated.json ground_truth.json

# Using a preset
uvx --from . python -m src.compare_workflows generated.json ground_truth.json --preset strict

# Using custom configuration
uvx --from . python -m src.compare_workflows generated.json ground_truth.json --config my-config.yaml

# Output as human-readable summary
uvx --from . python -m src.compare_workflows generated.json ground_truth.json --output-format summary
```

### Python API Usage

```python
from config_loader import load_config
from graph_builder import build_workflow_graph
from similarity import calculate_graph_edit_distance
import json

# Load workflows
with open('generated.json') as f:
    generated = json.load(f)
with open('ground_truth.json') as f:
    ground_truth = json.load(f)

# Load configuration
config = load_config('preset:standard')

# Build graphs
g1 = build_workflow_graph(generated, config)
g2 = build_workflow_graph(ground_truth, config)

# Calculate similarity
result = calculate_graph_edit_distance(g1, g2, config)

print(f"Similarity: {result['similarity_score']:.2%}")
print(f"Edit cost: {result['edit_cost']:.1f}")
print(f"Top edits: {len(result['top_edits'])}")
```

## Configuration

> **📖 For detailed configuration documentation, see [CONFIGURATION.md](CONFIGURATION.md)**

### Built-in Presets

- **strict**: High penalties, exact matching required
- **standard**: Balanced comparison (default)
- **lenient**: Low penalties, focus on structure over details

### Quick Start

Create a YAML or JSON file with your custom rules:

```yaml
version: "1.0"
name: "my-custom-config"
description: "Custom configuration for my use case"

costs:
  nodes:
    insertion: 10.0
    deletion: 10.0
    substitution:
      same_type: 1.0
      similar_type: 5.0
      different_type: 15.0
      trigger_mismatch: 50.0

  edges:
    insertion: 5.0
    deletion: 5.0
    substitution: 3.0

similarity_groups:
  triggers:
    - "n8n-nodes-base.webhook"
    - "n8n-nodes-base.manualTrigger"

ignore:
  node_types:
    - "n8n-nodes-base.stickyNote"
  global_parameters:
    - "position"
    - "id"

parameter_comparison:
  numeric_tolerance:
    - parameter: "options.temperature"
      tolerance: 0.1
      cost_if_exceeded: 2.0
```
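To illustrate how a `numeric_tolerance` rule like the one above might be applied, here is a minimal sketch. The helper names `get_path` and `numeric_tolerance_cost` are hypothetical, and the handling of missing values is an assumption; the module's real behavior is described in CONFIGURATION.md.

```python
def get_path(params, dotted_path):
    """Resolve a dotted path such as 'options.temperature' in nested dicts."""
    current = params
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # path does not exist in this parameter set
        current = current[key]
    return current


def numeric_tolerance_cost(params_a, params_b, rule):
    """Return the rule's cost if the two values differ by more than the tolerance."""
    a = get_path(params_a, rule["parameter"])
    b = get_path(params_b, rule["parameter"])
    if a is None or b is None:
        return 0.0  # assumption: the rule only applies when both values exist
    return rule["cost_if_exceeded"] if abs(a - b) > rule["tolerance"] else 0.0


rule = {"parameter": "options.temperature", "tolerance": 0.1, "cost_if_exceeded": 2.0}
close = numeric_tolerance_cost(
    {"options": {"temperature": 0.7}}, {"options": {"temperature": 0.75}}, rule
)
far = numeric_tolerance_cost(
    {"options": {"temperature": 0.7}}, {"options": {"temperature": 1.0}}, rule
)
print(close, far)
```

Under these assumptions, values within the tolerance add no cost, and values outside it add `cost_if_exceeded` once.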

See **[CONFIGURATION.md](CONFIGURATION.md)** for comprehensive documentation, including:
- Complete field reference
- Cost configuration strategies
- Advanced ignore rules and wildcards
- Parameter comparison rules
- Exemptions and conditional logic
- Real-world examples

## Output Format

### JSON Output

```json
{
  "similarity_score": 0.78,
  "similarity_percentage": "78.0%",
  "edit_cost": 45.0,
  "max_possible_cost": 205.0,
  "top_edits": [
    {
      "type": "node_substitute",
      "description": "Replace 'Manual Trigger' with 'Webhook Trigger'",
      "cost": 25.0,
      "priority": "critical"
    }
  ],
  "metadata": {
    "generated_nodes": 5,
    "ground_truth_nodes": 6
  }
}
```

### Summary Output

```
============================================================
WORKFLOW COMPARISON SUMMARY
============================================================

Overall Similarity: 78.0%
Edit Cost:          45.0 / 205.0

Configuration: standard
  Standard balanced comparison configuration

Top 3 Required Edits:
------------------------------------------------------------
1. 🔴 [CRITICAL] Cost: 25.0
   Replace 'Manual Trigger' with 'Webhook Trigger'

2. 🟠 [MAJOR] Cost: 10.0
   Add missing 'HTTP Request' tool node

3. 🟡 [MINOR] Cost: 5.0
   Remove connection from 'Agent' to 'Extra Node'

============================================================
✅ PASS - Workflows are sufficiently similar
============================================================
```

## Testing

Run the test suite:

```bash
# Install dev dependencies
uv sync --dev

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov
```

## Algorithm Details

### Graph Representation
- Each workflow node becomes a graph node with attributes (type, parameters, etc.)
- Each node and edge gets a generated ID based on its position in the workflow
- Each workflow connection becomes a directed edge with connection type
- Nodes and edges are filtered based on configuration rules
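The representation above can be sketched roughly as follows. This is not the module's `build_workflow_graph`; the function name `build_graph_sketch`, the `node_<index>` ID scheme, and the attribute names are illustrative assumptions, and the input assumes the usual n8n export layout with a `nodes` list and a `connections` mapping keyed by source node name.

```python
import networkx as nx


def build_graph_sketch(workflow, ignored_types=("n8n-nodes-base.stickyNote",)):
    """Turn an n8n workflow export into a directed graph (illustrative only)."""
    graph = nx.DiGraph()
    kept = {}  # workflow node name -> generated graph node ID
    for index, node in enumerate(workflow.get("nodes", [])):
        if node.get("type") in ignored_types:
            continue  # filtered out by the ignore rules
        node_id = f"node_{index}"  # hypothetical position-based ID
        kept[node["name"]] = node_id
        graph.add_node(
            node_id,
            name=node["name"],
            type=node["type"],
            parameters=node.get("parameters", {}),
        )
    for source, outputs in workflow.get("connections", {}).items():
        for conn_type, branches in outputs.items():
            for branch in branches:
                for target in branch or []:
                    # Skip edges that touch a filtered-out node
                    if source in kept and target["node"] in kept:
                        graph.add_edge(
                            kept[source], kept[target["node"]], connection_type=conn_type
                        )
    return graph


workflow = {
    "nodes": [
        {"name": "Webhook", "type": "n8n-nodes-base.webhook", "parameters": {}},
        {"name": "Note", "type": "n8n-nodes-base.stickyNote", "parameters": {}},
        {"name": "HTTP Request", "type": "n8n-nodes-base.httpRequest",
         "parameters": {"url": "https://example.com"}},
    ],
    "connections": {
        "Webhook": {"main": [[{"node": "HTTP Request", "type": "main", "index": 0}]]},
    },
}
graph = build_graph_sketch(workflow)
print(graph.number_of_nodes(), graph.number_of_edges())
```

The sticky note is dropped before edges are added, so ignored nodes never contribute to the edit cost.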

### Graph Edit Distance
Uses NetworkX's `optimize_graph_edit_distance` with custom cost functions:
- Node operations: insertion, deletion, substitution
- Edge operations: insertion, deletion, substitution
- Cost functions consider node types, parameters, and configuration rules
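As a rough sketch of how custom cost functions plug into NetworkX, the example below passes callables to `nx.optimize_graph_edit_distance`, which yields successively better approximations of the edit cost. The cost values are copied from the Quick Start config; the substitution rules themselves (two triggers count as "similar", trigger versus non-trigger counts as a mismatch) and the `edit_cost` helper are assumptions for illustration, not the module's actual logic.

```python
import networkx as nx

# Cost values taken from the Quick Start config; the key names are hypothetical.
COSTS = {
    "same_type": 1.0, "similar_type": 5.0, "different_type": 15.0,
    "trigger_mismatch": 50.0, "node_ins": 10.0, "node_del": 10.0,
    "edge_ins": 5.0, "edge_del": 5.0, "edge_subst": 3.0,
}
TRIGGERS = {"n8n-nodes-base.webhook", "n8n-nodes-base.manualTrigger"}


def node_subst_cost(a, b):
    """Cost of replacing node attributes `a` with `b` (assumed rules)."""
    if a["type"] == b["type"]:
        same_params = a.get("parameters") == b.get("parameters")
        return 0.0 if same_params else COSTS["same_type"]
    a_trigger, b_trigger = a["type"] in TRIGGERS, b["type"] in TRIGGERS
    if a_trigger and b_trigger:
        return COSTS["similar_type"]
    if a_trigger != b_trigger:
        return COSTS["trigger_mismatch"]
    return COSTS["different_type"]


def edit_cost(g1, g2):
    """Exhaust the generator; the last value yielded is the lowest cost found."""
    best = None
    for cost in nx.optimize_graph_edit_distance(
        g1, g2,
        node_subst_cost=node_subst_cost,
        node_del_cost=lambda attrs: COSTS["node_del"],
        node_ins_cost=lambda attrs: COSTS["node_ins"],
        edge_subst_cost=lambda a, b: 0.0 if a == b else COSTS["edge_subst"],
        edge_del_cost=lambda attrs: COSTS["edge_del"],
        edge_ins_cost=lambda attrs: COSTS["edge_ins"],
    ):
        best = cost
    return best


def two_node_workflow(trigger_type):
    g = nx.DiGraph()
    g.add_node("a", type=trigger_type, parameters={})
    g.add_node("b", type="n8n-nodes-base.set", parameters={})
    g.add_edge("a", "b", connection_type="main")
    return g


g1 = two_node_workflow("n8n-nodes-base.manualTrigger")
g2 = two_node_workflow("n8n-nodes-base.webhook")
print(edit_cost(g1, g2))
```

Because the generator can be stopped early, a caller could also break out of the loop after a deadline and keep the best cost seen so far.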

### Similarity Score
```
similarity = 1 - (edit_cost / max_possible_cost)
```

Where `max_possible_cost` is the cost of deleting all nodes/edges from g1 and inserting all from g2.
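A small worked example of the formula, using the insertion and deletion costs from the Quick Start config. The function names are hypothetical, the edge counts are made up so that the total matches the sample output, and clamping the score at 0 and treating two empty graphs as identical are assumptions.

```python
def max_possible_cost(g1_nodes, g1_edges, g2_nodes, g2_edges, costs):
    """Cost of deleting everything in g1 and inserting everything from g2."""
    return (
        g1_nodes * costs["node_del"] + g1_edges * costs["edge_del"]
        + g2_nodes * costs["node_ins"] + g2_edges * costs["edge_ins"]
    )


def similarity(edit_cost, max_cost):
    """similarity = 1 - (edit_cost / max_possible_cost), clamped at 0."""
    if max_cost == 0:
        return 1.0  # assumption: two empty graphs are identical
    return max(0.0, 1.0 - edit_cost / max_cost)


costs = {"node_del": 10.0, "node_ins": 10.0, "edge_del": 5.0, "edge_ins": 5.0}
# 5 and 6 nodes as in the sample output; 9 and 10 edges are hypothetical counts
max_cost = max_possible_cost(5, 9, 6, 10, costs)
score = similarity(45.0, max_cost)
print(f"{max_cost} {score:.1%}")
```

With an edit cost of 45.0 against a maximum of 205.0, the score rounds to the 78.0% shown in the sample output.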

## Troubleshooting

### Timeout errors
For very large or complex workflows, the comparison may time out. Consider:
- Using a lenient preset to reduce computation
- Simplifying the workflow structure
- Increasing the timeout in the TypeScript wrapper

### Configuration errors
- Ensure YAML/JSON syntax is valid
- Check that node types and parameter paths are correct
- Use the `--verbose` flag to see detailed configuration info