y2Graph
y2Graph (yaml to graph) is a simple Python tool to build W3C-PROV provenance graphs from workflow descriptions written in YAML. It uses the prov library to create entities, activities, and their relationships, and can export the results to PROV-JSON and a graph visualization (PNG).
This is useful when you need to build large provenance graphs without re-running the entire workflow.
Features
- Define workflows in a simple YAML file
- Each task specifies:
  - an id, a label and, optionally, a list of attributes
  - inputs (UUIDs representing files or data items)
  - outputs (UUIDs for generated results)
- Automatically constructs a PROV document linking tasks and data
- Export to:
  - prov.json (standard W3C PROV format)
  - graph visualization in PNG, PDF or SVG (requires Graphviz)
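The mapping from tasks to PROV can be pictured as follows: each task becomes a PROV activity, each UUID becomes a PROV entity, inputs become `used` relations and outputs become `wasGeneratedBy` relations. y2Graph builds its document with the prov library (whose `ProvDocument` offers `entity`, `activity`, `used` and `wasGeneratedBy`); the sketch below instead uses only the standard library so the resulting PROV-JSON shape is visible. The `ex` prefix, the namespace URL and the `to_prov_json` helper are placeholders, not part of y2Graph.

```python
import json

# Hypothetical in-memory workflow, shaped like the parsed YAML "tasks" list.
workflow = {
    "tasks": [
        {"id": "task1", "label": "Load Data", "inputs": [], "outputs": ["uuid-1234"]},
        {"id": "task2", "label": "Process Data", "inputs": ["uuid-1234"], "outputs": ["uuid-5678"]},
    ]
}


def to_prov_json(workflow, prefix="ex"):
    """Map tasks to PROV activities and UUIDs to PROV entities (illustrative sketch)."""
    doc = {
        "prefix": {prefix: "http://example.org/"},  # placeholder namespace
        "entity": {},
        "activity": {},
        "used": {},
        "wasGeneratedBy": {},
    }
    for task in workflow["tasks"]:
        act_id = f"{prefix}:{task['id']}"
        doc["activity"][act_id] = {"prov:label": task.get("label", task["id"])}
        for uid in task.get("inputs", []):
            ent_id = f"{prefix}:{uid}"
            doc["entity"].setdefault(ent_id, {})
            # The task consumed this data item: activity --used--> entity.
            rel_id = f"_:u{len(doc['used'])}"
            doc["used"][rel_id] = {"prov:activity": act_id, "prov:entity": ent_id}
        for uid in task.get("outputs", []):
            ent_id = f"{prefix}:{uid}"
            doc["entity"].setdefault(ent_id, {})
            # The task produced this data item: entity --wasGeneratedBy--> activity.
            rel_id = f"_:g{len(doc['wasGeneratedBy'])}"
            doc["wasGeneratedBy"][rel_id] = {"prov:entity": ent_id, "prov:activity": act_id}
    return doc


doc = to_prov_json(workflow)
print(json.dumps(doc, indent=2))
```

Because entities are keyed by UUID, a data item that one task outputs and another task consumes ends up as a single shared node.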
Example YAML Workflow
tasks:
  - id: task1
    label: "Load Data"
    attributes:
      - timestamp: 12345
      - context: "training"
    inputs: []
    outputs:
      - "uuid-1234"
  - id: task2
    label: "Process Data"
    attributes:
      - timestamp: 456677
    inputs:
      - "uuid-1234"
    outputs:
      - "uuid-5678"
  - id: task3
    label: "Analyze Results"
    inputs:
      - "uuid-5678"
    outputs:
      - "uuid-9999"
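Two details of this format matter when consuming it: attributes are written as a list of single-key mappings, and the order of tasks follows from which task outputs a UUID that another task lists as an input. The sketch below assumes the file has already been loaded (for example with PyYAML's `yaml.safe_load`) into the dict shown; `flatten_attributes` and `task_dependencies` are hypothetical helpers for illustration, not y2Graph functions.

```python
# Hypothetical result of parsing the YAML above with yaml.safe_load(...).
parsed = {
    "tasks": [
        {"id": "task1", "label": "Load Data",
         "attributes": [{"timestamp": 12345}, {"context": "training"}],
         "inputs": [], "outputs": ["uuid-1234"]},
        {"id": "task2", "label": "Process Data",
         "attributes": [{"timestamp": 456677}],
         "inputs": ["uuid-1234"], "outputs": ["uuid-5678"]},
        {"id": "task3", "label": "Analyze Results",
         "inputs": ["uuid-5678"], "outputs": ["uuid-9999"]},
    ]
}


def flatten_attributes(task):
    """Merge the list of single-key attribute mappings into one dict."""
    merged = {}
    for item in task.get("attributes", []):
        merged.update(item)
    return merged


def task_dependencies(tasks):
    """Return (producer, consumer, uuid) triples linked through a shared UUID."""
    producer_of = {}
    for task in tasks:
        for uid in task.get("outputs", []):
            producer_of[uid] = task["id"]
    edges = []
    for task in tasks:
        for uid in task.get("inputs", []):
            if uid in producer_of:  # inputs with no producer are external data
                edges.append((producer_of[uid], task["id"], uid))
    return edges


print(flatten_attributes(parsed["tasks"][0]))
# {'timestamp': 12345, 'context': 'training'}
print(task_dependencies(parsed["tasks"]))
# [('task1', 'task2', 'uuid-1234'), ('task2', 'task3', 'uuid-5678')]
```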
To create the corresponding graph:
python run.py test.yaml
📂 Output
- output_prov.json: PROV-JSON representation of the workflow
- output_graph.png: Graph visualization of tasks and data flow
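The graph image is a rendering of the PROV document. The prov library ships its own Graphviz exporter (`prov.dot.prov_to_dot`), and this README does not say exactly how y2Graph draws its figure, so the stdlib-only sketch below only illustrates the idea: it turns a trimmed PROV-JSON document into Graphviz DOT text that `dot -Tpng` could render. The sample JSON content and the `to_dot` helper are made up for illustration.

```python
import json

# Hypothetical, trimmed PROV-JSON content of the kind written to output_prov.json.
prov_json = """
{
  "entity": {"ex:uuid-1234": {}, "ex:uuid-5678": {}},
  "activity": {"ex:task2": {"prov:label": "Process Data"}},
  "used": {"_:u0": {"prov:activity": "ex:task2", "prov:entity": "ex:uuid-1234"}},
  "wasGeneratedBy": {"_:g0": {"prov:entity": "ex:uuid-5678", "prov:activity": "ex:task2"}}
}
"""


def to_dot(doc):
    """Render PROV-JSON entities, activities and relations as Graphviz DOT text."""
    lines = ["digraph prov {"]
    for ent in doc.get("entity", {}):
        lines.append(f'  "{ent}" [shape=ellipse];')
    for act, attrs in doc.get("activity", {}).items():
        label = attrs.get("prov:label", act)
        lines.append(f'  "{act}" [shape=box, label="{label}"];')
    # PROV edges point from the dependent record to the record it depends on.
    for rel in doc.get("used", {}).values():
        lines.append(f'  "{rel["prov:activity"]}" -> "{rel["prov:entity"]}" [label="used"];')
    for rel in doc.get("wasGeneratedBy", {}).values():
        lines.append(f'  "{rel["prov:entity"]}" -> "{rel["prov:activity"]}" [label="wasGeneratedBy"];')
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot(json.loads(prov_json))
print(dot_text)
```

Saving `dot_text` to a file and running `dot -Tpng graph.dot -o graph.png` would produce a PNG similar in spirit to output_graph.png.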

Installation
See the yProv4ML documentation page for instructions on installing Graphviz.
Then:
git clone https://github.com/HPCI-Lab/y2Graph.git
cd y2Graph
pip install -r requirements.txt
pip install .
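Since graph export needs Graphviz, a quick check after installation can save a confusing failure later. This is a minimal sketch, assuming the rendering ultimately calls the Graphviz `dot` executable found on PATH; `graphviz_available` is a hypothetical helper, not part of y2Graph.

```python
import shutil
import subprocess


def graphviz_available():
    """Return True if the Graphviz `dot` executable is on PATH."""
    return shutil.which("dot") is not None


if graphviz_available():
    # `dot -V` prints the Graphviz version, on stderr rather than stdout.
    result = subprocess.run(["dot", "-V"], capture_output=True, text=True)
    print(result.stderr.strip())
else:
    print("Graphviz `dot` not found on PATH; graph export will likely fail.")
```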
Usage
Currently, two usage modes are available.
The first converts a single YAML file to W3C PROV-JSON and the corresponding graph:
python run.py example_simple/test.yaml -j simple.json -o simple.pdf
The second accepts multiple YAML files, joins the resulting JSON documents into a single file, and creates a common graph representation. This feature uses UIDs to identify shared elements that can be connected.
python run.py example_join/test1.yaml example_join/test2.yaml --join -j combined.json -o combined.pdf
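To make the arguments used in these commands concrete, here is a hypothetical argparse sketch of a command line with the same shape: one or more YAML files, a `--join` switch, `-j` for the PROV-JSON path and `-o` for the graph path. The meanings are inferred from the examples in this README; y2Graph's actual parser, defaults and help texts may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown above (not y2Graph's own code)."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("yaml_files", nargs="+", help="one or more workflow YAML files")
    parser.add_argument("--join", action="store_true", help="merge all inputs into one graph")
    parser.add_argument("-j", dest="json_out", help="path of the PROV-JSON output")
    parser.add_argument("-o", dest="graph_out", help="path of the graph output (.pdf, .svg, .png)")
    return parser


# Parse the "join" command line from above instead of sys.argv.
args = build_parser().parse_args(
    ["example_join/test1.yaml", "example_join/test2.yaml", "--join",
     "-j", "combined.json", "-o", "combined.pdf"]
)
print(args.yaml_files, args.join, args.json_out, args.graph_out)
```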
Example
Three examples are provided:
Simple Example
This example converts a YAML file to W3C PROV-JSON and the corresponding graph.
All data files are in the example_simple subdirectory.
The file example.yaml is used as the source: the program first generates a W3C PROV-JSON file and then converts it to a PDF, SVG or PNG visualization.
cd example_simple
python run.py example.yaml -j example.json -o example.pdf
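The visualization formats named here are PDF, SVG and PNG. One plausible way to choose among them, assuming the format is inferred from the extension of the `-o` path (this README does not state how y2Graph decides), is sketched below together with the matching Graphviz command line. `graph_format` and `dot_command` are hypothetical helpers.

```python
from pathlib import Path

# Formats mentioned in this README; Graphviz itself supports many more.
SUPPORTED_FORMATS = {"pdf", "svg", "png"}


def graph_format(output_path):
    """Infer the Graphviz output format from the file extension (assumption)."""
    fmt = Path(output_path).suffix.lstrip(".").lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported graph format: {fmt!r}")
    return fmt


def dot_command(dot_file, output_path):
    """Build a `dot -T<format> <input> -o <output>` command line."""
    return ["dot", f"-T{graph_format(output_path)}", dot_file, "-o", output_path]


print(graph_format("example.pdf"))  # pdf
print(dot_command("example.dot", "example.svg"))
```

The returned list can be passed directly to `subprocess.run`.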

Joined Example
This example accepts multiple YAML files, joins the resulting JSON documents into a single file, and creates a common graph representation. This feature uses UIDs to identify shared elements that can be connected.
All data files are in the example_join subdirectory.
The files example1.yaml and example2.yaml are used as sources, and the program looks for IDs that appear in the inputs and outputs fields of both files and connects them as common elements.
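The join relies on identifiers that occur in more than one file. A simple way to picture it, sketched with plain dicts and assuming both files were already converted to PROV-JSON with the same namespace prefix: entities and activities with the same ID collapse into a single record, while relation IDs (which are local to each file) are renamed so they do not clash. `join_prov` is a hypothetical helper; this is an illustration, not y2Graph's actual merge code.

```python
import json

# Sections whose keys are node identifiers that may be shared across files.
NODE_SECTIONS = {"entity", "activity", "agent"}


def join_prov(*docs):
    """Merge PROV-JSON dicts, connecting records that share an identifier."""
    merged = {}
    for i, doc in enumerate(docs):
        for section, records in doc.items():
            target = merged.setdefault(section, {})
            for rec_id, attrs in records.items():
                if section == "prefix":
                    target[rec_id] = attrs  # namespace URI string
                elif section in NODE_SECTIONS:
                    # Same ID in several files -> one shared node.
                    target.setdefault(rec_id, {}).update(attrs)
                else:
                    # Relation IDs are local to each file, so make them unique.
                    target[f"{rec_id}-{i}"] = attrs
    return merged


# Hypothetical documents: file 1 produces uuid-1234, file 2 consumes it.
doc1 = {
    "entity": {"ex:uuid-1234": {}},
    "activity": {"ex:task1": {}},
    "wasGeneratedBy": {"_:g0": {"prov:entity": "ex:uuid-1234", "prov:activity": "ex:task1"}},
}
doc2 = {
    "entity": {"ex:uuid-1234": {}, "ex:uuid-5678": {}},
    "activity": {"ex:task2": {}},
    "used": {"_:u0": {"prov:activity": "ex:task2", "prov:entity": "ex:uuid-1234"}},
}

combined = join_prov(doc1, doc2)
print(json.dumps(combined, indent=2))
```

In the merged document, ex:uuid-1234 is generated by ex:task1 and used by ex:task2, which is the link that connects the two workflows.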
cd example_join
python run.py example1.yaml example2.yaml --join -j combined.json -o combined.pdf

Example with Images
This example converts a YAML file to W3C PROV-JSON and the corresponding graph. In addition, when an entity specifies the path to an image, the program fetches the image and displays it in the graph.
All data files are in the example_images subdirectory, with the images in the imgs folder.
The file example.yaml is used as the source: the program first generates a W3C PROV-JSON file and then converts it to a PDF, SVG or PNG visualization.
cd example_images
python run.py example.yaml -j example_with_images.json -o example_with_images.pdf
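Graphviz can draw a picture inside a node through the node's `image` attribute, which is one way an entity's image could appear in the graph. The sketch below assumes the path is stored under an entity attribute called `image`; that attribute name and the `entity_node` helper are hypothetical, since this README does not say which field y2Graph reads or how it embeds the file.

```python
def entity_node(ent_id, attrs, image_key="image"):
    """Emit a DOT node line, embedding a picture when the entity has an image path.

    `image_key` is a hypothetical attribute name used only for illustration.
    """
    path = attrs.get(image_key)
    if path:
        # Graphviz draws the image inside the node; labelloc=b puts the label at the bottom.
        return f'  "{ent_id}" [shape=box, image="{path}", labelloc=b];'
    return f'  "{ent_id}" [shape=ellipse];'


print(entity_node("ex:uuid-9999", {"image": "imgs/loss.png"}))
print(entity_node("ex:uuid-1234", {}))
```

Relative image paths are resolved by Graphviz at render time, so running the command from inside example_images, next to the imgs folder, is likely what keeps the paths valid.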
