y2Graph

GPLv3 License

y2Graph (YAML to graph) is a simple Python tool for building W3C PROV provenance graphs from workflow descriptions written in YAML. It uses the prov library to create entities, activities, and their relationships, and exports the results to PROV-JSON and to a graph visualization (PNG, PDF, or SVG).

This is useful for building large provenance graphs without re-running the entire workflow.

Features

  • Define workflows in a simple YAML file
  • Each task specifies:
    • inputs (UUIDs representing files or data items)
    • outputs (UUIDs for generated results)
  • Automatically constructs a PROV document linking tasks and data
  • Export to:
    • prov.json (standard W3C PROV format)
    • a graph image in PNG, PDF, or SVG (requires Graphviz)

Example YAML Workflow

tasks:
  - id: task1
    label: "Load Data"
    attributes: 
      - timestamp: 12345
      - context: "training"
    inputs: []
    outputs:
      - "uuid-1234"

  - id: task2
    label: "Process Data"
    attributes: 
      - timestamp: 456677
    inputs:
      - "uuid-1234"
    outputs:
      - "uuid-5678"

  - id: task3
    label: "Analyze Results"
    inputs:
      - "uuid-5678"
    outputs:
      - "uuid-9999"
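The README does not spell out the exact mapping y2Graph applies, so the following is only a minimal sketch under stated assumptions: each task becomes a PROV activity, each UUID becomes an entity, inputs become `used` relations and outputs become `wasGeneratedBy` relations. It uses only the standard library, with the YAML above written out as the dictionary that `yaml.safe_load` would return. The function name `workflow_to_prov_json` is hypothetical; the real tool builds its document with the prov library instead.

```python
import json

# The workflow above as yaml.safe_load would return it; written inline so
# this sketch needs only the standard library.
WORKFLOW = {
    "tasks": [
        {"id": "task1", "label": "Load Data",
         "attributes": [{"timestamp": 12345}, {"context": "training"}],
         "inputs": [], "outputs": ["uuid-1234"]},
        {"id": "task2", "label": "Process Data",
         "attributes": [{"timestamp": 456677}],
         "inputs": ["uuid-1234"], "outputs": ["uuid-5678"]},
        {"id": "task3", "label": "Analyze Results",
         "inputs": ["uuid-5678"], "outputs": ["uuid-9999"]},
    ]
}


def workflow_to_prov_json(workflow):
    """Hypothetical mapping (not y2Graph's code): tasks -> activities,
    UUIDs -> entities. Returns a simplified PROV-JSON-style dict with
    namespace prefixes omitted."""
    doc = {"entity": {}, "activity": {}, "used": {}, "wasGeneratedBy": {}}
    for task in workflow.get("tasks", []):
        activity = {"prov:label": task.get("label", task["id"])}
        # "attributes" is a list of single-key mappings; flatten it.
        for item in task.get("attributes") or []:
            activity.update(item)
        doc["activity"][task["id"]] = activity
        # Each input is an entity the task used.
        for uid in task.get("inputs") or []:
            doc["entity"].setdefault(uid, {})
            rel_id = f"_:u{len(doc['used']) + 1}"
            doc["used"][rel_id] = {"prov:activity": task["id"], "prov:entity": uid}
        # Each output is an entity the task generated.
        for uid in task.get("outputs") or []:
            doc["entity"].setdefault(uid, {})
            rel_id = f"_:g{len(doc['wasGeneratedBy']) + 1}"
            doc["wasGeneratedBy"][rel_id] = {"prov:entity": uid, "prov:activity": task["id"]}
    return doc


print(json.dumps(workflow_to_prov_json(WORKFLOW), indent=2))
```

For this workflow the sketch yields three activities, three entities, two `used` records and three `wasGeneratedBy` records; a shared UUID such as uuid-1234 is what links one task to the next.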

Assuming the workflow above is saved as test.yaml, create the corresponding graph with:

python run.py test.yaml

📂 Output

  • output_prov.json: PROV-JSON representation of the workflow
  • output_graph.png: Graph visualization of tasks and data flow
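Graphviz draws the picture from a DOT description of the graph. As a rough illustration only (y2Graph's own rendering path is not documented here), the hypothetical function `prov_json_to_dot` below turns a PROV-JSON-style dictionary with `entity`, `activity`, `used` and `wasGeneratedBy` sections into DOT text, using the usual PROV colors: yellow ellipses for entities, blue boxes for activities.

```python
def quote(text):
    """Quote a string as a DOT identifier."""
    return '"' + str(text).replace('"', '\\"') + '"'


def prov_json_to_dot(doc):
    """Hypothetical PROV-JSON -> DOT conversion (not y2Graph's code)."""
    lines = ["digraph prov {"]
    for eid in doc.get("entity", {}):
        lines.append(f"  {quote(eid)} [shape=ellipse, style=filled, fillcolor=\"#FFFC87\"];")
    for aid, attrs in doc.get("activity", {}).items():
        label = attrs.get("prov:label", aid)
        lines.append(f"  {quote(aid)} [shape=box, style=filled, fillcolor=\"#9FB1FC\", label={quote(label)}];")
    # PROV edges point from the later element back to the earlier one.
    for rel in doc.get("used", {}).values():
        lines.append(f"  {quote(rel['prov:activity'])} -> {quote(rel['prov:entity'])} [label=\"used\"];")
    for rel in doc.get("wasGeneratedBy", {}).values():
        lines.append(f"  {quote(rel['prov:entity'])} -> {quote(rel['prov:activity'])} [label=\"wasGeneratedBy\"];")
    lines.append("}")
    return "\n".join(lines)


example = {
    "entity": {"uuid-1234": {}},
    "activity": {"task1": {"prov:label": "Load Data"}},
    "wasGeneratedBy": {"_:g1": {"prov:entity": "uuid-1234", "prov:activity": "task1"}},
}
print(prov_json_to_dot(example))
```

Saving the printed text as graph.dot and running `dot -Tpng graph.dot -o output_graph.png` would render it as a PNG.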


Installation

Graphviz must be installed separately; see the yProv4ML documentation page for installation instructions.

Then:

git clone https://github.com/HPCI-Lab/y2Graph.git
cd y2Graph
pip install -r requirements.txt

pip install .
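Before generating graphs, it can help to confirm that Graphviz is reachable. A small standard-library check, assuming the rendering step relies on the Graphviz `dot` executable being on `PATH` (an assumption; the helper name `graphviz_available` is hypothetical):

```python
import shutil
import subprocess


def graphviz_available():
    """Return True if the Graphviz `dot` executable is on PATH (assumed requirement)."""
    return shutil.which("dot") is not None


if graphviz_available():
    # `dot -V` reports its version on stderr.
    result = subprocess.run(["dot", "-V"], capture_output=True, text=True)
    print(result.stderr.strip())
else:
    print("Graphviz not found: install it before generating graphs.")
```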

Usage

Currently, two usage modalities are available.

The first converts a single YAML file to W3C PROV-JSON and renders the corresponding graph; -j sets the PROV-JSON output file and -o the graph output file:

python run.py example_simple/test.yaml -j simple.json -o simple.pdf

The second takes multiple YAML files, joins their PROV-JSON outputs into a single file, and builds a common graph representation. With --join, shared elements are identified by their UUIDs and connected:

python run.py example_join/test1.yaml example_join/test2.yaml --join -j combined.json -o combined.pdf
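How y2Graph merges documents internally is not described here; the sketch below is a hypothetical illustration of a join on shared IDs, assuming PROV-JSON-style dictionaries in which entities and activities are keyed by their ID. Nodes with the same ID collapse into one, while relation records, whose IDs are only local to each file, are kept apart by suffixing the file index. `merge_prov_docs` and the sample data are illustrative, not part of the tool.

```python
# Sections whose keys are node IDs; the same ID in two files is one node.
NODE_SECTIONS = {"entity", "activity", "agent"}


def merge_prov_docs(*docs):
    """Hypothetical join of PROV-JSON-style dicts on shared node IDs."""
    merged = {}
    for index, doc in enumerate(docs):
        for section, records in doc.items():
            target = merged.setdefault(section, {})
            if section == "prefix":
                # Namespace declarations: plain prefix -> URI mapping.
                target.update(records)
                continue
            for rec_id, attrs in records.items():
                if section in NODE_SECTIONS:
                    # Shared UUIDs become a single node; attributes are combined.
                    target.setdefault(rec_id, {}).update(attrs)
                else:
                    # Relation IDs such as "_:g1" are local to each file.
                    target[f"{rec_id}_{index}"] = dict(attrs)
    return merged


first = {
    "entity": {"uuid-1234": {}},
    "activity": {"task1": {"prov:label": "Load Data"}},
    "wasGeneratedBy": {"_:g1": {"prov:entity": "uuid-1234", "prov:activity": "task1"}},
}
second = {
    "entity": {"uuid-1234": {}, "uuid-5678": {}},
    "activity": {"task2": {"prov:label": "Process Data"}},
    "used": {"_:u1": {"prov:activity": "task2", "prov:entity": "uuid-1234"}},
    "wasGeneratedBy": {"_:g1": {"prov:entity": "uuid-5678", "prov:activity": "task2"}},
}
combined = merge_prov_docs(first, second)
print(sorted(combined["entity"]))  # ['uuid-1234', 'uuid-5678']
```

Because uuid-1234 appears in both inputs, the combined graph has a single node for it, generated by task1 and used by task2.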

Examples

Three examples are provided:

Simple Example

This example converts a YAML file to W3C PROV-JSON and renders the corresponding graph.

All data files are present in the example_simple subdirectory.

The file example.yaml is used as the source: the program first generates a W3C PROV-JSON file and then converts it to a PDF, SVG, or PNG visualization.

cd example_simple

python ../run.py example.yaml -j example.json -o example.pdf
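The visualization format presumably follows the extension passed to -o (example.pdf here). A sketch of that conversion step, assuming the graph is handed to the Graphviz `dot` command as a DOT file; `dot_command`, `render` and the intermediate example.dot file are hypothetical names, not part of y2Graph.

```python
import subprocess
from pathlib import Path

# Output formats mentioned in this README.
SUPPORTED_FORMATS = {"pdf", "svg", "png"}


def dot_command(dot_file, output_file):
    """Build a Graphviz command whose -T format follows the output suffix."""
    fmt = Path(output_file).suffix.lstrip(".").lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {output_file!r}")
    return ["dot", f"-T{fmt}", str(dot_file), "-o", str(output_file)]


def render(dot_file, output_file):
    """Run Graphviz; raises CalledProcessError if dot fails."""
    subprocess.run(dot_command(dot_file, output_file), check=True)


print(dot_command("example.dot", "example.pdf"))
# ['dot', '-Tpdf', 'example.dot', '-o', 'example.pdf']
```

Deriving the format from the file name keeps a single -o option for all three outputs; an unsupported suffix fails early instead of producing an unexpected file.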


Joined Example

This example takes multiple YAML files, joins their PROV-JSON outputs into a single file, and builds a common graph representation. Shared elements are identified by their UUIDs and connected.

All data files are present in the example_join subdirectory.

The files example1.yaml and example2.yaml are used as sources; the program looks for IDs in the inputs and outputs fields that the files have in common and connects them as shared elements.

cd example_join

python ../run.py example1.yaml example2.yaml --join -j combined.json -o combined.pdf
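Finding the join points amounts to collecting the IDs listed under inputs and outputs in each file and keeping those that occur in more than one. A minimal sketch, assuming the YAML files have already been parsed into dictionaries; `data_ids`, `common_ids` and the sample workflows are hypothetical stand-ins, not y2Graph code.

```python
from collections import Counter


def data_ids(workflow):
    """Every ID listed under inputs or outputs in one parsed YAML workflow."""
    ids = set()
    for task in workflow.get("tasks", []):
        ids.update(task.get("inputs") or [])
        ids.update(task.get("outputs") or [])
    return ids


def common_ids(*workflows):
    """IDs that occur in more than one workflow: candidate join points."""
    counts = Counter()
    for workflow in workflows:
        counts.update(data_ids(workflow))
    return {uid for uid, n in counts.items() if n > 1}


# Hypothetical stand-ins for the parsed contents of example1.yaml and example2.yaml.
wf1 = {"tasks": [{"id": "task1", "inputs": [], "outputs": ["uuid-1234"]}]}
wf2 = {"tasks": [{"id": "task2", "inputs": ["uuid-1234"], "outputs": ["uuid-5678"]}]}
print(common_ids(wf1, wf2))  # {'uuid-1234'}
```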


Example with Images

This example converts a YAML file to W3C PROV-JSON and renders the corresponding graph. In addition, when an entity specifies the path to an image, the program loads the image and displays it in the graph.

All data files are present in the example_images subdirectory, with all images in the imgs folder.

The file example.yaml is used as the source: the program first generates a W3C PROV-JSON file and then converts it to a PDF, SVG, or PNG visualization.

cd example_images

python ../run.py example.yaml -j example_with_images.json -o example_with_images.pdf
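Graphviz can draw an image file inside a node through the `image` node attribute. A minimal sketch of how an entity carrying an image path might be rendered, assuming the path is stored under a hypothetical `image` key; the actual key used in example.yaml and y2Graph's rendering code may differ.

```python
from pathlib import Path


def entity_node(uid, attrs, image_key="image"):
    """DOT node for an entity; embeds a picture if attrs names an existing file.

    `image_key` is a hypothetical attribute name, not y2Graph's schema.
    """
    path = attrs.get(image_key)
    if path and Path(path).is_file():
        # Graphviz draws the file inside the node; the ID goes underneath.
        return (f'"{uid}" [shape=box, image="{path}", imagescale=true, '
                f'labelloc=b, label="{uid}"];')
    # No usable image: fall back to a plain entity ellipse.
    return f'"{uid}" [shape=ellipse, label="{uid}"];'


print(entity_node("uuid-1234", {"image": "imgs/plot.png"}))
```

Checking that the file exists before emitting the `image` attribute keeps a missing image from breaking the Graphviz run; the node simply falls back to an ellipse.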
