Querying from Provenance Files
yProv4ML offers a set of utility functions for extracting logged information from the provenance.json file.
⚠
All of these functions expect the data argument to be a dictionary, that is, a JSON file loaded in Python. For a provenance JSON file produced by yProv4ML, the dictionary can be obtained as in the example below.
Example:
import json
# path_to_prov_json: path to the provenance JSON file
with open(path_to_prov_json) as f:
    data = json.load(f)
Utility Functions
def get_metrics(data : dict, keyword : Optional[str] = None) -> List[str]
The get_metrics function retrieves all available metrics from the provided provenance JSON data. If a keyword is specified, the result includes only the metrics that contain the keyword.
| Parameter | Type | Default | Description |
|---|---|---|---|
| data | dict | Required | The loaded provenance dictionary containing the metrics. |
| keyword | Optional[str] | None | If provided, filters the metrics to only those containing this keyword. |
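To make the keyword behaviour concrete, here is a minimal, self-contained sketch. It does not call yProv4ML: `filter_metric_names` is a hypothetical stand-in, and the layout of the toy dictionary (metric names as keys under `entity`) is an assumption made for illustration, not the library's actual schema. It also assumes that filtering means substring containment, as the table above describes.

```python
from typing import List, Optional

# Toy stand-in for a loaded provenance dictionary.
# This layout is invented for illustration only.
toy_data = {
    "entity": {
        "loss_train": {},
        "loss_val": {},
        "accuracy_val": {},
    }
}


def filter_metric_names(data: dict, keyword: Optional[str] = None) -> List[str]:
    """Hypothetical helper: list entry names, optionally keeping only those containing keyword."""
    names = list(data.get("entity", {}))
    if keyword is None:
        return names
    return [name for name in names if keyword in name]


all_names = filter_metric_names(toy_data)
loss_names = filter_metric_names(toy_data, keyword="loss")
```

With no keyword every name is returned; with `keyword="loss"` only the two loss entries remain.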
def get_param(data : dict, param : str) -> Any
Retrieves a single value corresponding to the given param. This function is useful when the parameter is expected to have a unique value and its label exactly matches the one in the provenance JSON file.
def get_params(data : dict, param : str) -> List[Any]
Retrieves a list of values for the given param. This is useful when multiple values exist for the parameter (for example when marked with an incremental ID) in the provenance json file, allowing further analysis or aggregation.
| Parameter | Type | Return Type | Description |
|---|---|---|---|
| data | dict | - | The loaded provenance dictionary containing the parameters. |
| param | str | - | The specific parameter to retrieve. |
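The difference between the single-value and multi-value lookups can be sketched as follows. This is not yProv4ML code: `single_param` and `many_params` are hypothetical helpers, the flat `{label: value}` dictionary is an invented layout, and matching incremental IDs by label prefix is an assumption about how such parameters might be named.

```python
from typing import Any, List

# Toy stand-in; the flat {label: value} layout is an assumption for illustration.
toy_params = {
    "learning_rate": 0.001,
    "layer_size_0": 128,
    "layer_size_1": 64,
}


def single_param(params: dict, label: str) -> Any:
    """Hypothetical: exact-label lookup that returns one value."""
    return params[label]


def many_params(params: dict, label: str) -> List[Any]:
    """Hypothetical: collect every value whose label starts with the given name."""
    return [value for key, value in params.items() if key.startswith(label)]


lr = single_param(toy_params, "learning_rate")
sizes = many_params(toy_params, "layer_size")
```

The exact lookup suits a parameter logged once, while the list form gathers the values logged under `layer_size_0` and `layer_size_1` for later aggregation.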
⚠
How to view metrics data depends on the format in which the experiment saved it.
- If the CSV format is used, we suggest opening it with [pandas](https://pandas.pydata.org/).
- If Zarr or NetCDF is used, then either [xarray](https://docs.xarray.dev/en/stable/index.html) or a dedicated library ([zarr-python](https://zarr.readthedocs.io/en/stable/) or [netcdf4](https://pypi.org/project/netCDF4/)) can be used.
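For the CSV case, a minimal pandas sketch might look like the following. The in-memory text stands in for a metrics file on disk, and the column names `epoch` and `value` are hypothetical; the columns actually written by an experiment may differ.

```python
import io

import pandas as pd

# In-memory stand-in for a metrics CSV file; the column names are hypothetical.
csv_text = "epoch,value\n0,0.91\n1,0.64\n2,0.48\n"

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

last_value = df["value"].iloc[-1]  # most recent logged value
mean_value = df["value"].mean()    # average over all rows
```

For a real file, the `io.StringIO(...)` argument would be replaced by the path to the CSV.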