Reproducing Experiments using Provenance Files
With workflow streamlined by yProv4ML, it is trivial to guarantee reproducibility of experiments even just sharing a single provenance file. To guarantee the necessary amount of information are present in the prov.json file however, some calls to the library have to be executed.
def log_execution_command(cmd : str): ...
Simply logs the execution command for it to be retrieved by the reproduction script later. This is often a call to python3
def log_source_code(path: Optional[str] = None): ...
Logs as an artifact a path to the source code. This could be a single python file (e.g. main.py), a repository link (if the path is not specified in the arguments), or an entire directory of source files. In case the source code is not on github, the source files are all copied inside the artifacts directory, and the path logged inside the provenance file will reference whis copy.
def log_input(inp : Any, log_copy_in_prov_directory : bool = True): ...
The directive log_input
saves with incremental ids a various number of inputs, which are user defined.
The log_copy_in_prov_directory
parameter indicates whether the input (if a file) is copied as artifact into the artifact directory. This is necessary for reproducibility of the experiment if the input is not retrievable in any other way.
def log_output(out : Any, log_copy_in_prov_directory : bool = True): ...
The directive log_output
saves with incremental ids a various number of outputs, which are user defined.