Víðarr Identifiers

Rather than using incrementing identifiers, Víðarr identifies workflow versions, workflow runs, and output files using SHA-256 hashes of key metadata. The assumption is that if two objects have the same hash, they must be equivalent. All workflow run matching is done by hash matching.

Every hash is the SHA-256 digest of the data described below. All strings are converted to UTF-8 encoded bytes. Strings are not permitted to contain the NUL (zero) byte. Some hashes incorporate the IDs of other hashed objects. Hashes are encoded as ASCII strings of lowercase hexadecimal digits. All JSON objects are serialized with keys in alphabetical order.
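The conventions above can be sketched with a few helpers. This is a minimal illustration, not Víðarr's actual code: the helper names `to_bytes`, `canonical_json`, and `hex_sha256` are hypothetical, and the compact JSON separators are an assumption, since the text only specifies key order.

```python
import hashlib
import json


def to_bytes(text):
    """Encode a string as UTF-8, rejecting the NUL byte as the rules require."""
    if "\x00" in text:
        raise ValueError("strings may not contain the NUL byte")
    return text.encode("utf-8")


def canonical_json(value):
    """Serialize JSON with object keys in sorted order.

    Assumption: compact separators; the document only states the key order.
    """
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def hex_sha256(data):
    """Return the SHA-256 digest of `data` as lowercase hexadecimal ASCII."""
    return hashlib.sha256(data).hexdigest()
```

Sorting keys matters because two JSON objects with the same content but different key order would otherwise produce different bytes, and therefore different hashes.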

Workflow Versions

A workflow version hash is present for each version of a workflow installed. Even if the same WDL file is installed under two different names, there will be two different workflow version hashes. It is computed as follows:

  • name
  • NUL
  • version
  • NUL
  • HEX_DIGITS(SHA256( WDL file UTF-8 bytes ))
  • JSON(output parameters)
  • JSON(input parameters)
  • for filename, contents in accessory files; sorted by filename:
    • NUL
    • filename
    • NUL
    • HEX_DIGITS(SHA256(contents))
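The list above can be read as a byte stream fed into a single SHA-256. The following is a rough sketch under stated assumptions: the function name `workflow_version_hash` and its parameters are hypothetical, accessory file contents are taken as raw bytes, and the exact JSON serialization (beyond sorted keys) is a guess rather than something the text specifies.

```python
import hashlib
import json

NUL = b"\x00"


def workflow_version_hash(name, version, wdl_text, output_params,
                          input_params, accessory_files):
    """Hash a workflow version following the field order listed above.

    accessory_files: dict mapping filename (str) to contents (bytes).
    """
    digest = hashlib.sha256()
    digest.update(name.encode("utf-8"))
    digest.update(NUL)
    digest.update(version.encode("utf-8"))
    digest.update(NUL)
    # The WDL file contributes its own hex-encoded SHA-256, not its raw text.
    wdl_hex = hashlib.sha256(wdl_text.encode("utf-8")).hexdigest()
    digest.update(wdl_hex.encode("ascii"))
    # Assumption: JSON is serialized by json.dumps with sorted keys.
    digest.update(json.dumps(output_params, sort_keys=True).encode("utf-8"))
    digest.update(json.dumps(input_params, sort_keys=True).encode("utf-8"))
    for filename in sorted(accessory_files):
        digest.update(NUL)
        digest.update(filename.encode("utf-8"))
        digest.update(NUL)
        contents_hex = hashlib.sha256(accessory_files[filename]).hexdigest()
        digest.update(contents_hex.encode("ascii"))
    return digest.hexdigest()
```

Because the name is part of the hashed data, installing the same WDL file under two names yields two different hashes, as described above.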

Workflow Runs

Each workflow run has a hash computed from the data considered to uniquely identify it; this does not include all of the information in a workflow run. That is, different workflow runs that share this identifying data intentionally produce the same hash.

  • workflow-name
  • for input in input-ids; sorted and unique
    • NUL
    • hash from input =~ vidarr:server/hash
  • for provider, identifier in external-keys; sorted by provider, then identifier:
    • NUL
    • NUL
    • provider
    • NUL
    • identifier
    • NUL
  • for name, value in labels; sorted by name:
    • NUL
    • name
    • NUL
    • JSON(value)
    • NUL
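As an illustration, the structure above might be implemented along these lines. Several details are assumptions rather than facts from the text: the names `workflow_run_hash` and `input_hash` are hypothetical, the hash is taken to be the last `/`-separated segment of an input ID, and sorting is applied to the extracted hashes rather than to the full IDs.

```python
import hashlib
import json

NUL = b"\x00"


def input_hash(input_id):
    """Extract the hash from an input ID.

    Assumption: the hash is the final "/"-separated segment of the ID.
    """
    return input_id.rsplit("/", 1)[-1]


def workflow_run_hash(workflow_name, input_ids, external_keys, labels):
    """Hash a workflow run following the field order listed above.

    external_keys: iterable of (provider, identifier) pairs.
    labels: dict mapping label name to a JSON-serializable value.
    """
    digest = hashlib.sha256()
    digest.update(workflow_name.encode("utf-8"))
    # A set removes duplicates; sorted() gives a stable order.
    for file_hash in sorted({input_hash(i) for i in input_ids}):
        digest.update(NUL)
        digest.update(file_hash.encode("utf-8"))
    # Tuples sort by provider first, then identifier.
    for provider, identifier in sorted(external_keys):
        digest.update(NUL + NUL)
        digest.update(provider.encode("utf-8"))
        digest.update(NUL)
        digest.update(identifier.encode("utf-8"))
        digest.update(NUL)
    for name in sorted(labels):
        digest.update(NUL)
        digest.update(name.encode("utf-8"))
        digest.update(NUL)
        digest.update(json.dumps(labels[name], sort_keys=True).encode("utf-8"))
        digest.update(NUL)
    return digest.hexdigest()
```

Two submissions that list the same inputs in a different order, or repeat an input, hash identically in this sketch, which is what makes matching by hash possible.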

Output Analysis: Files

The files provisioned out are given the ID:

  • workflow-run-identifier
  • BASENAME(final output path)

Note that if the output provisioning workflow renames a file, the hash is computed from the new name.
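A minimal sketch of the file ID, assuming the two fields are concatenated with no separator (none is listed) and that paths use POSIX-style `/` separators. The function name `output_file_id` is hypothetical.

```python
import hashlib
import posixpath


def output_file_id(workflow_run_id, final_output_path):
    """Hash the workflow run identifier followed by the file's base name.

    Assumption: no separator between the two fields, as none is listed.
    """
    digest = hashlib.sha256()
    digest.update(workflow_run_id.encode("utf-8"))
    digest.update(posixpath.basename(final_output_path).encode("utf-8"))
    return digest.hexdigest()
```

Only the base name contributes, so moving a file to another directory leaves the ID unchanged in this sketch, while renaming it produces a new ID.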

Output Analysis: URLs

The URLs provisioned out are given the ID:

  • workflow-run-identifier
  • URL

Nulls in Hashes

The NUL characters are a kind of insurance against malicious names. If the hash were computed from just the name followed by the version, then foobar + 1.0.0 would be indistinguishable from foo + bar1.0.0. Although it is unlikely that anyone would construct such a name, the NULs are an easy way to prevent anyone from trying.
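The ambiguity is easy to demonstrate. In this small sketch, `naive_hash` and `separated_hash` are hypothetical helpers written for the example; neither is a complete Víðarr hash.

```python
import hashlib

NUL = b"\x00"


def naive_hash(name, version):
    """Concatenate name and version with no separator (the unsafe approach)."""
    return hashlib.sha256((name + version).encode("utf-8")).hexdigest()


def separated_hash(name, version):
    """Terminate each field with NUL so field boundaries are unambiguous."""
    data = name.encode("utf-8") + NUL + version.encode("utf-8") + NUL
    return hashlib.sha256(data).hexdigest()


# Without separators, both pairs hash the same bytes: b"foobar1.0.0".
assert naive_hash("foobar", "1.0.0") == naive_hash("foo", "bar1.0.0")

# With NUL separators, the byte streams differ, and so do the hashes.
assert separated_hash("foobar", "1.0.0") != separated_hash("foo", "bar1.0.0")
```

This works only because strings are forbidden from containing NUL themselves; otherwise a name could embed its own fake field boundary.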