6 Qualified identifiers

Qualifiers

A qualified, or full, SWHID is composed of a core SWHID identifier, and a sequence of qualifiers.

Qualifiers may be:

  • fragment qualifiers (see 6.1), that identify subparts of a software artifact; or
  • context qualifiers (see 6.2), that provide additional context on the software artifact.

Each qualifier is specified as a key-value pair, using an = character as a separator. Qualifiers are separated from the core identifier and from each other by using a ; character.

Some qualifiers are valid for specific object types, and the validity of some qualifiers depends on the presence of other qualifiers. Conformant implementation MUST not generate invalid qualifiers or qualifier combinations and MUST ignore them if present, as detailed in the following sections.

6.1 Fragment qualifiers

There are two fragment qualifiers, lines and bytes. Each fragment qualifier MUST appear at most once.

Fragment qualifiers are only valid for objects of type content.

Each valid SWHID must have at most one fragment qualifier.

A conformant implementation MAY accept a SWHID that violates this constraint, by ignoring the lines qualifier when the bytes qualifier is present.

6.1.1 Lines qualifier

A "line" in the context of a file content refers to a sequence of characters that ends with a line break. This line can contain text, code, or any other form of data. In this specification, the line break is the ASCII LF character. The "lines" qualifier allows to designate a line range inside a content. The range can be a single line number, or a pair of line numbers separated by the ASCII - character. Line numbers start from 1, and the range is inclusive, that is, the fragment includes both the lines numbered as the start and end of the range.

For example, swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;lines=9-15 designates the function generate_input_stream that is found at lines 9 to 15 of the content with core SWHID swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b.

Notice that the notion of "line number" is not always meaningful: the content may be a binary file, or a file that uses non standard line termination character(s).

6.1.2 Bytes qualifier

To overcome the limitations of the lines qualifier, the bytes qualifier allows designation of a byte range inside a content. The range can be a single byte number, or a pair of byte numbers separated by -. Byte numbers start from 0, and the range is inclusive, that is, the fragment includes both the bytes numbered as the start and end of the range. If the range is a single byte number, it designates the byte at that specific position.

For example, swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;bytes=154-315 designates the same function generate_input_stream as in the example above, but does not rely on any convention about line numbers.

6.2 Context qualifiers

There are four context qualifiers, origin, visit, path and anchor. Each context qualifier MUST appear at most once.

6.2.1 Origin qualifier

This qualifier allows declaration of the software origin where the object has been found or observed, as an URI.

For example, swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git indicates that the content seen previously with the function generate_input_stream has been seen in the Git repository at https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git

This qualifier may help to get hold of the full repository where a content has been found, but there is no guarantee of success, as an origin can change or disappear over time (as is the case in the example above, since gitorious.org was shut down in 2015).

6.2.2 Visit qualifier

This qualifier allows addition of the core SWHID identifier of the snapshot of the repository where the object has been found or observed.

For example, swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9 indicates that the content seen previously with the function generate_input_stream has been seen in the Git repository at https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git, when its full state had the SWHID core identifier swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9.

This qualifier is only valid when the origin qualifier is also present. Otherwise, it MUST be ignored.

6.2.3 Path qualifier

This qualifier allows declaration of the absolute file path, from the root directory associated to the anchor node, to the object designated by the core SWHID identifier; when the anchor denotes a directory, a revision or a release, the root directory is uniquely determined; when the anchor denotes a snapshot, the root directory is the first directory reachable from the HEAD branch, and undefined if such a reference is missing.

For example, swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;path=/Examples/SimpleFarm/simplefarm.ml indicates that the content seen previously with the function generate_input_stream has been seen in the Git repository at https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git, when its full state had the SWHID core identifier swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9, and that it is named simplefarm.ml in the directory Simplefarm contained in the directory Examples contained in the root directory associated to the revision with core SWHID swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0.

This qualifier is only valid when the object type is not content. Otherwise, it MUST be ignored.

6.2.4 Anchor qualifier

This qualifier is used in conjunction with the path qualifier. It allows identification of a node in the Merkle DAG relative to which a path to the object is specified, as the core identifier of a directory, a revision, a release or a snapshot. See the example provided for the path qualifier.

This qualifier is only valid when the path qualifier is also present. Otherwise, it MUST be ignored.

6.3 Comparing qualified SWHIDs

One can determine whether two software artifacts are identical (bit by bit) by comparing their core SWHIDs, ignoring all qualifiers. If the core SWHIDs are equal, the software artifacts they represent are identical.

To determine if two SWHIDs represent the same software artifact (or fragment thereof) in the same context, one must also compare their qualifiers. Two SWHIDs are considered equivalent in context if:

  • They both have the same set of qualifiers.
  • The values of these qualifiers are identical. For instance, if both SWHIDs have an anchor qualifier, the core SWHID values of these qualifiers are identical. Similarly, if both have a lines qualifier, their values are identical.

Note that the order of the qualifiers does not matter for comparison purposes.

6.4 Recommendations

We recommend equipping identifiers meant for sharing with as many qualifiers as possible. While qualifiers may be listed in any order, it is good practice to present them in the following canonical order: origin, visit, anchor, path, lines or bytes.

By adhering to this order, it becomes easier to visually inspect and compare SWHIDs, especially when dealing with a large number of identifiers.

Here is an example: swh:1:cnt:4d99d2d18326621ccdd70f5ea66c2e2ac236ad8b;origin=https://gitorious.org/ocamlp3l/ocamlp3l_cvs.git;visit=swh:1:snp:d7f1b9eb7ccb596c2622c4780febaa02549830f9;anchor=swh:1:rev:2db189928c94d62a3b4757b3eec68f0a4d4113f0;path=/Examples/SimpleFarm/simplefarm.ml;lines=9-15

Redundant information should be omitted: for example, if the visit is present, and the path is relative to the snapshot indicated there, then the anchor qualifier is superfluous; similarly, if the path is empty, it may be omitted.