4 Syntax

A SWHID consists of two parts: a mandatory core identifier, which can identify any software artifact (or "object"), and an optional list of qualifiers, which specify the context in which the object is meant to be seen and/or point to a subpart of the object itself.

Syntactically, SWHIDs are generated by the <identifier> entry point in the following grammar:

<identifier> ::= <core_identifier> [ <qualifiers> ] ;

<core_identifier> ::= "swh" ":" <scheme_version> ":" <object_type> ":" <object_id> ;
<scheme_version> ::= "1" ;
<object_type> ::=
    "snp"  (* snapshot *)
  | "rel"  (* release *)
  | "rev"  (* revision *)
  | "dir"  (* directory *)
  | "cnt"  (* content *)
  ;
<object_id> ::= 40 * <hex_digit> ;  (* intrinsic object id, as hex-encoded SHA1 *)
<dec_digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
<hex_digit> ::= <dec_digit> | "a" | "b" | "c" | "d" | "e" | "f" ;

<qualifiers> ::= ";" <qualifier> [ <qualifiers> ] ;
<qualifier> ::=
    <context_qualifier>
  | <fragment_qualifier>
  ;
<context_qualifier> ::=
    <origin_ctxt>
  | <visit_ctxt>
  | <anchor_ctxt>
  | <path_ctxt>
  ;
<origin_ctxt> ::= "origin" "=" <url_escaped> ;
<visit_ctxt> ::= "visit" "=" <core_identifier> ;
<anchor_ctxt> ::= "anchor" "=" <core_identifier> ;
<path_ctxt> ::= "path" "=" <path_absolute_escaped> ;
<fragment_qualifier> ::= "lines" "=" <range> | "bytes" "=" <range> ;
<range> ::= <number> ["-" <number>] ;
<number> ::= <dec_digit> + ;
<url_escaped> ::= (* RFC 3987 IRI *) ;
<path_absolute_escaped> ::= (* RFC 3987 absolute path *) ;

The last two symbols are defined as:

  • <path_absolute_escaped> is an ipath-absolute from RFC-3987; and
  • <url_escaped> is an IRI as defined in RFC-3987.

In both of these, every occurrence of ; is percent-encoded as %3B, and every occurrence of % is percent-encoded as %25 (the latter as required by the RFC). Other characters may also be percent-encoded, for example to improve the readability and/or embeddability of SWHIDs in other contexts.
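
To illustrate how the grammar and the escaping rule fit together, here is a minimal Python sketch (Python 3.9 or later) that splits a SWHID into its core identifier and qualifiers and checks each part. It is not a reference implementation. The function names parse_swhid and escape_qualifier_value are hypothetical, and the checks on origin and path are deliberately simplified: they do not validate full RFC 3987 syntax. Splitting on ; is safe only because any ; inside a qualifier value is percent-encoded.

```python
import re

# <core_identifier>: "swh" ":" "1" ":" <object_type> ":" 40 lowercase hex digits
CORE_RE = re.compile(r"swh:1:(snp|rel|rev|dir|cnt):[0-9a-f]{40}")
# <range>: <number> ["-" <number>]
RANGE_RE = re.compile(r"[0-9]+(-[0-9]+)?")


def escape_qualifier_value(value: str) -> str:
    """Percent-encode "%" and ";" in a raw origin or path value.

    "%" is replaced first so that the "%" introduced by "%3B" is not
    encoded a second time.
    """
    return value.replace("%", "%25").replace(";", "%3B")


def parse_swhid(swhid: str) -> tuple[str, list[tuple[str, str]]]:
    """Return (core_identifier, [(key, value), ...]) or raise ValueError."""
    core, *raw_qualifiers = swhid.split(";")
    if not CORE_RE.fullmatch(core):
        raise ValueError(f"invalid core identifier: {core!r}")

    qualifiers = []
    for raw in raw_qualifiers:
        key, sep, value = raw.partition("=")
        if not sep:
            raise ValueError(f"qualifier without '=': {raw!r}")
        if key in ("visit", "anchor"):
            ok = CORE_RE.fullmatch(value) is not None
        elif key in ("lines", "bytes"):
            ok = RANGE_RE.fullmatch(value) is not None
        elif key == "path":
            # Simplification: only check the leading "/" of an absolute path.
            ok = value.startswith("/")
        elif key == "origin":
            # Simplification: accept any non-empty string as an IRI.
            ok = value != ""
        else:
            ok = False
        if not ok:
            raise ValueError(f"invalid qualifier: {raw!r}")
        qualifiers.append((key, value))
    return core, qualifiers


if __name__ == "__main__":
    # Placeholder object id (40 zeros), used for illustration only.
    example = "swh:1:cnt:" + "0" * 40 + ";origin=https://example.org/repo;lines=9-15"
    print(parse_swhid(example))
    print(escape_qualifier_value("/docs/a;b 100%.txt"))
```

The sketch accepts qualifiers in any order and does not reject repeated qualifiers, because the grammar above imposes neither restriction.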