4 Syntax

A SWHID consists of two separate parts: a mandatory core identifier that can identify any software artifact (or "object"), and an optional list of qualifiers that specify the context in which the object is meant to be seen and can point to a subpart of the object itself.

Syntactically, SWHIDs are generated by the <identifier> entry point in the following grammar:

<identifier> ::= <core_identifier> [ <qualifiers> ] ;

<core_identifier> ::= "swh" ":" <scheme_version> ":" <object_type> ":" <object_id> ;
<scheme_version> ::= "1" ;
<object_type> ::=
    "snp"  (* snapshot *)
  | "rel"  (* release *)
  | "rev"  (* revision *)
  | "dir"  (* directory *)
  | "cnt"  (* content *)
  ;
<object_id> ::= 40 * <hex_digit> ;  (* intrinsic object id, as hex-encoded SHA1 *)
<dec_digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
<hex_digit> ::= <dec_digit> | "a" | "b" | "c" | "d" | "e" | "f" ;

<qualifiers> ::= ";" <qualifier> [ <qualifiers> ] ;
<qualifier> ::=
    <context_qualifier>
  | <fragment_qualifier>
  ;
<context_qualifier> ::=
    <origin_ctxt>
  | <visit_ctxt>
  | <anchor_ctxt>
  | <path_ctxt>
  ;
<origin_ctxt> ::= "origin" "=" <url_escaped> ;
<visit_ctxt> ::= "visit" "=" <core_identifier> ;
<anchor_ctxt> ::= "anchor" "=" <core_identifier> ;
<path_ctxt> ::= "path" "=" <path_absolute_escaped> ;
<fragment_qualifier> ::= "lines" "=" <line_number> ["-" <line_number>] ;
<line_number> ::= <dec_digit> + ;
<url_escaped> ::= (* RFC 3987 IRI *) ;
<path_absolute_escaped> ::= (* RFC 3987 absolute path *) ;

The last two symbols are defined as:

  • <path_absolute_escaped> is an ipath-absolute from RFC-3987; and
  • <url_escaped> is an IRI as defined in RFC-3987.

In both of these, every occurrence of ; (and of %, as required by the RFC) is percent-encoded (as %3B and %25, respectively). Other characters may also be percent-encoded, e.g., to improve the readability and/or embeddability of SWHIDs in other contexts.
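
The grammar and the escaping rule above can be illustrated with a minimal Python sketch. It is not a reference implementation: the names parse_swhid and escape_qualifier_value are hypothetical, the escaping helper assumes it receives a raw value that has not been percent-encoded yet, and the values of origin and path are not checked against RFC 3987. Because ; is always percent-encoded inside qualifier values, splitting the whole identifier on ; is enough to separate the core identifier from its qualifiers.

```python
import re

# <core_identifier>: "swh:1:" + object type + ":" + 40 lowercase hex digits.
CORE_RE = re.compile(r"swh:1:(snp|rel|rev|dir|cnt):([0-9a-f]{40})")
# <fragment_qualifier> value: <line_number> ["-" <line_number>].
LINES_RE = re.compile(r"[0-9]+(-[0-9]+)?")
QUALIFIER_KEYS = {"origin", "visit", "anchor", "path", "lines"}


def escape_qualifier_value(raw: str) -> str:
    """Percent-encode '%' and ';' in a raw (not yet escaped) value.

    '%' is handled first so that the '%' of a freshly produced '%3B'
    is not encoded a second time.
    """
    return raw.replace("%", "%25").replace(";", "%3B")


def parse_swhid(text: str) -> dict:
    """Split a SWHID into object type, object id, and qualifiers.

    Assumption of this sketch: origin and path values are kept as-is,
    without RFC 3987 validation. If a qualifier key is repeated, the
    last occurrence wins.
    """
    core, *raw_qualifiers = text.split(";")
    match = CORE_RE.fullmatch(core)
    if match is None:
        raise ValueError(f"invalid core identifier: {core!r}")

    qualifiers = {}
    for raw in raw_qualifiers:
        key, sep, value = raw.partition("=")
        if not sep or key not in QUALIFIER_KEYS:
            raise ValueError(f"invalid qualifier: {raw!r}")
        # visit and anchor take a core identifier as their value.
        if key in ("visit", "anchor") and CORE_RE.fullmatch(value) is None:
            raise ValueError(f"invalid core identifier in {key!r}: {value!r}")
        if key == "lines" and LINES_RE.fullmatch(value) is None:
            raise ValueError(f"invalid line range: {value!r}")
        qualifiers[key] = value

    return {
        "object_type": match.group(1),
        "object_id": match.group(2),
        "qualifiers": qualifiers,
    }


if __name__ == "__main__":
    # Placeholder object id, used for illustration only.
    object_id = "0123456789abcdef0123456789abcdef01234567"
    path = escape_qualifier_value("/src/a;b.c")
    print(parse_swhid(f"swh:1:cnt:{object_id};path={path};lines=9-15"))
```

A stricter implementation would also validate the IRI and the absolute path, and might reject repeated qualifier keys instead of keeping the last one; the grammar itself does not forbid repetition.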