Introduction

Modern software relies heavily on open source components that are developed collaboratively in a distributed setting, and that are assembled to create complex systems that evolve at a fast pace.

This has strengthened the need, shared by a variety of stakeholders, to precisely track the components that go into a given system, to ensure their availability, and to guarantee their integrity. Academia needs to ensure that research results are reproducible, industry needs to improve the traceability of the software supply chain, and developer communities need tools to cope with increasing complexity.

A key building block for addressing this issue is a system of intrinsic identifiers that allows users to pinpoint the exact version of any software artifact, at all levels of granularity, without relying on any central registry or naming authority.

With this specification, the SWHID working group makes such a system of intrinsic identifiers, originally developed for the Software Heritage universal source code archive, available to all stakeholders.
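
To make the notion of an intrinsic identifier concrete, the following minimal sketch computes an identifier for a single file's content directly from its bytes, with no registry lookup. It assumes the content-identifier scheme in which the hash is a Git-compatible SHA-1 over a "blob" header followed by the data; that scheme is not described in this introduction, and the function name `content_swhid` is hypothetical, chosen only for illustration.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return a content identifier derived solely from the bytes themselves.

    Assumption: the identifier is the Git-compatible blob hash, i.e.
    SHA-1 over b"blob <length>\\0" followed by the raw data.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The empty file always yields the same identifier, on any machine.
    print(content_swhid(b""))
    # prints swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier depends only on the content, two parties holding the same bytes obtain the same identifier independently, and any change to the bytes yields a different one.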

For the sake of clarity, we will use examples drawn directly from the Software Heritage archive. Note, however, that systems for the persistent archival of software artifacts, as well as the resolution of SWHIDs, are out of the scope of this specification, and that the SWHID specification does not in any way require the use of Software Heritage.