binary_distribution: content addressable tarballs (#48713)

binary_distribution: content addressable url buildcache

Change how binary mirrors are laid out, adopting content addressing for every
piece of data spack stores in a binary mirror. Items (e.g. tarballs, specfiles, public
keys, indices, etc) are now discoverable via manifest files which give the size,
checksum, compression type, etc of the stored item. The information in the
manifest, in turn, is used to find the actual data, which is stored by its content
address in the blobs directory. Additionally, signing is now applied to the manifest
files, rather than to the spec files themselves.
This commit is contained in:
Scott Wittenburg
2025-05-06 12:32:15 -06:00
committed by GitHub
parent 6587b2a231
commit 2c05ce3607
61 changed files with 5940 additions and 1333 deletions


@@ -45,10 +45,14 @@ provided binary cache, which can be a local directory or a remote URL.
Here is an example where a build cache is created in a local directory named
"spack-cache", to which we push the "ninja" spec:
ninja-1.12.1-vmvycib6vmiofkdqgrblo7zsvp7odwut
.. code-block:: console
$ spack buildcache push ./spack-cache ninja
==> Pushing binary packages to file:///home/spackuser/spack/spack-cache/build_cache
==> Selected 30 specs to push to file:///home/spackuser/spack/spack-cache
...
==> [30/30] Pushed ninja@1.12.1/ngldn2k
Note that ``ninja`` must be installed locally for this to work.
@@ -98,9 +102,10 @@ Now you can use list:
.. code-block:: console
$ spack buildcache list
==> 1 cached build.
-- linux-ubuntu20.04-skylake / gcc@9.3.0 ------------------------
ninja@1.10.2
==> 24 cached builds.
-- linux-ubuntu22.04-sapphirerapids / gcc@12.3.0 ----------------
[ ... ]
ninja@1.12.1
With ``mymirror`` configured and an index available, Spack will automatically
use it during concretization and installation. That means that you can expect
@@ -111,17 +116,17 @@ verify by re-installing ninja:
$ spack uninstall ninja
$ spack install ninja
==> Installing ninja-1.11.1-yxferyhmrjkosgta5ei6b4lqf6bxbscz
==> Fetching file:///home/spackuser/spack/spack-cache/build_cache/linux-ubuntu20.04-skylake-gcc-9.3.0-ninja-1.10.2-yxferyhmrjkosgta5ei6b4lqf6bxbscz.spec.json.sig
gpg: Signature made Do 12 Jan 2023 16:01:04 CET
gpg: using RSA key 61B82B2B2350E171BD17A1744E3A689061D57BF6
[ ... ]
==> Installing ninja-1.12.1-ngldn2kpvb6lqc44oqhhow7fzg7xu7lh [24/24]
gpg: Signature made Thu 06 Mar 2025 10:03:38 AM MST
gpg: using RSA key 75BC0528114909C076E2607418010FFAD73C9B07
gpg: Good signature from "example (GPG created for Spack) <example@example.com>" [ultimate]
==> Fetching file:///home/spackuser/spack/spack-cache/build_cache/linux-ubuntu20.04-skylake/gcc-9.3.0/ninja-1.10.2/linux-ubuntu20.04-skylake-gcc-9.3.0-ninja-1.10.2-yxferyhmrjkosgta5ei6b4lqf6bxbscz.spack
==> Extracting ninja-1.10.2-yxferyhmrjkosgta5ei6b4lqf6bxbscz from binary cache
==> ninja: Successfully installed ninja-1.11.1-yxferyhmrjkosgta5ei6b4lqf6bxbscz
Search: 0.00s. Fetch: 0.17s. Install: 0.12s. Total: 0.29s
[+] /home/harmen/spack/opt/spack/linux-ubuntu20.04-skylake/gcc-9.3.0/ninja-1.11.1-yxferyhmrjkosgta5ei6b4lqf6bxbscz
==> Fetching file:///home/spackuser/spack/spack-cache/blobs/sha256/f0/f08eb62661ad159d2d258890127fc6053f5302a2f490c1c7f7bd677721010ee0
==> Fetching file:///home/spackuser/spack/spack-cache/blobs/sha256/c7/c79ac6e40dfdd01ac499b020e52e57aa91151febaea3ad183f90c0f78b64a31a
==> Extracting ninja-1.12.1-ngldn2kpvb6lqc44oqhhow7fzg7xu7lh from binary cache
==> ninja: Successfully installed ninja-1.12.1-ngldn2kpvb6lqc44oqhhow7fzg7xu7lh
Search: 0.00s. Fetch: 0.11s. Install: 0.11s. Extract: 0.10s. Relocate: 0.00s. Total: 0.22s
[+] /home/spackuser/spack/opt/spack/linux-ubuntu22.04-sapphirerapids/gcc-12.3.0/ninja-1.12.1-ngldn2kpvb6lqc44oqhhow7fzg7xu7lh
It worked! You've just completed a full example of creating a build cache with
a spec of interest, adding it as a mirror, updating its index, listing the contents,
@@ -344,19 +349,18 @@ which lets you get started quickly. See the following resources for more informa
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Create tarball of installed Spack package and all dependencies.
Tarballs are checksummed and signed if gpg2 is available.
Places them in a directory ``build_cache`` that can be copied to a mirror.
Commands like ``spack buildcache install`` will search Spack mirrors for build_cache to get the list of build caches.
Tarballs and specfiles are compressed and checksummed, manifests are signed if gpg2 is available.
Commands like ``spack buildcache install`` will search Spack mirrors to get the list of build caches.
============== ========================================================================================================================
Arguments Description
============== ========================================================================================================================
``<specs>`` list of partial specs or hashes with a leading ``/`` to match from installed packages and used for creating build caches
``-d <path>`` directory in which ``build_cache`` directory is created, defaults to ``.``
``-f`` overwrite ``.spack`` file in ``build_cache`` directory if it exists
``-d <path>`` directory in which ``v3`` and ``blobs`` directories are created, defaults to ``.``
``-f`` overwrite compressed tarball and spec metadata files if they already exist
``-k <key>`` the key to sign package with. In the case where multiple keys exist, the package will be unsigned unless ``-k`` is used.
``-r`` make paths in binaries relative before creating tarball
``-y`` answer yes to all create unsigned ``build_cache`` questions
``-y`` answer yes to all questions about creating unsigned build caches
============== ========================================================================================================================
^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -397,6 +401,165 @@ List public keys available on Spack mirror.
========= ==============================================
Arguments Description
========= ==============================================
``-i`` trust the keys downloaded with prompt for each
``-it`` trust the keys downloaded with prompt for each
``-y`` answer yes to all trust all keys downloaded
========= ==============================================
.. _build_cache_layout:
------------------
Build Cache Layout
------------------
This section describes the structure and content of URL-style build caches, as
distinguished from OCI-style build caches.
The entry point for a binary package is a manifest json file that points to at
least two other files stored as content-addressed blobs. These files include a spec
metadata file, as well as the installation directory of the package stored as
a compressed archive file. Binary package manifest files are named to indicate
the package name and version, as well as the hash of the concrete spec. For
example::
gcc-runtime-12.3.0-qyu2lvgt3nxh7izxycugdbgf5gsdpkjt.spec.manifest.json
would contain the manifest for a binary package of ``gcc-runtime@12.3.0``.
The id of the built package is defined to be the DAG hash of the concrete spec,
and exists in the name of the file as well. The id distinguishes a particular
binary package from all other binary packages with the same package name and
version. Below is an example binary package manifest file. Such a file would
live in the versioned spec manifests directory of a binary mirror, for example
``v3/manifests/spec/``::
{
"version": 3,
"data": [
{
"contentLength": 10731083,
"mediaType": "application/vnd.spack.install.v2.tar+gzip",
"compression": "gzip",
"checksumAlgorithm": "sha256",
"checksum": "0f24aa6b5dd7150067349865217acd3f6a383083f9eca111d2d2fed726c88210"
},
{
"contentLength": 1000,
"mediaType": "application/vnd.spack.spec.v5+json",
"compression": "gzip",
"checksumAlgorithm": "sha256",
"checksum": "fba751c4796536737c9acbb718dad7429be1fa485f5585d450ab8b25d12ae041"
}
]
}
The manifest points to both the compressed tar file as well as the compressed
spec metadata file, and contains the checksum of each. This checksum
is also used as the address of the associated file, and hence, must be
known in order to locate the tarball or spec file within the mirror. Once the
tarball or spec metadata file is downloaded, the checksum should be computed locally
and compared to the checksum in the manifest to ensure the contents have not changed
since the binary package was pushed. Spack stores all data files (including compressed
tar files, spec metadata, indices, public keys, etc) within a ``blobs/<hash-algorithm>/``
directory, using the first two characters of the checksum as a sub-directory
to reduce the number of files in a single folder. Here is a depiction of the
organization of binary mirror contents::
mirror_directory/
v3/
layout.json
manifests/
spec/
gcc-runtime/
gcc-runtime-12.3.0-s2nqujezsce4x6uhtvxscu7jhewqzztx.spec.manifest.json
gmake/
gmake-4.4.1-lpr4j77rcgkg5536tmiuzwzlcjsiomph.spec.manifest.json
compiler-wrapper/
compiler-wrapper-1.0-s7ieuyievp57vwhthczhaq2ogowf3ohe.spec.manifest.json
index/
index.manifest.json
key/
75BC0528114909C076E2607418010FFAD73C9B07.key.manifest.json
keys.manifest.json
blobs/
sha256/
0f/
0f24aa6b5dd7150067349865217acd3f6a383083f9eca111d2d2fed726c88210
fb/
fba751c4796536737c9acbb718dad7429be1fa485f5585d450ab8b25d12ae041
2a/
2a21836d206ccf0df780ab0be63fdf76d24501375306a35daa6683c409b7922f
...
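The local verification step described above can be sketched in Python. This is
only an illustration, not Spack's implementation: the function name
``verify_blob`` is hypothetical, and it assumes that ``contentLength`` and
``checksum`` both describe the blob exactly as it is stored in the mirror (that
is, after compression).

```python
import hashlib


def verify_blob(path, record, chunk_size=1 << 20):
    """Check a downloaded blob against one entry of a manifest's ``data`` array.

    ``verify_blob`` is a hypothetical helper for illustration, not a Spack API.
    ``record`` is a dict with the keys shown in the example manifests.
    """
    hasher = hashlib.new(record["checksumAlgorithm"])
    size = 0
    # Read in chunks so large tarballs do not need to fit in memory.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == record["contentLength"] and hasher.hexdigest() == record["checksum"]
```

A mismatch in either the size or the digest means the blob should not be used.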
Files within the ``manifests`` directory are organized into subdirectories by
the type of entity they represent. Binary package manifests live in the ``spec/``
directory, binary cache index manifests live in the ``index/`` directory, and
manifests for public keys and their indices live in the ``key/`` subdirectory.
Regardless of the type of entity they represent, all manifest files are named
with an extension ``.manifest.json``.
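As a rough sketch of the naming scheme, the relative path of a spec manifest
can be assembled from the package name, version, and DAG hash. The helper name
``spec_manifest_path`` is hypothetical, and the per-package subdirectory is
inferred from the directory listing above rather than from Spack's code.

```python
import posixpath


def spec_manifest_path(name, version, dag_hash, layout_version="v3"):
    """Build the mirror-relative path of a spec manifest (illustrative only)."""
    filename = f"{name}-{version}-{dag_hash}.spec.manifest.json"
    # Assumed layout: <version>/manifests/spec/<package name>/<manifest file>
    return posixpath.join(layout_version, "manifests", "spec", name, filename)


print(spec_manifest_path("gcc-runtime", "12.3.0", "s2nqujezsce4x6uhtvxscu7jhewqzztx"))
```

``posixpath`` is used so that the result uses forward slashes and can be
appended to a mirror URL on any platform.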
Every manifest contains a ``data`` array, each element of which refers to an
associated file stored as a content-addressed blob. Considering the example spec
manifest shown above, the compressed installation archive can be found by
picking out the data blob with the appropriate ``mediaType``, which in this
case would be ``application/vnd.spack.install.v2.tar+gzip``. The associated
file is found by looking in the blobs directory under ``blobs/sha256/0f/`` for
the file named with the complete checksum value.
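The lookup can be sketched in a few lines of Python, using the example spec
manifest from this section. The helpers ``find_record`` and ``blob_path`` are
hypothetical names chosen for this illustration, and matching on a media type
prefix is an assumption made to keep the example short.

```python
import json
import posixpath


def find_record(manifest, media_type_prefix):
    """Return the first ``data`` entry whose mediaType starts with the prefix."""
    for record in manifest["data"]:
        if record["mediaType"].startswith(media_type_prefix):
            return record
    raise KeyError(f"no blob with media type {media_type_prefix!r}")


def blob_path(record):
    """Mirror-relative path: blobs/<algorithm>/<first two chars>/<checksum>."""
    checksum = record["checksum"]
    return posixpath.join("blobs", record["checksumAlgorithm"], checksum[:2], checksum)


manifest = json.loads("""
{
  "version": 3,
  "data": [
    {"contentLength": 10731083,
     "mediaType": "application/vnd.spack.install.v2.tar+gzip",
     "compression": "gzip",
     "checksumAlgorithm": "sha256",
     "checksum": "0f24aa6b5dd7150067349865217acd3f6a383083f9eca111d2d2fed726c88210"},
    {"contentLength": 1000,
     "mediaType": "application/vnd.spack.spec.v5+json",
     "compression": "gzip",
     "checksumAlgorithm": "sha256",
     "checksum": "fba751c4796536737c9acbb718dad7429be1fa485f5585d450ab8b25d12ae041"}
  ]
}
""")

tarball = find_record(manifest, "application/vnd.spack.install.")
print(blob_path(tarball))
# blobs/sha256/0f/0f24aa6b5dd7150067349865217acd3f6a383083f9eca111d2d2fed726c88210
```

The same two helpers locate the spec metadata blob when given the prefix
``application/vnd.spack.spec.``.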
As mentioned above, every entity in a binary mirror (aka build cache) is stored
as a content-addressed blob pointed to by a manifest. While an example spec
manifest (i.e. a manifest for a binary package) is shown above, here is what
the manifest of a build cache index looks like::
{
"version": 3,
"data": [
{
"contentLength": 6411,
"mediaType": "application/vnd.spack.db.v8+json",
"compression": "none",
"checksumAlgorithm": "sha256",
"checksum": "225a3e9da24d201fdf9d8247d66217f5b3f4d0fc160db1498afd998bfd115234"
}
]
}
Some things to note about this manifest are that it points to a blob that is not
compressed (``compression: "none"``), and that the ``mediaType`` is one we have
not seen yet, ``application/vnd.spack.db.v8+json``. The decision not to compress
build cache indices stems from the fact that spack does not yet sign build cache
index manifests. Once that changes, you may start to see these indices stored as
compressed blobs.
For completeness, here are examples of manifests for the other two types of entities
you might find in a spack build cache. First a public key manifest::
{
"version": 3,
"data": [
{
"contentLength": 2472,
"mediaType": "application/pgp-keys",
"compression": "none",
"checksumAlgorithm": "sha256",
"checksum": "9fc18374aebc84deb2f27898da77d4d4410e5fb44c60c6238cb57fb36147e5c7"
}
]
}
Note the ``mediaType`` of ``application/pgp-keys``. Finally, a public key index manifest::
{
"version": 3,
"data": [
{
"contentLength": 56,
"mediaType": "application/vnd.spack.keyindex.v1+json",
"compression": "none",
"checksumAlgorithm": "sha256",
"checksum": "29b3a0eb6064fd588543bc43ac7d42d708a69058dafe4be0859e3200091a9a1c"
}
]
}
Again note the ``mediaType`` of ``application/vnd.spack.keyindex.v1+json``. Also note
that both of the above manifest examples refer to uncompressed blobs; this is for the
same reason spack does not yet compress build cache index blobs.


@@ -176,92 +176,72 @@ community without needing deep familiarity with GnuPG or Public Key
Infrastructure.
.. _build_cache_format:
.. _build_cache_signing:
------------------
Build Cache Format
------------------
-------------------
Build Cache Signing
-------------------
A binary package consists of a metadata file unambiguously defining the
built package (and including other details such as how to relocate it)
and the installation directory of the package stored as a compressed
archive file. The metadata files can either be unsigned, in which case
the contents are simply the json-serialized concrete spec plus metadata,
or they can be signed, in which case the json-serialized concrete spec
plus metadata is wrapped in a gpg cleartext signature. Built package
metadata files are named to indicate the operating system and
architecture for which the package was built as well as the compiler
used to build it and the packages name and version. For example::
For an in-depth description of the layout of a binary mirror, see
the :ref:`documentation<build_cache_layout>` covering binary caches. The
key takeaway from that discussion that applies here is that the entry point
to a binary package is its manifest. The manifest refers unambiguously to the
spec metadata and compressed archive, which are stored as content-addressed
blobs.
linux-ubuntu18.04-haswell-gcc-7.5.0-zlib-1.2.12-llv2ysfdxnppzjrt5ldybb5c52qbmoow.spec.json.sig
would contain the concrete spec and binary metadata for a binary package
of ``zlib@1.2.12``, built for the ``ubuntu`` operating system and ``haswell``
architecture. The id of the built package exists in the name of the file
as well (after the package name and version) and in this case begins
with ``llv2ys``. The id distinguishes a particular built package from all
other built packages with the same os/arch, compiler, name, and version.
Below is an example of a signed binary package metadata file. Such a
file would live in the ``build_cache`` directory of a binary mirror::
The manifest files can either be signed or unsigned, but are always given
a name ending with ``.spec.manifest.json`` regardless. The difference between
signed and unsigned manifests is simply that the signed version is wrapped in
a gpg cleartext signature, as illustrated below::
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
{
"spec": {
<concrete-spec-contents-omitted>
},
"buildcache_layout_version": 1,
"binary_cache_checksum": {
"hash_algorithm": "sha256",
"hash": "4f1e46452c35a5e61bcacca205bae1bfcd60a83a399af201a29c95b7cc3e1423"
}
"version": 3,
"data": [
{
"contentLength": 10731083,
"mediaType": "application/vnd.spack.install.v2.tar+gzip",
"compression": "gzip",
"checksumAlgorithm": "sha256",
"checksum": "0f24aa6b5dd7150067349865217acd3f6a383083f9eca111d2d2fed726c88210"
},
{
"contentLength": 1000,
"mediaType": "application/vnd.spack.spec.v5+json",
"compression": "gzip",
"checksumAlgorithm": "sha256",
"checksum": "fba751c4796536737c9acbb718dad7429be1fa485f5585d450ab8b25d12ae041"
}
]
}
-----BEGIN PGP SIGNATURE-----
iQGzBAEBCgAdFiEETZn0sLle8jIrdAPLx/P+voVcifMFAmKAGvwACgkQx/P+voVc
ifNoVgv/VrhA+wurVs5GB9PhmMA1m5U/AfXZb4BElDRwpT8ZcTPIv5X8xtv60eyn
4EOneGVbZoMThVxgev/NKARorGmhFXRqhWf+jknJZ1dicpqn/qpv34rELKUpgXU+
QDQ4d1P64AIdTczXe2GI9ZvhOo6+bPvK7LIsTkBbtWmopkomVxF0LcMuxAVIbA6b
887yBvVO0VGlqRnkDW7nXx49r3AG2+wDcoU1f8ep8QtjOcMNaPTPJ0UnjD0VQGW6
4ZFaGZWzdo45MY6tF3o5mqM7zJkVobpoW3iUz6J5tjz7H/nMlGgMkUwY9Kxp2PVH
qoj6Zip3LWplnl2OZyAY+vflPFdFh12Xpk4FG7Sxm/ux0r+l8tCAPvtw+G38a5P7
QEk2JBr8qMGKASmnRlJUkm1vwz0a95IF3S9YDfTAA2vz6HH3PtsNLFhtorfx8eBi
Wn5aPJAGEPOawEOvXGGbsH4cDEKPeN0n6cy1k92uPEmBLDVsdnur8q42jk5c2Qyx
j3DXty57
=3gvm
iQGzBAEBCgAdFiEEdbwFKBFJCcB24mB0GAEP+tc8mwcFAmf2rr4ACgkQGAEP+tc8
mwfefwv+KJs8MsQ5ovFaBdmyx5H/3k4rO4QHBzuSPOB6UaxErA9IyOB31iP6vNTU
HzYpxz6F5dJCJWmmNEMN/0+vjhMHEOkqd7M1l5reVcxduTF2yc4tBZUO2gienEHL
W0e+SnUznl1yc/aVpChUiahO2zToCsI8HZRNT4tu6iCnE/OpghqjsSdBOZHmSNDD
5wuuCxfDUyWI6ZlLclaaB7RdbCUUJf/iqi711J+wubvnDFhc6Ynwm1xai5laJ1bD
ev3NrSb2AAroeNFVo4iECA0fZC1OZQYzaRmAEhBXtCideGJ5Zf2Cp9hmCwNK8Hq6
bNt94JP9LqC3FCCJJOMsPyOOhMSA5MU44zyyzloRwEQpHHLuFzVdbTHA3dmTc18n
HxNLkZoEMYRc8zNr40g0yb2lCbc+P11TtL1E+5NlE34MX15mPewRCiIFTMwhCnE3
gFSKtW1MKustZE35/RUwd2mpJRf+mSRVCl1f1RiFjktLjz7vWQq7imIUSam0fPDr
XD4aDogm
=RrFX
-----END PGP SIGNATURE-----
If a user has trusted the public key associated with the private key
used to sign the above spec file, the signature can be verified with
used to sign the above manifest file, the signature can be verified with
gpg, as follows::
$ gpg verify linux-ubuntu18.04-haswell-gcc-7.5.0-zlib-1.2.12-llv2ysfdxnppzjrt5ldybb5c52qbmoow.spec.json.sig
$ gpg --verify gcc-runtime-12.3.0-s2nqujezsce4x6uhtvxscu7jhewqzztx.spec.manifest.json
The metadata (regardless whether signed or unsigned) contains the checksum
of the ``.spack`` file containing the actual installation. The checksum should
be compared to a checksum computed locally on the ``.spack`` file to ensure the
contents have not changed since the binary spec plus metadata were signed. The
``.spack`` files are actually tarballs containing the compressed archive of the
install tree. These files, along with the metadata files, live within the
``build_cache`` directory of the mirror, and together are organized as follows::
build_cache/
# unsigned metadata (for indexing, contains sha256 of .spack file)
<arch>-<compiler>-<name>-<ver>-24zvipcqgg2wyjpvdq2ajy5jnm564hen.spec.json
# clearsigned metadata (same as above, but signed)
<arch>-<compiler>-<name>-<ver>-24zvipcqgg2wyjpvdq2ajy5jnm564hen.spec.json.sig
<arch>/
<compiler>/
<name>-<ver>/
# tar.gz-compressed prefix (may support more compression formats later)
<arch>-<compiler>-<name>-<ver>-24zvipcqgg2wyjpvdq2ajy5jnm564hen.spack
Uncompressing and extracting the ``.spack`` file results in the install tree.
This is in contrast to previous versions of spack, where the ``.spack`` file
contained a (duplicated) metadata file, a signature file and a nested tarball
containing the install tree.
When attempting to install a binary package that has been signed, spack will
attempt to verify the signature with one of the trusted keys in its keyring,
and will fail if unable to do so. While not recommended, it is possible to
force installation of a signed package without verification by providing the
``--no-check-signature`` argument to ``spack install ...``.
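Because a signed manifest is ordinary JSON wrapped in a cleartext signature, a
tool that only needs to read the manifest can strip the wrapper. The sketch
below assumes the standard OpenPGP cleartext framing (armor headers, one empty
line, the dash-escaped text, then the signature block). The function name
``load_manifest_unverified`` is hypothetical, and the code deliberately does
NOT verify anything; verification should still be done with ``gpg --verify``.

```python
import json

BEGIN_MESSAGE = "-----BEGIN PGP SIGNED MESSAGE-----"
BEGIN_SIGNATURE = "-----BEGIN PGP SIGNATURE-----"


def load_manifest_unverified(text):
    """Parse a signed or unsigned manifest WITHOUT checking the signature.

    Hypothetical helper for illustration; not how Spack itself reads manifests.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != BEGIN_MESSAGE:
        return json.loads(text)  # unsigned manifest: plain JSON

    # Armor headers such as "Hash: SHA512" end at the first empty line.
    start = next(i for i, line in enumerate(lines) if i > 0 and not line.strip()) + 1
    # The signed text ends where the signature block begins.
    end = next(i for i, line in enumerate(lines) if line.strip() == BEGIN_SIGNATURE)
    # Undo dash-escaping ("- " prefix) applied to lines that start with a dash.
    body = [line[2:] if line.startswith("- ") else line for line in lines[start:end]]
    return json.loads("\n".join(body))
```

Since the payload is returned without any trust decision, a helper like this
is only appropriate for inspection or for data whose signature has already
been checked.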
.. _internal_implementation:
@@ -320,10 +300,10 @@ the following way:
Reputational Public Key are imported into a keyring by the ``spack gpg …``
sub-command. This is initiated by the jobs build script which is created by
the generate job at the beginning of the pipeline.
4. Assuming the package has dependencies those specs are verified using
4. Assuming the package has dependencies, those spec manifests are verified using
the keyring.
5. The package is built and the spec.json is generated
6. The spec.json is signed by the keyring and uploaded to the mirrors
5. The package is built and the spec manifest is generated
6. The spec manifest is signed by the keyring and uploaded to the mirror's
build cache.
**Reputational Key**
@@ -376,24 +356,24 @@ following way:
4. In addition to the secret, the runner creates a tmpfs memory mounted
directory where the GnuPG keyring will be created to verify, and
then resign the package specs.
5. The job script syncs all spec.json.sig files from the build cache to
5. The job script syncs all spec manifest files from the build cache to
a working directory in the jobs execution environment.
6. The job script then runs the ``sign.sh`` script built into the
notary Docker image.
7. The ``sign.sh`` script imports the public components of the
Reputational and Intermediate CI Keys and uses them to verify good
signatures on the spec.json.sig files. If any signed spec does not
verify the job immediately fails.
8. Assuming all specs are verified, the ``sign.sh`` script then unpacks
the spec json data from the signed file in preparation for being
signatures on the spec.manifest.json files. If any signed manifest
does not verify, the job immediately fails.
8. Assuming all manifests are verified, the ``sign.sh`` script then unpacks
the manifest json data from the signed file in preparation for being
re-signed with the Reputational Key.
9. The private components of the Reputational Key are decrypted to
standard out using ``aws-encryption-cli`` directly into a ``gpg
import …`` statement which imports the key into the
keyring mounted in-memory.
10. The private key is then used to sign each of the json specs and the
10. The private key is then used to sign each of the manifests and the
keyring is removed from disk.
11. The re-signed json specs are resynced to the AWS S3 Mirror and the
11. The re-signed manifests are resynced to the AWS S3 Mirror and the
public signing of the packages for the develop or release pipeline
that created them is complete.