Setting the undocumented variable SPACK_CONCRETIZER_REQUIRE_CHECKSUM
now causes the solver to avoid accounting for versions that are not checksummed.
This feature is used in CI to avoid spurious concretization against e.g. develop branches.
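As a rough illustration of the effect (not Spack's actual solver code), the filtering can be pictured as below. The `candidate_versions` helper and the shape of its input are hypothetical; only the variable name comes from the description above.

```python
import os


def candidate_versions(versions, environ=None):
    """Return the versions a solver would be allowed to consider.

    `versions` maps a version string to its checksum, or to None when the
    version has no checksum (for example a branch such as `develop`).
    Hypothetical helper, for illustration only.
    """
    environ = os.environ if environ is None else environ
    # Assumption: the variable only needs to be set; its value is not inspected.
    if "SPACK_CONCRETIZER_REQUIRE_CHECKSUM" not in environ:
        return sorted(versions)
    return sorted(v for v, checksum in versions.items() if checksum is not None)
```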
Refactor gitlab ci configs so that mac and cray jobs can reuse as much higher level
configuration as possible.
* CI: remove redundant sections
* CI: Include base linux CI configs in cray stacks
Relocation and runner mapping is consistent between cray and linux runners.
* Export user cache path in before script
* CI: add GPG root for mac runners
* Disable user configs
Metal runners share a ~ directory
* Disable user config and add configs in activate env
The pcluster image has an internal buildcache without an index.
Also, we need to force reuse to avoid rebuilding GCC, since the default is
to only reuse dependencies - and that is subject to changes in the GCC
recipe.
* e4s oneapi ci: use official intel oneapi-derived runner image
* update oneapi image
* tau builds ok, but only with libdrm - comment out for now, follow up with pr later
* e4s cray ci stack
* e4s ci: add cray
* add zen4 tag
* WIP: new definitions just for cray
* updates
* remove ci signing job override, not necessary
* echo $PATH and show modules loaded
* add mirror
* add external def for cray-libsci
* comment out quantum-espresso
* use /etc/protected-runner as key path
* cray ci stack: do not remove tags: [spack, public]
* make cray stack composable
* generate job should run on public tagged runner, override default config:install_tree:root
* CI: Use relative path in default script
* CI: Use relative includes paths for shell runners
* Use concrete_env_dir for relpath
* ml-darwin-aarch64-mps: jax has bazel codesign issue
---------
Co-authored-by: Scott Wittenburg <scott.wittenburg@kitware.com>
Co-authored-by: Ryan Krattiger <ryan.krattiger@kitware.com>
* Add macOS ML CI stacks
* torchmeta is no longer maintained and requires ancient PyTorch
* Add MXNet
* update darwin aarch64 stacks
* add darwin-aarch64 scoped config.yaml
* remove unnecessary cleanup job
* fix specifications
* fix labels
* fix labels
* fix indent on tags specification
* no tags for trigger jobs
* try overriding tags in stack spack.yaml
* do not use CI_STACK_CONFIG_SCOPES
* incorporate config:install_tree:root: overrides and compiler defs
* copy relevant ci-scoped config settings directly into stack spack.yaml
* remove build-job-remove
* spack ci generate: add debug flag
* include cdash config directly in stack spack.yaml
* customize build-job script section to avoid absolute paths
* add any-job specification
* tags: use aarch64-macos instead of aarch64
* generate tags: use aarch64-macos instead of aarch64
* do not add more padding
* use shared mirror; comment out known failures
* remove any-job
* nproc || true
* comment out specs failing due to bazel from cache codesign issue
---------
Co-authored-by: eugeneswalker <eugenesunsetwalker@gmail.com>
* [pcluster pipeline] Use local buildcache instead of upstream spack
Spack currently does not relocate compiler references from upstream spack
installations. When using a buildcache we don't need an upstream spack.
* gcc needs to be installed via postinstall to get correct deps
* quantum-espresso%gcc@12.3.0 returns ICE on neoverse_{n,v}1
* Force gitlab to pull the new container
* Revert "Force gitlab to pull the new container"
This reverts commit 3af5f4cd88.
Seems the gitlab version does not yet support "pull_policy" in .gitlab-ci.yml
* Gitlab keeps picking up wrong container. Renaming
* Update containers once more after failed build
Add aws-pcluster[-aarch64] stacks. These stacks build packages defined in
https://github.com/spack/spack-configs/tree/main/AWS/parallelcluster
They use a custom container from https://github.com/spack/gitlab-runners which
includes necessary ParallelCluster software to link and build as well as an
upstream spack installation with current GCC and dependencies.
Intel and ARM software is installed and used during the build stage but removed
from the buildcache before the signing stage.
Files `configs/linux/{arch}/ci.yaml` select the necessary providers in order to
build for specific architectures (icelake, skylake, neoverse_{n,v}1).
* gitlab ci: release fixes and improvements
- use rules to reduce boilerplate in .gitlab-ci.yml
- support copy-only pipeline jobs
- make pipelines for release branches rebuild everything
- make pipelines for protected tags copy-only
* gitlab ci: remove url changes used in testing
* gitlab ci: tag mirrors need public key
Make sure that mirrors associated with release branches and tags
contain the public key needed to verify the signed binaries. This
also ensures that when stack-specific mirror contents are copied
to the root, the root mirror has the public key as well.
* review: be more specific about tags, curl flags
* Make the check in ci.yaml consistent with the .gitlab-ci.yml
---------
Co-authored-by: Ryan Krattiger <ryan.krattiger@kitware.com>
The flags --mirror-name / --mirror-url / --directory were deprecated in
favor of just passing a positional name, url or directory, and letting spack
figure it out.
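The commit does not say how the positional argument is interpreted. Purely as an illustration of the idea, and not as a description of Spack's real logic, one plausible disambiguation order is sketched here; `classify_mirror_argument` and its rules are assumptions.

```python
def classify_mirror_argument(arg, configured_mirrors):
    """Guess whether `arg` is a mirror name, a URL, or a directory.

    Hypothetical rules: a configured mirror name wins, anything with a
    scheme separator is treated as a URL, and the rest as a directory.
    """
    if arg in configured_mirrors:
        return "name"
    if "://" in arg:
        return "url"
    return "directory"
```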
---------
Co-authored-by: Scott Wittenburg <scott.wittenburg@kitware.com>
* CI: Fixup docs for bootstrap.
* CI: Add compatibility shim
* Add an update method for CI
Update requires manually renaming section to `ci`. After
this patch, updating and using the deprecated `gitlab-ci` section
should be possible.
* Fix typos in generate warnings
* Fixup CI schema validation
* Add unit tests for legacy CI
* Add deprecated CI stack for continuous testing
* Allow updating gitlab-ci section directly with env update
* Make warning give good advice for updating gitlab-ci
* Fix typo in CI name
* Remove white space
* Remove unneeded component of deprecated-ci
* ci: version bump for ghcr.io/spack/e4s-amazonlinux-2
This new image comes with GnuPG v2.4.0
* py-cython: upperbounds for Python versions
* fix py-gevent nonsense
---------
Co-authored-by: Harmen Stoppels <me@harmenstoppels.nl>
* CI configuration boilerplate reduction and refactor
Configuration:
- New notation for list concatenation (prepend/append)
- New notation for string concatenation (prepend/append)
- Break out configuration files for: ci.yaml, cdash.yaml, view.yaml
- Spack CI section refactored to improve self-consistency and
composability
- Scripts are now lists of lists and/or lists of strings
- Job attributes are now listed under a precedence-ordered list whose entries are
composed/merged using Spack config merge rules.
- "service-jobs" are identified explicitly rather than as a batch
CI:
- Consolidate common, platform, and architecture configurations for all CI stacks into composable configuration files
- Make padding consistent across all stacks (256)
- Merge all package -> runner mappings to be consistent across all
stacks
Unit Test:
- Refactor CI module unit-tests for refactored configuration
Docs:
- Add docs for new notations in configuration.rst
- Rewrite docs on CI pipelines to be consistent with refactored CI
workflow
* Script verbose environ, dev bootstrap
* Port #35409
By setting the traversal depth to 1, only specs matching the changed
package and direct dependents of those (and of course all dependencies
of that set) are removed from pruning candidacy.
* e4s: restore builds
* gitlab ci: allow UO to build protected binaries for signing
* use newer image; comment out failing builds
* gitlab-ci: Some tweaks for e4s power builds
- fix tags (no longer require generate jobs to run on aws)
- fix resource requests for generation jobs
- remove SPACK_SIGNING_KEY from protected power build jobs
- update UO signing key path
- change the CDash build group to reflect stack name
- retry pipeline generation jobs *always*
* correct double packages: section
* gitlab-ci:script: modernize
* remove new gnu make, not for ppc64le
---------
Co-authored-by: Scott Wittenburg <scott.wittenburg@kitware.com>
Gitlab does not merge lists when a job extends two other definitions
that include the same list (e.g. tags). Also, it merges dictionaries
as long as the keys are distinct, but just takes the last mentioned
value when there are key collisions.
This change makes sure that when different tags are needed by a
pipeline, the ones we want are actually provided. It also changes
the example stack to better follow this pattern so we do not lead
developers astray in the future.
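The merge behavior described above can be modeled with a small sketch. `merge_extends` is a hypothetical helper that imitates the described rules (dictionaries merge, lists and scalars are replaced by the last definition); it is not GitLab code.

```python
def merge_extends(*definitions):
    """Merge job definitions in order, later definitions taking precedence.

    Dictionaries are merged key by key; any other value, including a
    list such as `tags`, is replaced wholesale by the last one seen.
    """
    merged = {}
    for definition in definitions:
        for key, value in definition.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_extends(merged[key], value)
            else:
                merged[key] = value
    return merged
```

With this model, a job extending one definition with `tags: [spack, public]` and another with `tags: [x86_64]` ends up with only `[x86_64]`, which is why the full set of required tags has to be listed together in one place.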
* ML CI: Linux x86_64
* Update comments
* Rename again
* Rename comments
* Update to match other arches
* No compiler
* Compiler was wrong anyway
* Faster TF
* CI: Update Data and Vis SDK Stack
* Update image to match target deployments (E4S)
* Enable all packages
* Test supported variants of ParaView and VisIt
* Sensei: Update Python hint for newer cmake
* Sensei: add Python3 hint
When we lose a running pod (possibly loss of spot instance) or encounter
some other infrastructure-related failure of this job, we need to retry
it. This retries the job the maximum number of times in those cases.
Basic stack of ML packages we would like to test and generate binaries for in CI.
Spack now has a large CI framework in GitLab for PR testing and public binary generation.
We should take advantage of this to test and distribute optimized binaries for popular ML
frameworks.
This is a pretty extensive initial set, including CPU, ROCm, and CUDA versions of a core
`x86_64_v4` stack.
### Core ML frameworks
These are all popular core ML frameworks already available in Spack.
- [x] PyTorch
- [x] TensorFlow
- [x] Scikit-learn
- [x] MXNet
- [x] CNTK
- [x] Caffe
- [x] Chainer
- [x] XGBoost
- [x] Theano
### ML extensions
These are domain libraries and wrappers that build on top of core ML libraries
- [x] Keras
- [x] TensorBoard
- [x] torchvision
- [x] torchtext
- [x] torchaudio
- [x] TorchGeo
- [x] PyTorch Lightning
- [x] torchmetrics
- [x] GPyTorch
- [x] Horovod
### ML-adjacent libraries
These are libraries that aren't specific to ML but are still core libraries used in ML pipelines
- [x] numpy
- [x] scipy
- [x] pandas
- [x] ONNX
- [x] bazel
Co-authored-by: Jonathon Anderson <17242663+blue42u@users.noreply.github.com>
Move the copying of the buildcache to a root job that runs after all the child
pipelines have finished, so that the operation can be coordinated across all
child pipelines to remove the possibility of race conditions during potentially
simultaneous copies. This lets us ensure the .spec.json.sig and .spack files
for any spec in the root mirror always come from the same child pipeline
mirror (though which pipeline is arbitrary). It also allows us to avoid copying
of duplicates, which currently happens.
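A minimal sketch of the coordination idea, assuming each child pipeline mirror can be summarized as a set of spec hashes. The function names and the `mirror/hash.ext` path layout are hypothetical and only illustrate choosing a single source per spec.

```python
def plan_root_copy(child_mirrors):
    """Pick exactly one child mirror as the source for each spec hash.

    `child_mirrors` maps a child mirror name to the set of spec hashes
    it holds. The first mirror (in sorted order) that has a hash wins,
    so which pipeline supplies a spec is arbitrary but consistent.
    """
    plan = {}
    for mirror in sorted(child_mirrors):
        for spec_hash in sorted(child_mirrors[mirror]):
            plan.setdefault(spec_hash, mirror)
    return plan


def files_to_copy(plan):
    """List both files of every spec, always taken from the same mirror."""
    return [
        f"{mirror}/{spec_hash}{ext}"
        for spec_hash, mirror in sorted(plan.items())
        for ext in (".spec.json.sig", ".spack")
    ]
```

Because the plan is computed once in the root job, each spec is copied a single time and its signature and archive cannot come from different child pipelines.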
On PR pipelines we need to override the buildcache destination to
point to the "spack-binaries-prs" bucket, otherwise, those pipelines
try to push to the default mirror in a bucket for which they don't
have write permission.
Add spack stacks targeted at Spack + AWS + ARM HPC User Group hackathon. Includes
a list of miniapps and full-apps that are ready to run on both x86_64 and aarch64.
Co-authored-by: Scott Wittenburg <scott.wittenburg@kitware.com>
Add two new stacks targeted at x86_64 and arm, representing an initial list of packages
used by current and planned AWS Workshops, and built in conjunction with the ISC22
announcement of the spack public binary cache.
Co-authored-by: Scott Wittenburg <scott.wittenburg@kitware.com>
This PR supports the creation of securely signed binaries built from spack
develop as well as release branches and tags. Specifically:
- remove internal pr mirror url generation logic in favor of buildcache destination
on command line
- with a single mirror url specified in the spack.yaml, this makes it clearer where
binaries from various pipelines are pushed
- designate some tags as reserved: ['public', 'protected', 'notary']
- these tags are stripped from all jobs by default and provisioned internally
based on pipeline type
- update gitlab ci yaml to include pipelines on more protected branches than just
develop (so include releases and tags)
- binaries from all protected pipelines are pushed into mirrors including the
branch name so releases, tags, and develop binaries are kept separate
- update rebuild jobs running on protected pipelines to run on special runners
provisioned with an intermediate signing key
- protected rebuild jobs no longer use "SPACK_SIGNING_KEY" env var to
obtain signing key (in fact, final signing key is nowhere available to rebuild jobs)
- these intermediate signatures are verified at the end of each pipeline by a new
signing job to ensure binaries were produced by a protected pipeline
- optionally schedule a signing/notary job at the end of the pipeline to sign all
packages in the mirror
- add signing-job-attributes to gitlab-ci section of spack environment to allow
configuration
- signing job runs on special runner (separate from protected rebuild runners)
provisioned with public intermediate key and secret signing key
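The reserved-tag handling can be sketched as follows. The list of reserved tags comes from the description above; the rule that protected pipelines get `protected` and all others get `public` is an assumption made for illustration, and `provision_tags` is a hypothetical helper.

```python
RESERVED_TAGS = ["public", "protected", "notary"]


def provision_tags(job_tags, pipeline_is_protected):
    """Strip reserved tags from a job, then add one based on pipeline type."""
    tags = [tag for tag in job_tags if tag not in RESERVED_TAGS]
    # Assumed mapping from pipeline type to the provisioned tag.
    tags.append("protected" if pipeline_is_protected else "public")
    return tags
```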
Add two new cloud pipelines for E4S on Amazon Linux, including arm and x86 (v3 + v4) stacks.
Notes:
- Updated mpark-variant to remove conflict that no longer exists in Amazon Linux
- Which command on Amazon Linux prefixes on all results when padded_length is too high. In this case, padded_length<=503 works as expected. Chose conservative length of 384.
We've previously generated CI pipelines for PRs, and they rebuild any packages that don't have
a binary in an existing build cache. The assumption we were making was that ALL prior merged
builds would be in cache, but due to the way we do security in the pipeline, they aren't. `develop`
pipelines can take a while to catch up with the latest PRs, and while it does that, there may be a
bunch of redundant builds on PRs that duplicate things being rebuilt on `develop`. Until we can
do better caching of PR builds, we'll have this problem.
We can do better in PRs, though, by *only* rebuilding things in the CI environment that are actually
touched by the PR. This change computes exactly what packages are changed by a PR branch and
*only* includes those packages' dependents and dependencies in the generated pipeline. Other
as-yet unbuilt packages are pruned from CI for the PR.
For `develop` pipelines, we still want to build everything to ensure that the stack works, and to ensure
that `develop` catches up with PRs. This is especially true since we do not do rebuilds for *every* commit
on `develop` -- just the most recent one after each `develop` pipeline finishes. Since we skip around,
we may end up missing builds unless we ensure that we rebuild everything.
We differentiate between `develop` and PR pipelines in `.gitlab-ci.yml` by setting
`SPACK_PRUNE_UNTOUCHED` for PRs. `develop` will still have the old behavior.
- [x] Add `SPACK_PRUNE_UNTOUCHED` variable to `spack ci`
- [x] Refactor `spack pkg` command by moving historical package checking logic to `spack.repo`
- [x] Implement pruning logic in `spack ci` to remove untouched packages
- [x] add tests
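The pruning rule can be sketched on a plain dependency graph. This is a simplified model, not the `spack ci` implementation: `affected_specs` is a hypothetical helper, and specs are represented by bare package names.

```python
def _closure(start, edges):
    """Return every node reachable from `start` by following `edges`."""
    seen, stack = set(), list(start)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return seen


def affected_specs(dependencies, touched):
    """Return the specs to keep in a PR pipeline.

    `dependencies` maps each spec in the environment to the set of its
    direct dependencies. Kept are the touched packages, everything that
    depends on them, and all dependencies of that set.
    """
    # Invert the dependency edges to find dependents.
    dependents = {}
    for spec, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(spec)

    in_env = set(touched) & set(dependencies)
    changed_and_dependents = _closure(in_env, dependents)
    return _closure(changed_and_dependents, dependencies)
```

Anything in the environment that is not in the returned set would be pruned from the generated PR pipeline.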
Modifications:
- [x] Change `defaults/config.yaml`
- [x] Add a fix for bootstrapping patchelf from sources if `compilers.yaml` is empty
- [x] Make `SPACK_TEST_SOLVER=clingo` the default for unit-tests
- [x] Fix package failures in the e4s pipeline
Caveats:
1. CentOS 6 still uses the original concretizer as it can't connect to the buildcache due to issues with `ssl` (bootstrapping from sources requires a C++14 capable compiler)
1. I had to update the image tag for GitlabCI in e699f14.
1. libtool v2.4.2 has been deprecated and other packages received some update
Modifications:
- Remove the "build tests" workflow from GitHub Actions
- Setup a similar e2e test on Gitlab
In this way we'll reduce load on GitHub Actions workflows, and the e2e tests will
benefit from the buildcache reuse granted by pipelines.
Spack pipelines need to take specific actions internally that depend
on whether the pipeline is being run on a PR to spack or a merge to
the develop branch. Pipelines can also run in other repositories,
which represents other possible use cases than just the two mentioned
above. This PR creates a "SPACK_PIPELINE_TYPE" gitlab variable which
is propagated to rebuild jobs, and is also used internally to determine
which pipeline-specific tasks to run.
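A minimal sketch of how code might branch on this variable. The two string values are assumptions, since the description names the variable but not its values, and `pipeline_kind` is a hypothetical helper.

```python
import os

# Assumed values for SPACK_PIPELINE_TYPE; not stated in the description.
PR_PIPELINE = "spack_pull_request"
DEVELOP_PIPELINE = "spack_protected_branch"


def pipeline_kind(environ=None):
    """Classify the running pipeline as 'pr', 'develop', or 'other'."""
    environ = os.environ if environ is None else environ
    value = environ.get("SPACK_PIPELINE_TYPE")
    if value == PR_PIPELINE:
        return "pr"
    if value == DEVELOP_PIPELINE:
        return "develop"
    # Pipelines in other repositories, or with the variable unset.
    return "other"
```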
One goal of the PR is to fix an issue where rebuild jobs which failed on
develop pipelines did not properly report the broken full hash to the
"broken-specs-url".
### Overview
The goal of this PR is to make gitlab pipeline builds (especially build failures) more reproducible outside of the pipeline environment. The two key changes here which aim to improve reproducibility are:
1. Produce a `spack.lock` during pipeline generation which is passed to child jobs via artifacts. This concretized environment is used both by generated child jobs as well as uploaded as an artifact to be used when reproducing the build locally.
2. In the `spack ci rebuild` command, if a spec needs to be rebuilt from source, do this by generating and running an `install.sh` shell script which is then also uploaded as a job artifact to be run during local reproduction.
To make it easier to take advantage of improved build reproducibility, this PR also adds a new subcommand, `spack ci reproduce-build`, which, given a url to job artifacts:
- fetches and unzips the job artifacts to a local directory
- looks for the generated pipeline yaml and parses it to find details about the job to reproduce
- attempts to provide a copy of the same version of spack used in the ci build
- if the ci build used a docker image, the command prints a `docker run` command you can run to get an interactive shell for reproducing the build
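The first step above (fetch and unzip) might look roughly like this sketch. It is not the `spack ci reproduce-build` implementation; the function names are hypothetical. The `PRIVATE-TOKEN` header is the standard GitLab API token header, which would carry the value of `GITLAB_PRIVATE_TOKEN` for projects without public pipelines.

```python
import io
import os
import urllib.request
import zipfile


def fetch_artifacts(url, token=None):
    """Download a job's artifacts archive and return its raw bytes."""
    request = urllib.request.Request(url)
    if token:
        request.add_header("PRIVATE-TOKEN", token)
    with urllib.request.urlopen(request) as response:
        return response.read()


def unpack_artifacts(archive_bytes, working_dir):
    """Unzip the archive into `working_dir`; return the extracted file paths."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        archive.extractall(working_dir)
    return sorted(
        os.path.relpath(os.path.join(root, name), working_dir)
        for root, _dirs, files in os.walk(working_dir)
        for name in files
    )
```

The returned list can then be searched for the generated pipeline yaml to find details about the job to reproduce.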
#### Some highlights
One consequence of this change will be much smaller pipeline yaml files. By encoding the concrete environment in a `spack.lock` and passing to child jobs via artifacts, we will no longer need to encode the concrete root of each spec and write it into the job variables, greatly reducing the size of the generated pipeline yaml.
Additionally `spack ci rebuild` output (stdout/stderr) is no longer internally redirected to a log file, so job output will appear directly in the gitlab job trace. With debug logging turned on, this often results in log files getting truncated because they exceed the maximum amount of log output gitlab allows. If this is a problem, you still have the option to `tee` command output to a file within the artifacts directory, as now each generated job exposes a `user_data` directory as an artifact, which you can fill with whatever you want in your custom job scripts.
There are some changes to be aware of in how pipelines should be set up after this PR:
#### Pipeline generation
Because the pipeline generation job now writes a `spack.lock` artifact to be consumed by generated downstream jobs, `spack ci generate` takes a new option `--artifacts-root`, inside which it creates a `concrete_environment` directory to place the lockfile. This artifacts root directory is also where the `user_data` directory will live, in case you want to generate any custom artifacts. If you do not provide `--artifacts-root`, the default is for it to create a `jobs_scratch_dir` within your `CI_PROJECT_DIR` (a gitlab predefined environment variable) or whatever is your current working directory if that variable isn't set. Here's the diff of the PR testing `.gitlab-ci.yml` taking advantage of the new option:
```
$ git diff develop..pipelines-reproducible-builds share/spack/gitlab/cloud_pipelines/.gitlab-ci.yml
diff --git a/share/spack/gitlab/cloud_pipelines/.gitlab-ci.yml b/share/spack/gitlab/cloud_pipelines/.gitlab-ci.yml
index 579d7b56f3..0247803a30 100644
--- a/share/spack/gitlab/cloud_pipelines/.gitlab-ci.yml
+++ b/share/spack/gitlab/cloud_pipelines/.gitlab-ci.yml
@@ -28,10 +28,11 @@ default:
- cd share/spack/gitlab/cloud_pipelines/stacks/${SPACK_CI_STACK_NAME}
- spack env activate --without-view .
- spack ci generate --check-index-only
+ --artifacts-root "${CI_PROJECT_DIR}/jobs_scratch_dir"
--output-file "${CI_PROJECT_DIR}/jobs_scratch_dir/cloud-ci-pipeline.yml"
artifacts:
paths:
- - "${CI_PROJECT_DIR}/jobs_scratch_dir/cloud-ci-pipeline.yml"
+ - "${CI_PROJECT_DIR}/jobs_scratch_dir"
tags: ["spack", "public", "medium", "x86_64"]
interruptible: true
```
Notice how we replaced the specific pointer to the generated pipeline file with its containing folder, the same folder we passed as `--artifacts-root`. This way anything in that directory (the generated pipeline yaml, as well as the concrete environment directory containing the `spack.lock`) will be uploaded as an artifact and available to the downstream jobs.
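The default described for `--artifacts-root` can be summarized in a short sketch. `resolve_artifacts_root` is a hypothetical helper that mirrors the prose, not the code in `spack ci generate`.

```python
import os


def resolve_artifacts_root(artifacts_root=None, environ=None, cwd=None):
    """Return the artifacts root, falling back to the documented default.

    An explicit value wins. Otherwise use `jobs_scratch_dir` inside
    CI_PROJECT_DIR, or inside the current working directory when that
    variable is not set.
    """
    environ = os.environ if environ is None else environ
    if artifacts_root:
        return artifacts_root
    base = environ.get("CI_PROJECT_DIR") or cwd or os.getcwd()
    return os.path.join(base, "jobs_scratch_dir")
```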
#### Rebuild jobs
Rebuild jobs now must activate the concrete environment created by `spack ci generate` and provided via artifacts. When the pipeline is generated, a directory called `concrete_environment` is created within the artifacts root directory, and this is where the `spack.lock` file is written to be passed to the generated rebuild jobs. The artifacts root directory can be specified using the `--artifacts-root` option to `spack ci generate`; otherwise, it defaults to a `jobs_scratch_dir` directory within `$CI_PROJECT_DIR`. The directory containing the concrete environment files (`spack.yaml` and `spack.lock`) is then passed to generated child jobs via the `SPACK_CONCRETE_ENV_DIR` variable in the generated pipeline yaml file.
When you don't provide custom `script` sections in your `mappings` within the `gitlab-ci` section of your `spack.yaml`, the default behavior of rebuild jobs is now to change into `SPACK_CONCRETE_ENV_DIR` and activate that environment. If you do provide custom rebuild scripts in your `spack.yaml`, be aware those scripts should do the same thing: assume `SPACK_CONCRETE_ENV_DIR` contains the concretized environment to activate. No other changes to existing custom rebuild scripts should be required as a result of this PR.
As mentioned above, one key change made in this PR is the generation of the `install.sh` script by the rebuild jobs, as that same script is both run by the CI rebuild job as well as exported as an artifact to aid in subsequent attempts to reproduce the build outside of CI. The generated `install.sh` script contains only a single `spack install` command with arguments computed by `spack ci rebuild`. If the install fails, the job trace in gitlab will contain instructions on how to reproduce the build locally:
```
To reproduce this build locally, run:
spack ci reproduce-build https://gitlab.next.spack.io/api/v4/projects/7/jobs/240607/artifacts [--working-dir <dir>]
If this project does not have public pipelines, you will need to first:
export GITLAB_PRIVATE_TOKEN=<generated_token>
... then follow the printed instructions.
```
When run locally, the `spack ci reproduce-build` command shown above will download and process the job artifacts from gitlab, then print out instructions you can copy-paste to run a local reproducer of the CI job.
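To make the `install.sh` idea concrete, here is a minimal sketch of writing a script that contains a single `spack install` command. The helper name and the example arguments are hypothetical; the real arguments are computed by `spack ci rebuild`.

```python
import os
import shlex
import stat


def write_install_script(directory, install_args):
    """Write an executable install.sh holding one `spack install` command."""
    script_path = os.path.join(directory, "install.sh")
    # Quote each argument so the script is safe to re-run by hand.
    command = " ".join(shlex.quote(arg) for arg in ["spack", "install", *install_args])
    with open(script_path, "w") as script:
        script.write("#!/bin/sh\n")
        script.write(command + "\n")
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR)
    return script_path
```

Keeping the command in a file, rather than running it inline, means the same script can run in CI and be shipped as an artifact for local reproduction.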
This PR includes a few other changes to the way pipelines work, see the documentation on pipelines for more details.
This PR relies on
~- [ ] #23194 to be able to refer to uninstalled specs by DAG hash~
EDIT: that is going to take longer to come to fruition, so for now, we will continue to install specs represented by a concrete `spec.yaml` file on disk.
- [x] #22657 to support installing a single spec already present in the active, concrete environment