Todd Gamblin 8363fbf40f
Spec: use short-circuiting, stable comparison
## Background

Spec comparison on develop used a somewhat questionable optimization to
get decent spec comparison performance -- instead of comparing entire spec
DAGs, it put a `hash()` call in `_cmp_iter()` and compared specs by their
runtime hashes. This gets us good performance for abstract specs, which don't
have complex dependencies and are cheap to hash. But it makes the order of
specs unstable and hard to reproduce.
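Keying comparison on runtime `hash()` values is hard to reproduce because CPython salts `str` (and `bytes`) hashes with a per-process random seed unless `PYTHONHASHSEED` is set. The sketch below uses placeholder package names and sorts the same strings by `hash()` in fresh interpreters with different seeds. The order typically changes from seed to seed. It shows the general effect only and does not exercise Spack's own hashing.

```python
import os
import subprocess
import sys

# Placeholder package names; any set of strings works.
NAMES = ["zlib", "cmake", "hdf5", "openmpi", "python"]

# Program run in a fresh interpreter: sort the names by built-in hash().
CHILD = "names = {names!r}\nprint(' '.join(sorted(names, key=hash)))"


def order_under_seed(seed: int) -> str:
    """Return the hash-sorted order of NAMES in a child process using `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD.format(names=NAMES)],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Different seeds typically give different orders for the same input.
    for seed in range(4):
        print(seed, order_under_seed(seed))
```

Within one process the order is self-consistent, so the instability is easy to miss. It only shows up when results are compared across runs.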

We really need to do a full, consistent traversal over specs to compare
and to get a stable ordering. Simply taking the hash out and yielding
dependencies recursively (i.e. yielding `dep.spec._cmp_iter()` instead
of a hash) goes exponential for concrete specs because it explores all
paths. Traversal tracks visited nodes, but it's expensive to set up
the data structures for that, and it can slow down comparison of simple
abstract specs. Abstract spec comparison performance is important for
concretization (specifically setup), so we don't want to do that.
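A small, self-contained illustration of the blowup. It uses a hypothetical dependency graph (a plain dict from node name to child names) built from `k` stacked diamonds. This is the kind of sharing concrete DAGs have when several packages depend on the same library. Recursing into every dependency without remembering visited nodes explores every path. A traversal with a `seen` set visits each node once.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]


def diamond_chain(k: int) -> Graph:
    """Hypothetical DAG: n0 -> {a0, b0} -> n1 -> {a1, b1} -> ... -> nk."""
    graph: Graph = {}
    for i in range(k):
        graph[f"n{i}"] = [f"a{i}", f"b{i}"]
        graph[f"a{i}"] = [f"n{i + 1}"]
        graph[f"b{i}"] = [f"n{i + 1}"]
    graph[f"n{k}"] = []
    return graph


def naive_visits(graph: Graph, node: str) -> int:
    """Count node visits when recursing into every dependency (all paths)."""
    return 1 + sum(naive_visits(graph, child) for child in graph[node])


def tracked_visits(graph: Graph, root: str) -> int:
    """Count node visits when a `seen` set prevents revisiting shared nodes."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return len(seen)


g = diamond_chain(10)
print(naive_visits(g, "n0"))    # 4093 == 2**12 - 3: roughly doubles per diamond
print(tracked_visits(g, "n0"))  # 31 == 3 * 10 + 1: one visit per node
```

The `seen` set is what keeps traversal linear. According to the text, setting up that kind of bookkeeping is the overhead that slows down comparison of small abstract specs.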

## New comparison algorithm

We can have (mostly) the best of both worlds -- it's just a bit more
complicated.

This changes Spec comparison to do a proper, stable graph comparison:

1. For concrete specs, Spec comparison now short-circuits whenever possible,
   i.e., when DAG hashes are known to be equal or not equal. This means
   that concrete spec `==` and `!=` comparisons no longer have to
   traverse the whole DAG.

2. Spec comparison now traverses the graph consistently, comparing nodes
   and edges in breadth-first order. This means Spec sort order is stable,
   and it won't vary arbitrarily from run to run.

3. Traversal can be expensive, so we avoid it for simple specs. Specifically,
   if a spec has no dependencies, or if its dependencies have no dependencies,
   we avoid calling `traverse_edges()` by doing some special casing.
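The short-circuiting in item 1 can be sketched with a hypothetical `Node` class whose `dag_hash` is set only once a node is concrete. When both sides have a DAG hash, `==` and `!=` are decided by the hashes alone. Otherwise the sketch falls back to a full structural comparison. The class, its fields, and the `slow_path_calls` counter are illustrative only and are not Spack's API.

```python
from dataclasses import dataclass, field
from typing import ClassVar, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    name: str
    version: str
    deps: List["Node"] = field(default_factory=list)
    # Hypothetical: only concrete nodes carry a DAG hash.
    dag_hash: Optional[str] = None

    # Counts how often the slow, structural comparison runs.
    slow_path_calls: ClassVar[int] = 0

    def _fast_eq(self, other: "Node") -> Optional[bool]:
        """Decide equality cheaply if possible; None means 'unknown'."""
        if self is other:
            return True
        if self.dag_hash is not None and other.dag_hash is not None:
            # Equal hashes imply equal DAGs; different hashes imply different DAGs.
            return self.dag_hash == other.dag_hash
        return None

    def _structural_key(self) -> Tuple:
        """Full comparison key (naive recursion, for illustration only)."""
        return (self.name, self.version, tuple(d._structural_key() for d in self.deps))

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Node):
            return NotImplemented
        fast = self._fast_eq(other)
        if fast is not None:
            return fast
        Node.slow_path_calls += 1
        return self._structural_key() == other._structural_key()


a = Node("zlib", "1.3", dag_hash="abc")
b = Node("zlib", "1.3", dag_hash="abc")
print(a == b, Node.slow_path_calls)  # decided by hashes; slow path not used
```

In this sketch, unequal hashes settle `!=` but not `<`. A hash says nothing about where a spec belongs in a stable sort order, which is consistent with item 1 mentioning only `==` and `!=`.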

The `_cmp_iter` method for `Spec` now iterates over the DAG and yields nodes
in BFS order. While it does that, it generates consistent ids for each node,
based on traversal order. It then outputs edges in terms of these ids, along with
their depflags and virtuals, so that all parts of the Spec DAG are included.
The resulting total ordering of specs keys on node attributes first, then
dependency nodes, then any edge differences between graphs.

Optimized cases skip the id generation and traversal, since we know the
order and therefore the ids in advance.

## Performance ramifications

### Abstract specs

This seems to add around 7-8% overhead to concretization setup time. It's
worth the cost, because this enables concretization caching (as input to
concretization was previously not stable) and setup will eventually be
parallelized, at which point it will no longer be a bottleneck for solving.
Together those two optimizations will cut well over 50% of the time (likely
closer to 90+%) off of most solves.

### Concrete specs

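Comparison for concrete specs is faster than before, sometimes *way* faster,
because comparison is now guaranteed to be linear time w.r.t. DAG size and
`==`/`!=` can short-circuit on DAG hashes. Times for a single `==` comparison
of two concrete Specs loaded from the same JSON file: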

```python
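import timeit

import spack.spec


def compare(path):
    # Load two independent Spec objects from the same concrete spec file.
    a = spack.spec.Spec(path)
    b = spack.spec.Spec(path)
    print(a == b)  # sanity check: the two specs should compare equal
    # Time a single `==` comparison (number=1, so expect some noise).
    print(timeit.timeit(lambda: a == b, number=1))


compare("./py-black.json")
compare("./hdf5.json")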
```

* `develop` (uses our prior hash optimization):
  * `py-black`: 7.013e-05s
  * `py-hdf5`: 6.445e-05s

* `develop` with full traversal and no hash:
  * `py-black`: 3.955s
  * `py-hdf5`: 0.0122s

* This branch (full traversal, stable, short-circuiting, no hash):
  * `py-black`: 2.208e-06s
  * `py-hdf5`: 3.416e-06s
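For a sense of scale, the ratios below are computed directly from the timings listed above. These are single runs with `number=1`, so treat them as rough. They work out to roughly 32x and 19x faster than the hash-based `develop`, and about 1.8 million and 3,600 times faster than full traversal without short-circuiting.

```python
# Timings (seconds) copied from the results above.
TIMINGS = {
    "develop (hash)": {"py-black": 7.013e-05, "py-hdf5": 6.445e-05},
    "develop (full traversal)": {"py-black": 3.955, "py-hdf5": 0.0122},
    "this branch": {"py-black": 2.208e-06, "py-hdf5": 3.416e-06},
}


def speedups(baseline: str, new: str = "this branch") -> dict:
    """Ratio of baseline time to new time, per spec (>1 means new is faster)."""
    return {spec: TIMINGS[baseline][spec] / TIMINGS[new][spec] for spec in TIMINGS[new]}


for baseline in ("develop (hash)", "develop (full traversal)"):
    for spec, ratio in speedups(baseline).items():
        print(f"{spec}: {ratio:,.1f}x faster than {baseline}")
```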

Signed-off-by: Todd Gamblin <tgamblin@llnl.gov>
2025-05-19 15:59:03 -07:00