Spec: use short-circuiting, stable comparison

## Background Spec comparison on develop used a somewhat questionable optimization to get decent spec comparison performance -- instead of comparing entire spec DAGs, it put a `hash()` call in `_cmp_iter()` and compared specs by their runtime hashes. This gets us good performance abstract specs, which don't have complex dependencies and for which hashing is cheap. But it makes the order of specs unstable and hard to reproduce. We really need to do a full, consistent traversal over specs to compare and to get a stable ordering. Simply taking the hash out and yielding dependencies recursively (i.e. yielding `dep.spec._cmp_iter()` instead of a hash) goes exponential for concrete specs because it explores all paths. Traversal tracks visited nodes, but it's expensive to set up the data structures for that, and it can slow down comparison of simple abstract specs. Abstract spec comparison performance is important for concretization (specifically setup), so we don't want to do that. ## New comparison algorithm We can have (mostly) the best of both worlds -- it's just a bit more complicated. This changes Spec comparison to do a proper, stable graph comparison: 1. Spec comparison will now short-circuit whenever possible for concrete specs, when DAG hashes are known to be equal or not equal. This means that concrete spec `==` and `!=` comparisons will no longer have to traverse the whole DAG. 2. Spec comparison now traverses the graph consistently, comparing nodes and edges in breadth-first order. This means Spec sort order is stable, and it won't vary arbitrarily from run to run. 3. Traversal can be expensive, so we avoid it for simple specs. Specifically, if a spec has no dependencies, or if its dependencies have no dependencies, we avoid calling `traverse_edges()` by doing some special casing. The `_cmp_iter` method for `Spec` now iterates over the DAG and yields nodes in BFS order. While it does that, it generates consistent ids for each node, based on traversal order. It then outputs edges in terms of these ids, along with their depflags and virtuals, so that all parts of the Spec DAG are included. The resulting total ordering of specs keys on node attributes first, then dependency nodes, then any edge differences between graphs. Optimized cases skip the id generation and traversal, since we know the order and therefore the ids in advance. ## Performance ramifications ### Abstract specs This seems to add around 7-8% overhead to concretization setup time. It's worth the cost, because this enables concretization caching (as input to concretization was previously not stable) and setup will eventually be parallelized, at which point it will no longer be a bottleneck for solving. Together those two optimizations will cut well over 50% of the time (likely closer to 90+%) off of most solves. ### Concrete specs Comparison for concrete specs is faster than before, sometimes *way* faster because comparison is now guaranteed to be linear time w.r.t. DAG size. Times for comparing concrete Specs: ```python def compare(json): a = spack.spec.Spec(json) b = spack.spec.Spec(json) print(a == b) print(timeit.timeit(lambda: a == b, number=1)) compare("./py-black.json") compare("./hdf5.json") ``` * `develop` (uses our prior hash optimization): * `py-black`: 7.013e-05s * `py-hdf5`: 6.445e-05s * `develop` with full traversal and no hash: * `py-black`: 3.955s * `py-hdf5`: 0.0122s * This branch (full traversal, stable, short-circuiting, no hash) * `py-black`: 2.208e-06s * `py-hdf5`: 3.416e-06s Signed-off-by: Todd Gamblin <tgamblin@llnl.gov>
2025-03-15 16:23:13 -07:00 · 2025-03-15 16:23:13 -07:00 · 8363fbf40f
commit 8363fbf40f
parent e4b898f7c3
2 changed files with 290 additions and 14 deletions
--- a/lib/spack/spack/spec.py
+++ b/lib/spack/spack/spec.py
@ -2673,7 +2673,7 @@ def name_and_dependency_types(s: str) -> Tuple[str, dt.DepFlag]:
                return name, depflag

            def spec_and_dependency_types(
-                s: Union[Spec, Tuple[Spec, str]],
+                s: Union[Spec, Tuple[Spec, str]]
            ) -> Tuple[Spec, dt.DepFlag]:
                """Given a non-string key in the literal, extracts the spec
                and its dependency types.
@ -3769,26 +3769,168 @@ def eq_node(self, other):
        """Equality with another spec, not including dependencies."""
        return (other is not None) and lang.lazy_eq(self._cmp_node, other._cmp_node)

-    def _cmp_iter(self):
-        """Lazily yield components of self for comparison."""
-
-        for item in self._cmp_node():
-            yield item
-
+    def _cmp_fast_eq(self, other) -> Optional[bool]:
+        """Short-circuit compare with other for equality, for lazy_lexicographic_ordering."""
        # If there is ever a breaking change to hash computation, whether accidental or purposeful,
        # two specs can be identical modulo DAG hash, depending on what time they were concretized
        # From the perspective of many operation in Spack (database, build cache, etc) a different
        # DAG hash means a different spec. Here we ensure that two otherwise identical specs, one
        # serialized before the hash change and one after, are considered different.
-        yield self.dag_hash() if self.concrete else None
+        if self is other:
+            return True

-        def deps():
-            for dep in sorted(itertools.chain.from_iterable(self._dependencies.values())):
-                yield dep.spec.name
-                yield dep.depflag
-                yield hash(dep.spec)
+        if self.concrete and other and other.concrete:
+            return self.dag_hash() == other.dag_hash()

-        yield deps
+        return None
+
+    def _cmp_iter(self):
+        """Lazily yield components of self for comparison."""
+
+        # Spec comparison in Spack needs to be fast, so there are several cases here for
+        # performance. The main places we care about this are:
+        #
+        #   * Abstract specs: there are lots of abstract specs in package.py files,
+        #     which are put into metadata dictionaries and sorted during concretization
+        #     setup. We want comparing abstract specs to be fast.
+        #
+        #   * Concrete specs: concrete specs are bigger and have lots of nodes and
+        #     edges. Because of the graph complexity, we need a full, linear time
+        #     traversal to compare them -- that's pretty much is unavoidable. But they
+        #     also have precoputed cryptographic hashes (dag_hash()), which we can use
+        #     to do fast equality comparison. See _cmp_fast_eq() above for the
+        #     short-circuit logic for hashes.
+        #
+        # A full traversal involves constructing data structurs, visitor objects, etc.,
+        # and it can be expensive if we have to do it to compare a bunch of tiny
+        # abstract specs. Therefore, there are 3 cases below, which avoid calling
+        # `spack.traverse.traverse_edges()` unless necessary.
+        #
+        # WARNING: the cases below need to be consistent, so don't mess with this code
+        # unless you really know what you're doing. Be sure to keep all three consistent.
+        #
+        # All cases lazily yield:
+        #
+        #   1. A generator over nodes
+        #   2. A generator over canonical edges
+        #
+        # Canonical edges have consistent ids defined by breadth-first traversal order. That is,
+        # the fake edge from nothing to the root is (0, 1), the root is always 1, dependencies of
+        # the root are 2, 3, 4, 5, etc., and so on.
+        #
+        # The three cases are:
+        #
+        #   1. Spec has no dependencies
+        #      * We can avoid any traversal logic and just yield this node's _cmp_node generator.
+        #
+        #   2. Spec has dependencies, but dependencies have no dependencies.
+        #      * We need to sort edges, but we don't need to track visited nodes, which
+        #        can save us the cost of setting up all the tracking data structures
+        #        `spack.traverse` uses.
+        #
+        #   3. Spec has dependencies that have dependencies.
+        #      * In this case, the spec is *probably* concrete. Equality comparisons
+        #        will be short-circuited by dag_hash(), but other comparisons will need
+        #        to lazily enumerate components of the spec. The traversal logic is
+        #        unavoidable.
+        #
+        # TODO: consider reworking `spack.traverse` to construct fewer data structures
+        # and objects, as this would make all traversals faster and could eliminate the
+        # need for the complexity here. It was not clear at the time of writing that how
+        # much optimization was possible in `spack.traverse`.
+
+        def yield_root_edge():
+            yield 0  # id of None (parent of root)
+            yield 1  # id of root
+            yield 0  # depflag on fake edge
+            yield ()  # virtuals on fake edge
+
+        if not self._dependencies:
+            # Spec has no dependencies -- simplest case
+            def nodes():
+                yield self._cmp_node  # just yield this node
+
+            def edges():
+                yield yield_root_edge
+
+        elif not any(
+            dep.spec._dependencies for deplist in self._dependencies.values() for dep in deplist
+        ):
+            # Spec has dependencies, but none of them have any dependencies. Avoid calling traverse
+            # need to track visited nodes or do any of the other fancy and expensive
+            # things traverse_edges does.
+            sorted_edges = None
+
+            def nodes():
+                # We generate sorted_edges once, in nodes, lazily. If the comparison can
+                # end with just _cmp_node, we never have to sort.
+                nonlocal sorted_edges
+
+                # yield the root node comparator
+                yield self._cmp_node
+
+                # sort only if roots were equal and caller needs to look at deps
+                sorted_edges = spack.traverse.sort_edges(
+                    self.edges_to_dependencies(depflag=dt.ALL)
+                )
+
+                # yield dependency node comparators in BFS order
+                for edge in sorted_edges:
+                    yield edge.spec._cmp_node
+
+            def edges():
+                # yield root edge consistent with what traverse would do.
+                yield yield_root_edge
+
+                # we've already sorted to yield dependency nodes in order, now yield
+                # edges for comparison in the same order, with BFS-consistent ids
+                for i, edge in enumerate(sorted_edges, start=2):
+
+                    def yield_edge(i=i, edge=edge):
+                        yield 1  # id of root
+                        yield i  # id of dependency in BFS order
+                        yield edge.depflag
+                        yield edge.virtuals
+
+                    yield yield_edge
+
+        else:
+            # Spec has dependencies, and they have dependencies. Need to do a full traversal here.
+
+            # We need a way to compare edges consistently between two arbitrary specs.
+            # node_ids generates consistent node ids based on BFS traversal order.
+            node_ids = collections.defaultdict(lambda: len(node_ids))
+            node_ids[id(None)]  # None (parent of root) is always node id 0
+
+            # To avoid traversing twice, we put edges in this list as we yield the nodes.
+            # edges() yields the edge functions later, if necessary.
+            edge_list = []
+
+            def nodes():
+                # Breadth-first edge traversal, yielding nodes as they're encountered.
+                for edge in self.traverse_edges(order="breadth", cover="edges", deptype=dt.ALL):
+                    # yield each node only once, and generate a consistent id for it the
+                    # first time it's encountered.
+                    if id(edge.spec) not in node_ids:
+                        yield edge.spec._cmp_node
+                        node_ids[id(edge.spec)]
+
+                    # create "edges" with the ids we generated above
+                    def yield_edge(edge=edge):
+                        yield node_ids[id(edge.parent)]
+                        yield node_ids[id(edge.spec)]
+                        yield edge.depflag
+                        yield edge.virtuals
+
+                    edge_list.append(yield_edge)
+
+            def edges():
+                # yield edges in the order they were encountered during traversal
+                for yield_edge_func in edge_list:
+                    yield yield_edge_func
+
+        yield nodes
+        yield edges

    @property
    def namespace_if_anonymous(self):
--- a/lib/spack/spack/test/spec_semantics.py
+++ b/lib/spack/spack/test/spec_semantics.py
@ -6,6 +6,8 @@

 import pytest

+import llnl.util.lang
+
 import spack.concretize
 import spack.deptypes as dt
 import spack.directives
@ -2013,6 +2015,138 @@ def test_comparison_multivalued_variants():
    assert Spec("x=a") < Spec("x=a,b") < Spec("x==a,b") < Spec("x==a,b,c")


+@pytest.mark.parametrize(
+    "specs_in_expected_order",
+    [
+        ("a", "b", "c", "d", "e"),
+        ("a@1.0", "a@2.0", "b", "c@3.0", "c@4.0"),
+        ("a^d", "b^c", "c^b", "d^a"),
+        ("e^a", "e^b", "e^c", "e^d"),
+        ("e^a@1.0", "e^a@2.0", "e^a@3.0", "e^a@4.0"),
+        ("e^a@1.0 +a", "e^a@1.0 +b", "e^a@1.0 +c", "e^a@1.0 +c"),
+        ("a^b%c", "a^b%d", "a^b%e", "a^b%f"),
+        ("a^b%c@1.0", "a^b%c@2.0", "a^b%c@3.0", "a^b%c@4.0"),
+        ("a^b%c@1.0 +a", "a^b%c@1.0 +b", "a^b%c@1.0 +c", "a^b%c@1.0 +d"),
+        ("a cflags=-O1", "a cflags=-O2", "a cflags=-O3"),
+        ("a %cmake@1.0 ^b %cmake@2.0", "a %cmake@2.0 ^b %cmake@1.0"),
+        ("a^b^c^d", "a^b^c^e", "a^b^c^f"),
+        ("a^b^c^d", "a^b^c^e", "a^b^c^e", "a^b^c^f"),
+        ("a%b%c%d", "a%b%c%e", "a%b%c%e", "a%b%c%f"),
+        ("d.a", "c.b", "b.c", "a.d"),  # names before namespaces
+    ],
+)
+def test_spec_ordering(specs_in_expected_order):
+    specs_in_expected_order = [Spec(s) for s in specs_in_expected_order]
+    assert sorted(specs_in_expected_order) == specs_in_expected_order
+    assert sorted(reversed(specs_in_expected_order)) == specs_in_expected_order
+
+    for i in range(len(specs_in_expected_order) - 1):
+        lhs, rhs = specs_in_expected_order[i : i + 2]
+        assert lhs <= rhs
+        assert (lhs < rhs and lhs != rhs) or lhs == rhs
+        assert rhs >= lhs
+        assert (rhs > lhs and rhs != lhs) or rhs == lhs
+
+
+EMPTY_VER = vn.VersionList(":")
+EMPTY_VAR = Spec().variants
+EMPTY_FLG = Spec().compiler_flags
+
+
+@pytest.mark.parametrize(
+    "spec,expected_tuplified",
+    [
+        # simple, no dependencies
+        [
+            ("a"),
+            ((("a", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),), ((0, 1, 0, ()),)),
+        ],
+        # with some node attributes
+        [
+            ("a@1.0 +foo cflags='-O3 -g'"),
+            (
+                (
+                    (
+                        "a",
+                        None,
+                        vn.VersionList(["1.0"]),
+                        Spec("+foo").variants,
+                        Spec("cflags='-O3 -g'").compiler_flags,
+                        None,
+                        None,
+                        None,
+                    ),
+                ),
+                ((0, 1, 0, ()),),
+            ),
+        ],
+        # single edge case
+        [
+            ("a^b"),
+            (
+                (
+                    ("a", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("b", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                ),
+                ((0, 1, 0, ()), (1, 2, 0, ())),
+            ),
+        ],
+        # root with multiple deps
+        [
+            ("a^b^c^d"),
+            (
+                (
+                    ("a", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("b", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("c", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("d", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                ),
+                ((0, 1, 0, ()), (1, 2, 0, ()), (1, 3, 0, ()), (1, 4, 0, ())),
+            ),
+        ],
+        # root with multiple build deps
+        [
+            ("a%b%c%d"),
+            (
+                (
+                    ("a", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("b", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("c", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("d", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                ),
+                ((0, 1, 0, ()), (1, 2, dt.BUILD, ()), (1, 3, dt.BUILD, ()), (1, 4, dt.BUILD, ())),
+            ),
+        ],
+        # dependencies with dependencies
+        [
+            ("a  ^b %c %d  ^e %f %g"),
+            (
+                (
+                    ("a", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("b", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("e", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("c", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("d", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("f", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                    ("g", None, EMPTY_VER, EMPTY_VAR, EMPTY_FLG, None, None, None),
+                ),
+                (
+                    (0, 1, 0, ()),
+                    (1, 2, 0, ()),
+                    (1, 3, 0, ()),
+                    (2, 4, dt.BUILD, ()),
+                    (2, 5, dt.BUILD, ()),
+                    (3, 6, dt.BUILD, ()),
+                    (3, 7, dt.BUILD, ()),
+                ),
+            ),
+        ],
+    ],
+)
+def test_spec_canonical_comparison_form(spec, expected_tuplified):
+    assert llnl.util.lang.tuplify(Spec(spec)._cmp_iter) == expected_tuplified
+
+
 def test_comparison_after_breaking_hash_change():
    # We simulate a breaking change in DAG hash computation in Spack. We have two specs that are
    # entirely equal modulo DAG hash. When deserializing these specs, we don't want them to compare