A flexible package manager that supports multiple versions, configurations, platforms, and compilers.
Harmen Stoppels 0c0831861c
Avoid quadratic complexity in log parser (#26568)
TL;DR: there are matching groups trying to match 1 or more occurrences of
something. We don't use the matching group. Therefore it's sufficient to test
for 1 occurrence. This reduces quadratic complexity to linear time.

---

When parsing logs of an mpich build, I'm getting a 4 minute (!!) wait
with 16 threads for regexes to run:

```
In [1]: %time p.parse("mpich.log")
Wall time: 4min 14s
```

That's really unacceptably slow... 

After some digging, it seems a few regexes tend to have `O(n^2)` scaling
where `n` is the string / log line length. I don't think they *necessarily*
should scale like that, but it seems that way. The common pattern is this

```
([^:]+): error
```

which matches one or more non-colon characters followed by the literal
`: error`. So for a log line like this:

```
abcdefghijklmnopqrstuvwxyz: error etc etc
```

Any of these are potential group matches when using `search` in Python:

```
abcdefghijklmnopqrstuvwxyz
 bcdefghijklmnopqrstuvwxyz
  cdefghijklmnopqrstuvwxyz
                         ⋮
                        yz
                         z
```

but clearly the capture group should return the longest match.

My hypothesis is that Python has a very bad implementation of `search`
that somehow considers all of these, even though it can be implemented
in linear time by scanning for `: error` first, and then greedily expanding
the longest possible `[^:]+` match to the left. If Python indeed considers
all possible matches, then with `n` matches of length `1 .. n` you
see the `O(n^2)` slowness. (I verified this by replacing `+` with `{1,k}`
and doubling `k`: the execution time indeed doubles.)
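
A rough way to observe the scaling, as a sketch and not the benchmark
behind the numbers in this message: time `search` on colon-free lines of
doubling length. The input lines and the `seconds` helper are made up for
illustration; the two patterns are the `([^:]+): error` shape from above and
the single-occurrence variant from the TL;DR. Timings depend on the machine
and the Python version, so none are quoted here.

```python
import re
import time

# Pattern shape quoted above, and the variant that tests for one occurrence.
greedy = re.compile(r"([^:]+): error")
single = re.compile(r"[^:]: error")


def seconds(pattern, line):
    """Return the wall-clock seconds taken by one pattern.search(line) call."""
    start = time.perf_counter()
    pattern.search(line)
    return time.perf_counter() - start


# Assumed worst case: a long line without any colon, so nothing can match
# and every start position has to be tried.
for n in (2000, 4000, 8000):
    line = "x" * n
    print(n, f"greedy={seconds(greedy, line):.4f}s", f"single={seconds(single, line):.6f}s")
```

If the hypothesis holds, the `greedy` column should grow roughly fourfold
per doubling of `n`, while the `single` column should roughly double.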

This PR fixes this by removing the `+`, effectively changing the
`O(n^2)` worst case into `O(n)`.

The reason we are fine with dropping `+` is that we don't use the
capture group anywhere, so, we just ensure `:: error` is not a match
but `x: error` is.
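
A small check of that claim, as a sketch: as long as only the yes/no
outcome of `search` is used, the pattern with `+` and the one without it
should agree on every line. The `decisions` helper and the sample lines are
hypothetical; only the pattern shape comes from the log parser.

```python
import re

with_plus = re.compile(r"([^:]+): error")
without_plus = re.compile(r"[^:]: error")


def decisions(line):
    """Return (matched with `+`, matched without `+`) for one log line."""
    return (
        with_plus.search(line) is not None,
        without_plus.search(line) is not None,
    )


# Made-up sample lines covering the cases mentioned above.
samples = [
    "abcdefghijklmnopqrstuvwxyz: error etc etc",
    "x: error",
    ":: error",
    ": error",
    "no problems here",
]
for sample in samples:
    print(repr(sample), decisions(sample))
```

The two patterns differ only in what the capture group would contain, which
is irrelevant when the group is never read.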

After going from O(n^2) to O(n), the 15MB mpich build log is parsed
in `1.288s`, so about 200x faster.

Just to be sure I've also updated `^CMake Error.*:` to `^CMake Error`,
so that it does not match with all the possible `:`'s in the line.
Another option is to use `.*?` there to make it stop scanning at the
first colon, but a line that starts with `CMake Error` and does not have
a colon is hardly a false positive...
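
To illustrate the three options, a sketch on two made-up lines (the CMake
messages below are hypothetical, not taken from a real log):

```python
import re

old = re.compile(r"^CMake Error.*:")    # greedy: runs to the last colon
lazy = re.compile(r"^CMake Error.*?:")  # lazy: stops at the first colon
new = re.compile(r"^CMake Error")       # prefix only, as in this change

# Hypothetical log lines, one with colons and one without.
with_colons = "CMake Error at CMakeLists.txt:12 (message): something failed"
no_colon = "CMake Error in configuration"

print(old.search(with_colons).group(0))   # CMake Error at CMakeLists.txt:12 (message):
print(lazy.search(with_colons).group(0))  # CMake Error at CMakeLists.txt:
print(new.search(with_colons).group(0))   # CMake Error

# Only the prefix-only pattern flags the line without a colon.
print(old.search(no_colon), lazy.search(no_colon), new.search(no_colon) is not None)
```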
2021-10-12 00:05:11 -07:00
.github Set explicitly write permission for packages (#26539) 2021-10-05 23:12:25 +00:00
bin Use a patched argparse only in Python 2.X (#25376) 2021-08-17 08:52:51 -07:00
etc/spack/defaults installer: Support showing status information in terminal title (#16259) 2021-10-11 17:54:59 +02:00
lib/spack Avoid quadratic complexity in log parser (#26568) 2021-10-12 00:05:11 -07:00
share/spack Add spack env activate --temp (#25388) 2021-10-11 06:56:03 -04:00
var/spack py-templateflow: add 0.4.2 (#26471) 2021-10-11 21:37:50 -05:00
.codecov.yml codecov: allow coverage offsets for more base commit flexibility (#25293) 2021-08-06 01:33:12 -07:00
.dockerignore Docker: ignore var/spack/cache (source caches) when creating container (#23329) 2021-05-17 11:28:58 +02:00
.flake8 style: Move isort configuration to pyproject.toml 2021-07-07 17:27:31 -07:00
.gitattributes
.gitignore .gitignore needs to be below env and ENV for case-insensitive FS 2021-10-04 18:30:19 -07:00
.mailmap Update mailmap (#22739) 2021-04-06 10:32:35 +02:00
.readthedocs.yml More strict ReadTheDocs tests (#26580) 2021-10-08 09:27:17 +02:00
CHANGELOG.md Update CHANGELOG and release version for v0.16.2 2021-05-22 14:57:30 -07:00
COPYRIGHT
LICENSE-APACHE
LICENSE-MIT
NOTICE
pyproject.toml coverage: move config from .coveragerc to pyproject.toml 2021-07-09 22:49:47 -07:00
pytest.ini Filter UserWarning out of test output (#26001) 2021-09-16 14:56:00 -06:00
README.md Build container images on Github Actions and push to multiple registries (#26247) 2021-09-30 23:34:47 +02:00
SECURITY.md Create SECURITY.md 2021-09-19 06:43:14 -07:00

Spack

Spack is a multi-platform package manager that builds and installs multiple versions and configurations of software. It works on Linux, macOS, and many supercomputers. Spack is non-destructive: installing a new version of a package does not break existing installations, so many configurations of the same package can coexist.

Spack offers a simple "spec" syntax that allows users to specify versions and configuration options. Package files are written in pure Python, and specs allow package authors to write a single script for many different builds of the same package. With Spack, you can build your software all the ways you want to.

See the Feature Overview for examples and highlights.

To install spack and your first package, make sure you have Python. Then:

```
$ git clone -c feature.manyFiles=true https://github.com/spack/spack.git
$ cd spack/bin
$ ./spack install zlib
```

Documentation

Full documentation is available, or run `spack help` or `spack help --all`.

For a cheat sheet on Spack syntax, run `spack help --spec`.

Tutorial

We maintain a hands-on tutorial. It covers basic to advanced usage, packaging, developer features, and large HPC deployments. You can do all of the exercises on your own laptop using a Docker container.

Feel free to use these materials to teach users at your organization about Spack.

Community

Spack is an open source project. Questions, discussion, and contributions are welcome. Contributions can be anything from new packages to bugfixes, documentation, or even new core features.

Contributing

Contributing to Spack is relatively easy. Just send us a pull request. When you send your request, make `develop` the destination branch on the Spack repository.

Your PR must pass Spack's unit tests and documentation tests, and must be PEP 8 compliant. We enforce these guidelines with our CI process. To run these tests locally, and for helpful tips on git, see our Contribution Guide.

Spack's `develop` branch has the latest contributions. Pull requests should target `develop`, and users who want the latest package versions, features, etc. can use `develop`.

Releases

For multi-user site deployments or other use cases that need very stable software installations, we recommend using Spack's stable releases.

Each Spack release series also has a corresponding branch, e.g. `releases/v0.14` has 0.14.x versions of Spack, and `releases/v0.13` has 0.13.x versions. We backport important bug fixes to these branches, but we do not advance the package versions or make other changes that would change the way Spack concretizes dependencies within a release branch. So, you can base your Spack deployment on a release branch and `git pull` to get fixes, without the package churn that comes with `develop`.

The latest release is always available with the `releases/latest` tag.

See the docs on releases for more details.

Code of Conduct

Please note that Spack has a Code of Conduct. By participating in the Spack community, you agree to abide by its rules.

Authors

Many thanks go to Spack's contributors.

Spack was created by Todd Gamblin, tgamblin@llnl.gov.

Citing Spack

If you are referencing Spack in a publication, please cite the following paper:

License

Spack is distributed under the terms of both the MIT license and the Apache License (Version 2.0). Users may choose either license, at their option.

All new contributions must be made under both the MIT and Apache-2.0 licenses.

See LICENSE-MIT, LICENSE-APACHE, COPYRIGHT, and NOTICE for details.

SPDX-License-Identifier: (Apache-2.0 OR MIT)

LLNL-CODE-811652