![]() TL;DR: there are matching groups trying to match 1 or more occurrences of something. We don't use the matching group. Therefore it's sufficient to test for 1 occurrence. This reduce quadratic complexity to linear time. --- When parsing logs of an mpich build, I'm getting a 4 minute (!!) wait with 16 threads for regexes to run: ``` In [1]: %time p.parse("mpich.log") Wall time: 4min 14s ``` That's really unacceptably slow... After some digging, it seems a few regexes tend to have `O(n^2)` scaling where `n` is the string / log line length. I don't think they *necessarily* should scale like that, but it seems that way. The common pattern is this ``` ([^:]+): error ``` which matches `: error` literally, and then one or more non-colons before that. So for a log line like this: ``` abcdefghijklmnopqrstuvwxyz: error etc etc ``` Any of these are potential group matches when using `search` in Python: ``` abcdefghijklmnopqrstuvwxyz bcdefghijklmnopqrstuvwxyz cdefghijklmnopqrstuvwxyz ⋮ yz z ``` but clearly the capture group should return the longest match. My hypothesis is that Python has a very bad implementation of `search` that somehow considers all of these, even though it can be implemented in linear time by scanning for `: error` first, and then greedily expanding the longest possible `[^:]+` match to the left. If Python indeed considers all possible matches, then with `n` matches of length `1 .. n` you see the `O(n^2)` slowness (i verified this by replacing + with {1,k} and doubling `k`, it doubles the execution time indeed). This PR fixes this by removing the `+`, so effectively changing the O(n^2) into a O(n) worst case. The reason we are fine with dropping `+` is that we don't use the capture group anywhere, so, we just ensure `:: error` is not a match but `x: error` is. After going from O(n^2) to O(n), the 15MB mpich build log is parsed in `1.288s`, so about 200x faster. Just to be sure I've also updated `^CMake Error.*:` to `^CMake Error`, so that it does not match with all the possible `:`'s in the line. Another option is to use `.*?` there to make it quit scanning as soon as possible, but what line that starts with `CMake Error` that does not have a colon is really a false positive... |
||
---|---|---|
.github | ||
bin | ||
etc/spack/defaults | ||
lib/spack | ||
share/spack | ||
var/spack | ||
.codecov.yml | ||
.dockerignore | ||
.flake8 | ||
.gitattributes | ||
.gitignore | ||
.mailmap | ||
.readthedocs.yml | ||
CHANGELOG.md | ||
COPYRIGHT | ||
LICENSE-APACHE | ||
LICENSE-MIT | ||
NOTICE | ||
pyproject.toml | ||
pytest.ini | ||
README.md | ||
SECURITY.md |
Spack
Spack is a multi-platform package manager that builds and installs multiple versions and configurations of software. It works on Linux, macOS, and many supercomputers. Spack is non-destructive: installing a new version of a package does not break existing installations, so many configurations of the same package can coexist.
Spack offers a simple "spec" syntax that allows users to specify versions and configuration options. Package files are written in pure Python, and specs allow package authors to write a single script for many different builds of the same package. With Spack, you can build your software all the ways you want to.
See the Feature Overview for examples and highlights.
To install spack and your first package, make sure you have Python. Then:
$ git clone -c feature.manyFiles=true https://github.com/spack/spack.git
$ cd spack/bin
$ ./spack install zlib
Documentation
Full documentation is available, or
run spack help
or spack help --all
.
For a cheat sheet on Spack syntax, run spack help --spec
.
Tutorial
We maintain a hands-on tutorial. It covers basic to advanced usage, packaging, developer features, and large HPC deployments. You can do all of the exercises on your own laptop using a Docker container.
Feel free to use these materials to teach users at your organization about Spack.
Community
Spack is an open source project. Questions, discussion, and contributions are welcome. Contributions can be anything from new packages to bugfixes, documentation, or even new core features.
Resources:
- Slack workspace: spackpm.slack.com. To get an invitation, visit slack.spack.io.
- Mailing list: groups.google.com/d/forum/spack
- Twitter: @spackpm. Be sure to
@mention
us!
Contributing
Contributing to Spack is relatively easy. Just send us a
pull request.
When you send your request, make develop
the destination branch on the
Spack repository.
Your PR must pass Spack's unit tests and documentation tests, and must be PEP 8 compliant. We enforce these guidelines with our CI process. To run these tests locally, and for helpful tips on git, see our Contribution Guide.
Spack's develop
branch has the latest contributions. Pull requests
should target develop
, and users who want the latest package versions,
features, etc. can use develop
.
Releases
For multi-user site deployments or other use cases that need very stable software installations, we recommend using Spack's stable releases.
Each Spack release series also has a corresponding branch, e.g.
releases/v0.14
has 0.14.x
versions of Spack, and releases/v0.13
has
0.13.x
versions. We backport important bug fixes to these branches but
we do not advance the package versions or make other changes that would
change the way Spack concretizes dependencies within a release branch.
So, you can base your Spack deployment on a release branch and git pull
to get fixes, without the package churn that comes with develop
.
The latest release is always available with the releases/latest
tag.
See the docs on releases for more details.
Code of Conduct
Please note that Spack has a Code of Conduct. By participating in the Spack community, you agree to abide by its rules.
Authors
Many thanks go to Spack's contributors.
Spack was created by Todd Gamblin, tgamblin@llnl.gov.
Citing Spack
If you are referencing Spack in a publication, please cite the following paper:
- Todd Gamblin, Matthew P. LeGendre, Michael R. Collette, Gregory L. Lee, Adam Moody, Bronis R. de Supinski, and W. Scott Futral. The Spack Package Manager: Bringing Order to HPC Software Chaos. In Supercomputing 2015 (SC’15), Austin, Texas, November 15-20 2015. LLNL-CONF-669890.
License
Spack is distributed under the terms of both the MIT license and the Apache License (Version 2.0). Users may choose either license, at their option.
All new contributions must be made under both the MIT and Apache-2.0 licenses.
See LICENSE-MIT, LICENSE-APACHE, COPYRIGHT, and NOTICE for details.
SPDX-License-Identifier: (Apache-2.0 OR MIT)
LLNL-CODE-811652