TL;DR: there are matching groups trying to match 1 or more occurrences of
something. We don't use the matching group. Therefore it's sufficient to test
for 1 occurrence. This reduce quadratic complexity to linear time.
---
When parsing logs of an mpich build, I'm getting a 4 minute (!!) wait
with 16 threads for regexes to run:
```
In [1]: %time p.parse("mpich.log")
Wall time: 4min 14s
```
That's really unacceptably slow...
After some digging, it seems a few regexes tend to have `O(n^2)` scaling
where `n` is the string / log line length. I don't think they *necessarily*
should scale like that, but it seems that way. The common pattern is this
```
([^:]+): error
```
which matches `: error` literally, and then one or more non-colons before that. So
for a log line like this:
```
abcdefghijklmnopqrstuvwxyz: error etc etc
```
Any of these are potential group matches when using `search` in Python:
```
abcdefghijklmnopqrstuvwxyz
bcdefghijklmnopqrstuvwxyz
cdefghijklmnopqrstuvwxyz
⋮
yz
z
```
but clearly the capture group should return the longest match.
My hypothesis is that Python has a very bad implementation of `search`
that somehow considers all of these, even though it can be implemented
in linear time by scanning for `: error` first, and then greedily expanding
the longest possible `[^:]+` match to the left. If Python indeed considers
all possible matches, then with `n` matches of length `1 .. n` you
see the `O(n^2)` slowness (i verified this by replacing + with {1,k}
and doubling `k`, it doubles the execution time indeed).
This PR fixes this by removing the `+`, so effectively changing the
O(n^2) into a O(n) worst case.
The reason we are fine with dropping `+` is that we don't use the
capture group anywhere, so, we just ensure `:: error` is not a match
but `x: error` is.
After going from O(n^2) to O(n), the 15MB mpich build log is parsed
in `1.288s`, so about 200x faster.
Just to be sure I've also updated `^CMake Error.*:` to `^CMake Error`,
so that it does not match with all the possible `:`'s in the line.
Another option is to use `.*?` there to make it quit scanning as soon as
possible, but what line that starts with `CMake Error` that does not have
a colon is really a false positive...
* Use gnuconfig package for config file replacement for RISC-V.
This extends the changes in #26035 to handle RISC-V. Before this change,
many packages fail to configure on riscv64 due to config.guess being too
old to know about RISC-V. This is seen out of the box when clingo fails
to build from source due to pkgconfig failing to configure, throwing
error: "configure: error: cannot guess build type; you must specify one".
* Add riscv64 architecture
* Update vendored archspec from upstream project.
These archspec updates include changes needed to support riscv64.
* Update archspec's __init__.py to reflect the commit hash of archspec being used.
- Match failed autotest tests show the word "FAILED" near the end
- Match "FAIL: ", "FATAL: ", "failed ", "Failed test" of other suites
- autotest " ok"$ means the test passed, independend of text before.
- autoconf messages showing missing tools are fatal later, show them.
Spack is internally using a patched version of `argparse` mainly to backport Python 3 functionality
into Python 2. This PR makes it such that for the supported Python 3 versions we use `argparse`
from the standard Python library. This PR has been extracted from #25371 where it was needed
to be able to use recent versions of `pytest`.
* Fixed formatting issues when using a pristine argparse.py
* Fix error message for Python 3.X when missing positional arguments
* Account for the change of API in Python 3.7
* Layout multi-valued args into columns in error messages
* Seamless transition in develop if argparse.pyc is in external
* Be more defensive in case we can't remove the file.
- [x] add `concretize.lp`, `spack.yaml`, etc. to licensed files
- [x] update all licensed files to say 2013-2021 using
`spack license update-copyright-year`
- [x] appease mypy with some additions to package.py that needed
for oneapi.py
I lost my mind a bit after getting the completion stuff working and
decided to get Mypy working for spack as well. This adds a
`.mypy.ini` that checks all of the spack and llnl modules, though
not yet packages, and fixes all of the identified missing types and
type issues for the spack library.
In addition to these changes, this includes:
* rename `spack flake8` to `spack style`
Aliases flake8 to style, and just runs flake8 as before, but with
a warning. The style command runs both `flake8` and `mypy`,
in sequence. Added --no-<tool> options to turn off one or the
other, they are on by default. Fixed two issues caught by the tools.
* stub typing module for python2.x
We don't support typing in Spack for python 2.x. To allow 2.x to
support `import typing` and `from typing import ...` without a
try/except dance to support old versions, this adds a stub module
*just* for python 2.x. Doing it this way means we can only reliably
use all type hints in python3.7+, and mypi.ini has been updated to
reflect that.
* add non-default black check to spack style
This is a first step to requiring black. It doesn't enforce it by
default, but it will check it if requested. Currently enforcing the
line length of 79 since that's what flake8 requires, but it's a bit odd
for a black formatted project to be quite that narrow. All settings are
in the style command since spack has no pyproject.toml and I don't
want to add one until more discussion happens. Also re-format
`style.py` since it no longer passed the black style check
with the new length.
* use style check in github action
Update the style and docs action to use `spack style`, adding in mypy
and black to the action even if it isn't running black right now.
Users can add test() methods to their packages to run smoke tests on
installations with the new `spack test` command (the old `spack test` is
now `spack unit-test`). spack test is environment-aware, so you can
`spack install` an environment and then run `spack test run` to run smoke
tests on all of its packages. Historical test logs can be perused with
`spack test results`. Generic smoke tests for MPI implementations, C,
C++, and Fortran compilers as well as specific smoke tests for 18
packages.
Inside the test method, individual tests can be run separately (and
continue to run best-effort after a test failure) using the `run_test`
method. The `run_test` method encapsulates finding test executables,
running and checking return codes, checking output, and error handling.
This handles the following trickier aspects of testing with direct
support in Spack's package API:
- [x] Caching source or intermediate build files at build time for
use at test time.
- [x] Test dependencies,
- [x] packages that require a compiler for testing (such as library only
packages).
See the packaging guide for more details on using Spack testing support.
Included is support for package.py files for virtual packages. This does
not change the Spack interface, but is a major change in internals.
Co-authored-by: Tamara Dahlgren <dahlgren1@llnl.gov>
Co-authored-by: wspear <wjspear@gmail.com>
Co-authored-by: Adam J. Stewart <ajstewart426@gmail.com>
If the Python used by Spack does not include Setuptools, then
'spack test' will fail because Spack's vendored pytest dependency
imports and uses Setuptools in some of its functions. It turns out
that Spack doesn't use the functionality those methods enable, so
this PR removes those functions and thereby allows 'spack test' to
run without Setuptools.
* Buildcache: Install into non-default directory layouts
Store a dictionary mapping of original dependency prefixes to dependency hashes
Use the loaded spec to grab the new dependency prefixes in the new directory layout.
Map the original dependency prefixes to the new dependency prefixes using the dependency hashes.
Use the dependency prefixes map to replace original rpaths with new rpaths preserving the order.
For mach-o binaries, use the dependency prefixes map to replace the dependency library entires for libraries and executables and the replace the library id for libraries.
On Linux, patchelf is used to replace the rpaths of elf binaries.
On macOS, install_name_tool is used to replace the rpaths and dependency libraries of mach-o binaries and the id of mach-o libraries.
On Linux, macholib is used to replace the dependency libraries of mach-o binaries and the id of mach-o libraries.
Binary text with padding replacement is attempted for all binaries for the following paths:
spack layout root
spack prefix
sbang script location
dependency prefixes
package prefix
Text replacement is attempted for all text files using the paths above.
Symbolic links to the absolute path of the package install prefix are replaced, all others produce warnings.
This commit removes the `python_version.py` unit test module
and the vendored dependencies `pyqver2.py` and `pyqver3.py`.
It substitutes them with an equivalent check done using
`vermin` that is run as a separate workflow via Github Actions.
This allows us to delete 2 vendored dependencies that are unmaintained
and substitutes them with a maintained tool.
Also, updates the list of vendored dependencies.
Spack doesn't need `requests`, and neither does `jsonschema`, but
`jsonschema` tries to import it, and it'll succeed if `requests` is on
your machine (which is likely, given how popular it is). This commit
removes the import to improve Spack's startup time a bit.
On a mac with SSD, the import of requests is ~28% of Spack's startup time
when run as `spack --print-shell-vars sh,modules` (.069 / .25 seconds),
which is what `setup-env.sh` runs.
On a Linux cluster where Python is mounted from NFS, this reduces
`setup-env.sh` source time from ~1s to .75s.
Note: This issue will be eliminated if we upgrade to a newer `jsonschema`
(we'd need to drop Python 2.6 for that). See
https://github.com/Julian/jsonschema/pull/388.
- remove the old LGPL license headers from all files in Spack
- add SPDX headers to all files
- core and most packages are (Apache-2.0 OR MIT)
- a very small number of remaining packages are LGPL-2.1-only
- pytest was not reporing the correct version from pytest.__version__.
It reported 'unknown'
- this fixes issues on some systems where system-installed pytest plugins
would try to use the version and convert it to an int
- Shorten Spack command usage for short options. Short options are now
shown as [-abc] instead of as [-a] [-b] [-c]
- fix bug that mixed long and short options for top-level `spack help`
* Allow dashes in command names and fix command name handling
- Command should allow dashes in their names like the reest of spack,
e.g. `spack log-parse`
- It might be too late for `spack build-cache` (since it is already
called `spack buildcache`), but we should try a bit to avoid
inconsistencies in naming conventions
- The code was inconsistent about where commands should be called by
their python module name (e.g. `log_parse`) and where the actual
command name should be used (e.g. `log-parse`).
- This made it hard to make a command with a dash in the name, and it
made `SpackCommand` fail to recognize commands with dashes.
- The code now uses the user-facing name with dashes for function
parameters, then converts that the module name when needed.
* Improve performance of log parsing
- A number of regular expressions from ctest_log_parser have really poor
performance, most due to untethered expressions with * or + (i.e., they
don't start with ^, so the repetition has to be checked for every
position in the string with Python's backtracking regex implementation)
- I can't verify that CTest's regexes work with an added ^, so I don't
really want to touch them. I tried adding this and found that it
caused some tests to break.
- Instead of using only "efficient" regular expressions, Added a
prefilter() class that allows the parser to quickly check a
precondition before evaluating any of the expensive regexes.
- Preconditions do things like check whether the string contains "error"
or "warning" (linear time things) before evaluating regexes that would
require them. It's sad that Python doesn't use Thompson string
matching (see https://swtch.com/~rsc/regexp/regexp1.html)
- Even with Python's slow implementation, this makes the parser ~200x
faster on the input we tried it on.
* Add `spack log-parse` command and improve the display of parsed logs
- Add better coloring and line wrapping to the log parse output. This
makes nasty build output look better with the line numbers.
- `spack log-parse` allows the log parsing logic used at the end of
builds to be executed on arbitrary files, which is handy even outside
of spack.
- Also provides a profile option -- we can profile arbitrary files and
show which regular expressions in the magic CTest parser take the most
time.
* Parallelize log parsing
- Log parsing now uses multiple threads for long logs
- Lines from logs are divided into chnks and farmed out to <ncpus>
- Add -j option to `spack log-parse`
* Vendor ordereddict for python2.6 only
This commit substitutes the custom module 'ordereddict_backport' with
the more known 'ordereddict' and vendors it only for python 2.6. Other
supported versions of python will use 'collections.OrderedDict'.
* Use absolute imports also for python 2.6
See PEP-328 for more information on the subject
* Added provenance of vendored ordereddict
* Keep track of source and versions for external libraries
* Note source of more obscure libraries
* We aren't upgrading jsonschema after all
* Add note on modifications made to pytest