Fixes #9166
This is intended to reduce errors related to lock timeouts by making
the following changes:
* Improves error reporting when acquiring a lock fails (addressing
#9166) - there is no longer an attempt to release the lock if an
acquire fails
* By default locks taken on individual packages no longer have a
timeout. This allows multiple spack instances to install overlapping
dependency DAGs. For debugging purposes, a timeout can be added by
setting 'package_lock_timeout' in config.yaml
* Reduces the polling frequency when trying to acquire a lock, to
reduce impact in the case where NFS is overtaxed. A simple
adaptive strategy is implemented, which starts with a polling
interval of .1 seconds and quickly increases to .5 seconds
(originally it would poll up to 10^5 times per second).
A test is added to check the polling interval generation logic.
* The timeout for Spack's whole-database lock (e.g. for managing
information about installed packages) is increased from 60s to
120s
* Users can configure the whole-database lock timeout using the
'db_lock_timeout' setting in config.yaml
Generally, Spack locks (those created using spack.llnl.util.lock.Lock)
now have no timeout by default.
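The adaptive polling strategy described above could look roughly like the following generator. This is a minimal sketch, not Spack's actual code: the name `poll_intervals`, the intermediate 0.2 second stage, and the step counts are assumptions; only the 0.1 second start and the 0.5 second ceiling come from the description.

```python
import itertools


def poll_intervals(stages=(0.1, 0.2, 0.5), steps=(20, 40)):
    """Yield sleep intervals (seconds) between lock acquisition attempts.

    Hypothetical helper: each early stage is used for a fixed number of
    attempts, after which the last stage is yielded forever.
    """
    for wait, count in zip(stages, steps):
        for _ in range(count):
            yield wait
    while True:
        yield stages[-1]


# The first attempts poll quickly; later attempts settle at the ceiling.
first = list(itertools.islice(poll_intervals(), 3))
later = next(itertools.islice(poll_intervals(), 100, None))
```

A caller would sleep for each yielded interval between attempts, so a lock that is held for a long time is polled about twice per second rather than continuously.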
This does not address implementations of NFS that do not support file
locking, nor does it detect cases where services that may be required
(nfslock/statd) aren't running.
Users may want to be able to more-aggressively release locks when
they know they are the only one using their Spack instance, and they
encounter lock errors after a crash (e.g. a remote terminal disconnect
mentioned in #8915).
Fixes #9001

#8289 added support for install_tree and copy_tree to merge into an existing
directory structure. However, it did not properly handle relative symlinks and
also removed support for the 'ignore' keyword. Additionally, some of the tests
were overly-strict when checking the permissions on the copied files.
This updates the install_tree/copy_tree methods and their tests:
* copy_tree/install_tree now preserve relative link targets (if the symlink in the
source directory structure is relative, the symlink created in the destination
will be relative)
* Added support for 'ignore' argument back to copy_tree/install_tree (removed
in #8289). It is no longer the object output by shutil.ignore_patterns: you pass a
function that accepts a path relative to the source and returns whether that
path should be copied.
* The openfoam packages (currently the only ones making use of the 'ignore'
argument) are updated for the new API
* When a symlink target is absolute, copy_tree and install_tree now rewrite the
source prefix to be the destination prefix
* copy_tree tests no longer check permissions: copy_tree doesn't enforce
anything about permissions so its tests don't check for that
* install_tree tests no longer check for exact permission matching since it can add
file permissions
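The symlink handling described above (relative targets kept as-is, absolute targets under the source prefix rewritten to the destination prefix) can be sketched as a pure function. The name `translate_link_target` and its signature are hypothetical, and this is an illustration of the rule rather than the code in `llnl.util.filesystem`.

```python
import os


def translate_link_target(target, src_root, dest_root):
    """Return the link target to use for a symlink copied to dest_root.

    Hypothetical helper illustrating the rule, not Spack's implementation.
    """
    if not os.path.isabs(target):
        # Relative links stay relative so they keep pointing inside the tree.
        return target
    src_root = os.path.abspath(src_root)
    dest_root = os.path.abspath(dest_root)
    if target == src_root or target.startswith(src_root + os.sep):
        # Absolute link into the source tree: swap the prefix.
        return dest_root + target[len(src_root):]
    # Absolute link to somewhere else: leave it alone.
    return target


relative = translate_link_target("../lib/libfoo.so", "/src", "/dst")
rewritten = translate_link_target("/src/lib/libfoo.so", "/src", "/dst")
```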
Replace use of `shutil.copytree` with `copy_tree` and `install_tree` functions in `llnl.util.filesystem`.
- `copy_tree` copies without setting permissions. It should be used to copy files around in the build directory.
- `install_tree` copies files and sets permissions. It should be used to copy files into the installation directory.
- `install` and `copy` are analogous single-file functions.
- add more extensive tests for these functions
- update packages to use these functions.
* Fix performance issue when compiling.
Spack was busy-waiting when compiling, which kept one core fully
occupied. The fix is to pass no timeout to select, instead of the
previous timeout of 0 seconds.
* Fix comments about select.select timeout
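The difference between the two timeout values can be shown with a small standalone example. The helper `wait_for_data` is hypothetical and unrelated to Spack's compiler wrappers; it only demonstrates that `select.select` returns immediately with a timeout of 0 and blocks when the timeout is omitted or `None`.

```python
import os
import select


def wait_for_data(fd, timeout=None):
    """Return bytes read from fd, or b"" if nothing became readable.

    timeout=None blocks inside select() until fd is readable, so the
    caller sleeps instead of spinning; timeout=0 returns immediately,
    which turns a surrounding loop into a busy-wait.
    """
    ready, _, _ = select.select([fd], [], [], timeout)
    return os.read(fd, 1024) if ready else b""


read_end, write_end = os.pipe()
nothing_yet = wait_for_data(read_end, timeout=0)  # polls, returns at once
os.write(write_end, b"done\n")
data = wait_for_data(read_end)  # blocks until data is available
```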
- This was a nasty workaround due to the way our compiler wrappers used
to work. We don't want to have to modify our elfutils installation to
install libdwarf.
- Since cd9691de5, we no longer need this because the package will always
come before dependencies in our include order.
- repo membership test was broken by the refactor of spack/__init__.py
- refactor singleton so that 'spec in repo' works again for `spack.repo.path`
- fix spec command and add basic tests for `spack spec` and `spack spec --yaml`
- Fixes a bug in `llnl.util.lock`
- Locks in the current directory would fail because the parent directory
was the empty string.
- Fix this and return '.' for the parent of locks in the current
directory.
- Clean up error messages for when a lock can't be created, or when an
exclusive (write) lock can't be taken on a file.
- Add a number of subclasses of LockError to distinguish timeouts from
permission issues.
- Add an explicit check to prevent the user from taking a write lock on a
read-only file.
- We had a check for this for when we try to *upgrade* a lock on an RO
file, but not for an initial write lock attempt.
- Add more tests for different lock permission scenarios.
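The current-directory bug above comes from `os.path.dirname` returning an empty string for a bare file name. A minimal sketch of the fix, using a hypothetical helper name (`lock_parent_dir`) rather than the actual code in `llnl.util.lock`:

```python
import os


def lock_parent_dir(lock_path):
    """Return the directory containing lock_path, using '.' for bare names."""
    # os.path.dirname('lockfile') is '', which is not a usable directory.
    return os.path.dirname(lock_path) or "."


bare = lock_parent_dir("lockfile")
nested = lock_parent_dir("/tmp/spack/lockfile")
```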
- write locks previously wrote information about the lock holder (host
and pid), and read locks would read this in.
- This is really only for debugging, so only enable it then
- add some tests that target debug info, and improve multiproc lock test
output
Functional updates:
- `python` now creates a copy of the `python` binaries when it is added
to a view
- Python extensions (packages which subclass `PythonPackage`) rewrite
their shebang lines to refer to python in the view
- Python packages in the same namespace will not generate conflicts if
both have `...lib/site-packages/namespace-example/__init__.py`
- These `__init__` files will also remain when removing any package in
the namespace until the last package in the namespace is removed
Generally (Updated 2/16):
- Any package can define `add_files_to_view` to customize how it is added
to a view (and at the moment custom definitions are included for
`python` and `PythonPackage`)
- Likewise any package can define `remove_files_from_view` to customize
which files are removed (e.g. you don't always want to remove the
namespace `__init__`)
- Any package can define `view_file_conflicts` to customize what it
considers a merge conflict
- Global activations are handled like views (where the view root is the
spec prefix of the extendee)
- Benefit: filesystem-management aspects of activating extensions are
now placed in views (e.g. now one can hardlink a global activation)
- Benefit: overriding `Package.activate` is more straightforward (see
`Python.activate`)
- Complication: extension packages which have special-purpose logic
*only* when activated outside of the extendee prefix must check for
this in their `add_files_to_view` method (see `PythonPackage`)
- `LinkTree` is refactored to have separate methods for copying a
directory structure and for copying files (since it was found that
generally packages may want to alter how files are copied but still
wanted to copy directories in the same way)
TODOs (updated 2/20):
- [x] additional testing (there is some unit testing added at this point
but more would be useful)
- [x] refactor or reorganize `LinkTree` methods: currently there is a
separate set of methods for replicating just the directory structure
without the files, and a set for replicating everything
- [x] Right now external views (i.e. those not used for global
activations) call `view.add_extension`, but global activations do not
to avoid some extra work that goes into maintaining external views. I'm
not sure if addressing that needs to be done here but I'd like to
clarify it in the comments (UPDATE: for now I have added a TODO and in
my opinion this can be merged now and the refactor handled later)
- [x] Several method descriptions (e.g. for `Package.activate`) are out
of date and reference a distinction between global activations and
views; they need to be updated
- [x] Update aspell package activations
- Spack was assuming that a group with gid == current uid would always exist.
- This was breaking the travis build for macos.
- also fix issue with the DB tarball test finding coverage files
- spack.util.lock behaves the same as llnl.util.lock, but Lock._lock and
Lock._unlock do nothing.
- can be disabled with a control variable.
- configuration options can enable/disable locking:
- `locks` option in spack configuration controls whether Spack will use filesystem locks or not.
- `-l` and `-L` command-line options can force-disable or force-enable locking.
- Spack will check for group- and world-writability before disabling
locks, and it will not allow a group- or world-writable instance to
have locks disabled.
- update documentation
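The writability check mentioned above can be approximated with the standard `stat` module. This is a sketch under assumptions: the function names and the idea of checking a single root directory are hypothetical, and the real check may inspect more than one path.

```python
import os
import stat
import tempfile


def group_or_world_writable(path):
    """Return True if path has the group-write or other-write bit set."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def may_disable_locks(instance_root):
    """Hypothetical policy: locks may only be disabled for private instances."""
    return not group_or_world_writable(instance_root)


root = tempfile.mkdtemp()
os.chmod(root, 0o755)
private_ok = may_disable_locks(root)
os.chmod(root, 0o775)
shared_ok = may_disable_locks(root)
```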
- simplify the singleton pattern across the codebase
- reduce lines of code needed for crufty initialization
- reduce functions that need to mess with a global
- Singletons whose semantics changed:
- spack.store.store() -> spack.store
- spack.repo.path() -> spack.repo.path
- spack.config.config() -> spack.config.config
- spack.caches.fetch_cache() -> spack.caches.fetch_cache
- spack.caches.misc_cache() -> spack.caches.misc_cache
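The change from calling `spack.repo.path()` to using `spack.repo.path` directly suggests a lazily constructed proxy object. The class below is a minimal sketch of that pattern, not the class Spack uses: the name `LazySingleton` and its details are assumptions. It also shows why membership tests need explicit forwarding, since Python looks up `__contains__` on the type rather than through `__getattr__`.

```python
class LazySingleton:
    """Build the wrapped object on first use and forward access to it."""

    def __init__(self, factory):
        self._factory = factory
        self._instance = None

    @property
    def instance(self):
        if self._instance is None:
            self._instance = self._factory()
        return self._instance

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        return getattr(self.instance, name)

    def __contains__(self, item):
        # Special methods bypass __getattr__, so forward explicitly.
        return item in self.instance


calls = []
repo = LazySingleton(lambda: calls.append("built") or {"zlib": "1.2.11"})
built_before_use = list(calls)
has_zlib = "zlib" in repo
names = list(repo.keys())
```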
* Added installation date and time to the database
Information on the date and time of installation of a spec is recorded
into the database. The information is retained on reindexing.
* Expose the possibility to query for installation date
The DB can now be queried for specs that have been installed in a given
time window. This query possibility is exposed to command line via two
new options of the `find` command.
* Extended docstring for Database._add
* Use timestamps since the epoch instead of formatted date in the DB
* Allow 'pretty date' formats from command line
* Substituted kwargs with explicit arguments
* Simplified regex for pretty date strings. Added unit tests.
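Storing installation times as seconds since the epoch makes a time-window query a simple numeric comparison. The sketch below uses a plain dictionary as a stand-in for the database; `installed_between` and the record layout are hypothetical.

```python
import time


def installed_between(records, start=0.0, end=None):
    """Return names whose installation time lies in [start, end].

    records maps a spec name to its installation time in seconds since
    the epoch; end defaults to the current time.
    """
    end = time.time() if end is None else end
    return sorted(
        name for name, stamp in records.items() if start <= stamp <= end
    )


records = {"zlib": 100.0, "hdf5": 200.0, "mpich": 300.0}
window = installed_between(records, start=150.0, end=250.0)
since = installed_between(records, start=200.0)
```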
This updates the fix_darwin_install_name function to use the Spack
Executable object to run install_name_tool, which ensures that
process output is formatted as a 'str' for python2 and python3.
Originally fix_darwin_install_name was invoking subprocess.Popen
directly.
Following the discussion with Todd and Adam, find has been modified to
accept glob expressions. This should not affect performance as every
glob implementation I inspected has 3 cases (no wildcard, wildcard but
no directories involved, wildcard and directories involved) and uses
fnmatch underneath.
Mixins now perform a non-recursive search by default (a recursive
search can still be triggered using the recursive keyword).
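Glob matching on file names with an optional recursive walk can be sketched with `fnmatch` and `os.walk`. This is a standalone illustration, not the `find` function discussed above; `find_files` and its defaults are assumptions.

```python
import fnmatch
import os
import tempfile


def find_files(root, pattern, recursive=False):
    """Return sorted paths under root whose file name matches pattern."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        found.extend(
            os.path.join(dirpath, name)
            for name in filenames
            if fnmatch.fnmatch(name, pattern)
        )
        if not recursive:
            break  # only look at the top-level directory
    return sorted(found)


# Build a tiny tree: root/a.h, root/b.c, root/sub/c.h
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
for rel in ("a.h", "b.c", os.path.join("sub", "c.h")):
    open(os.path.join(root, rel), "w").close()

top_level = find_files(root, "*.h")
everything = find_files(root, "*.h", recursive=True)
```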
Following comments from Todd:
- the call to tty.debug has been moved deeper, to log the filtering of each file
- the shadowing on the name "kwargs" is avoided
- command reference now includes usage for all Spack commands as output
by `spack help`. Each command usage links to any related section in
the docs.
- added `spack commands` command which can list command names,
subcommands, and generate RST docs for commands.
- added `llnl.util.argparsewriter`, which analyzes an argparse parser and
calls hooks for description, usage, options, and subcommands
'spack install' can now reinstall a spec even if it has dependents, via
the --overwrite option. This option moves the current installation into
a temporary directory. If the reinstallation is successful the temporary
directory is removed; otherwise a rollback is performed.
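The move-aside-and-roll-back behavior can be sketched with `shutil` and `tempfile`. This is an illustration under assumptions, not the code behind `--overwrite`: `overwrite_install` and the `rebuild` callback are hypothetical names.

```python
import os
import shutil
import tempfile


def overwrite_install(prefix, rebuild):
    """Rebuild prefix, restoring the old contents if rebuild fails."""
    parent = os.path.dirname(os.path.abspath(prefix))
    backup_root = tempfile.mkdtemp(dir=parent)
    backup = os.path.join(backup_root, "backup")
    shutil.move(prefix, backup)  # set the current installation aside
    try:
        rebuild(prefix)
    except BaseException:
        # Roll back: drop any partial install and restore the old one.
        shutil.rmtree(prefix, ignore_errors=True)
        shutil.move(backup, prefix)
        raise
    finally:
        # On success this removes the old installation; after a rollback
        # it only removes the now-empty temporary directory.
        shutil.rmtree(backup_root, ignore_errors=True)
```

Creating the temporary directory next to the prefix keeps both moves on the same filesystem, so they can be plain renames.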
- When you don't use wildcards, flake8 will find places where you used an
undefined name.
- This commit has all the bugfixes resulting from this static check.
I'm tracking down a problem with the perl package that's been
generating this error:
```
OSError: OSError: [Errno 2] No such file or directory: '/blah/blah/blah/lib/5.24.1/x86_64-linux/Config.pm~'
```
The real problem is upstream, but it's being masked by an exception
raised in `filter_file`'s `finally` block.
In my case, `backup` is `False`.
The backup is created around line 127, the `re.sub()` call
fails (working on that), the `except` block fires and moves the backup
file back, then the `finally` block tries to remove the non-existent
backup file.
This change just avoids trying to remove the non-existent file.
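The sequence described above, and the guard that avoids the masking error, can be illustrated with a simplified stand-in for `filter_file`. This is a sketch under assumptions: `filter_file_sketch` and its arguments are hypothetical and the real function does more; only the backup, restore, and cleanup order mirrors the description.

```python
import os
import re
import shutil


def filter_file_sketch(regex, repl, path, backup=False):
    """Apply re.sub to a file in place, keeping a '~' backup while working."""
    backup_path = path + "~"
    shutil.copy(path, backup_path)
    try:
        with open(backup_path) as source, open(path, "w") as target:
            target.write(re.sub(regex, repl, source.read()))
    except BaseException:
        # Restore the original; the backup file no longer exists afterwards.
        shutil.move(backup_path, path)
        raise
    finally:
        # Guard the cleanup so it cannot raise and hide the real error.
        if not backup and os.path.exists(backup_path):
            os.remove(backup_path)
```

Without the `os.path.exists` check, a failing `re.sub()` would surface as an `OSError` from the cleanup instead of the original exception.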
`spack blame` prints out the contributors to a package.
By modification time:
```
$ spack blame --time llvm
LAST_COMMIT LINES % AUTHOR EMAIL
3 days ago 2 0.6 Andrey Prokopenko <andrey.prok@gmail.com>
3 weeks ago 125 34.7 Massimiliano Culpo <massimiliano.culpo@epfl.ch>
3 weeks ago 3 0.8 Peter Scheibel <scheibel1@llnl.gov>
2 months ago 21 5.8 Adam J. Stewart <ajstewart426@gmail.com>
2 months ago 1 0.3 Gregory Becker <becker33@llnl.gov>
3 months ago 116 32.2 Todd Gamblin <tgamblin@llnl.gov>
5 months ago 2 0.6 Jimmy Tang <jcftang@gmail.com>
5 months ago 6 1.7 Jean-Paul Pelteret <jppelteret@gmail.com>
7 months ago 65 18.1 Tom Scogland <tscogland@llnl.gov>
11 months ago 13 3.6 Kelly (KT) Thompson <kgt@lanl.gov>
a year ago 1 0.3 Scott Pakin <pakin@lanl.gov>
a year ago 3 0.8 Erik Schnetter <schnetter@gmail.com>
3 years ago 2 0.6 David Beckingsale <davidbeckingsale@gmail.com>
3 days ago 360 100.0
```
Or by percent contribution:
```
$ spack blame --percent llvm
LAST_COMMIT LINES % AUTHOR EMAIL
3 weeks ago 125 34.7 Massimiliano Culpo <massimiliano.culpo@epfl.ch>
3 months ago 116 32.2 Todd Gamblin <tgamblin@llnl.gov>
7 months ago 65 18.1 Tom Scogland <tscogland@llnl.gov>
2 months ago 21 5.8 Adam J. Stewart <ajstewart426@gmail.com>
11 months ago 13 3.6 Kelly (KT) Thompson <kgt@lanl.gov>
5 months ago 6 1.7 Jean-Paul Pelteret <jppelteret@gmail.com>
3 weeks ago 3 0.8 Peter Scheibel <scheibel1@llnl.gov>
a year ago 3 0.8 Erik Schnetter <schnetter@gmail.com>
3 years ago 2 0.6 David Beckingsale <davidbeckingsale@gmail.com>
3 days ago 2 0.6 Andrey Prokopenko <andrey.prok@gmail.com>
5 months ago 2 0.6 Jimmy Tang <jcftang@gmail.com>
2 months ago 1 0.3 Gregory Becker <becker33@llnl.gov>
a year ago 1 0.3 Scott Pakin <pakin@lanl.gov>
3 days ago 360 100.0
```
- Python I/O would not properly interleave (or appear) with output from
subcommands.
- Add a flushing wrapper around sys.stdout and sys.stderr when
redirecting, so that Python output is synchronous with that of
subcommands.
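A flushing wrapper of the kind described above can be sketched as a thin proxy that flushes after every write. The class name `FlushingStream` is hypothetical and this is not necessarily how Spack implements it.

```python
import io


class FlushingStream:
    """Proxy for a text stream that flushes after each write."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, data):
        count = self._stream.write(data)
        self._stream.flush()  # push Python output out immediately
        return count

    def writelines(self, lines):
        self._stream.writelines(lines)
        self._stream.flush()

    def __getattr__(self, name):
        # Everything else (fileno, isatty, ...) comes from the real stream.
        return getattr(self._stream, name)


buffer = io.StringIO()
wrapped = FlushingStream(buffer)
wrapped.write("==> Installing\n")
captured = buffer.getvalue()
```

In use, one would assign `sys.stdout = FlushingStream(sys.stdout)` for the duration of the redirection and restore the original stream afterwards.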
- 'v' toggle was previously only good for the current install.
- subsequent installs needed user to press 'v' again.
- 'v' state is now preserved across dependency installs.
- Simplify interface to log_output. New interface requires only one
context handler instead of two. Before:
    with log_output('logfile.txt') as log_redirection:
        with log_redirection:
            # do things ... output will be logged
After:
    with log_output('logfile.txt'):
        # do things ... output will be logged
If you also want the output to be echoed to ``stdout``, use the
`echo` parameter::
    with log_output('logfile.txt', echo=True):
        # do things ... output will be logged and printed out
And, if you just want to echo *some* stuff from the parent, use
``force_echo``:
    with log_output('logfile.txt', echo=False) as logger:
        # do things ... output will be logged
        with logger.force_echo():
            # things here will be echoed *and* logged
A key difference between this and the previous implementation is that
*everything* in the context handler is logged. Previously, things like
`Executing phase 'configure'` would not be logged, only output to the
screen, so understanding phases in the build log was difficult.
- The implementation of `log_output()` is different in two major ways:
1. This implementation avoids race conditions by using only one pipe (before
we had a multiprocessing pipe and a unix pipe). The logger daemon
stops naturally when the input stream is closed, which avoids a race
in the previous implementation where we'd miss some lines of output
because the parent would shut the daemon down before it was done
with all output.
2. Instead of turning output redirection on and off, which prevented
some things from being logged, this version uses control characters
in the output stream to enable/disable forced echoing. We're using
the time-honored xon and xoff codes, which tell the daemon to echo
anything between them AND write it to the log. This is how
`logger.force_echo()` works.
- Fix places where output could get stuck in buffers by flushing more
aggressively. This makes the output printed to the terminal the same
as that which would be printed through a pipe to `cat` or to a file.
Previously these could be weirdly different, and some output would be
missing when redirecting Spack to a file or pipe.
- Simplify input and color handling in both `build_environment.fork()`
and `llnl.util.tty.log.log_output()`. Neither requires an input_stream
parameter anymore; we assume stdin will be forwarded if possible.
- remove `llnl.util.lang.duplicate_stream()` and remove associated
monkey-patching in tests, as these aren't needed if you just check
whether stdin is a tty and has a fileno attribute.
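The xon/xoff mechanism described above can be sketched as a pure function that routes text to the log and, where forced, to the terminal. This is an illustration only: `route_output` is a hypothetical name, and the real daemon works incrementally on a pipe rather than on a complete string.

```python
import re

XON, XOFF = "\x11", "\x13"  # ASCII DC1 and DC3 control characters


def route_output(stream_text, echo=False):
    """Split stream_text into (logged, echoed) according to xon/xoff marks.

    Everything except the control characters is logged. Text is echoed
    when echo is True or when it lies between an XON and an XOFF.
    """
    logged, echoed = [], []
    forced = False
    for chunk in re.split("([\x11\x13])", stream_text):
        if chunk == XON:
            forced = True
        elif chunk == XOFF:
            forced = False
        elif chunk:
            logged.append(chunk)
            if echo or forced:
                echoed.append(chunk)
    return "".join(logged), "".join(echoed)


logged, echoed = route_output("compiling\n" + XON + "==> phase\n" + XOFF + "linking\n")
```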
* Disable spec colorization when redirecting stdout and add command line flag to re-enable
* Add command line `--color` flag to control output colorization
* Add options to `llnl.util.tty.color` to allow color to be auto/always/never
* Add `Spec.cformat()` function to be used when `format()` should have auto-coloring
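The auto/always/never choice typically reduces to a check of whether the output stream is a terminal. A minimal sketch, with a hypothetical function name (`should_colorize`) that is not part of `llnl.util.tty.color`:

```python
import io
import sys


def should_colorize(when="auto", stream=None):
    """Decide whether to emit color codes for the given mode and stream."""
    if when == "always":
        return True
    if when == "never":
        return False
    if when == "auto":
        stream = sys.stdout if stream is None else stream
        # Redirected output (file, pipe) is not a tty, so color is off.
        return stream.isatty()
    raise ValueError("color mode must be 'auto', 'always', or 'never'")


redirected = io.StringIO()  # stands in for output redirected to a file
auto_choice = should_colorize("auto", redirected)
forced_choice = should_colorize("always", redirected)
```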
- Lock test can be run either as a node-local test or as an MPI test.
- Lock test is now parametrized by filesystem, so you can test the
locking capabilities of your NFS, Lustre, or GPFS filesystem. See docs
for details.