Commit Graph

58 Commits

Author SHA1 Message Date
Todd Gamblin
3118647802
Update copyright year to 2024 (#41919)
It was time to run `spack license update-copyright-year` again.
2024-01-02 09:21:30 +01:00
Massimiliano Culpo
a236fce31f
Partial removal of circular dependencies between spack and llnl (#40090)
Modifications:
- [x] Move `spack.util.string` to `llnl.string`
- [x] Remove dependency of `llnl` on `spack.error`
- [x] Move `spack.util.path` to `llnl.path`
- [x] Move `spack.util.environment.get_host_*` to `spack.spec`
2023-09-28 16:21:52 +00:00
Harmen Stoppels
86f9d3865b
Fix broken inode assertion (#39188) 2023-08-08 09:21:23 +02:00
Massimiliano Culpo
50b90e430d
spack.util.lock: add type-hints, remove **kwargs in method signatures (#39011) 2023-07-20 09:41:23 +02:00
Massimiliano Culpo
f34c93c5f8
llnl.util.lock: add type-hints (#38977)
Also uppercase global variables in the module
2023-07-19 11:23:08 +02:00
Adam J. Stewart
45838cee0b
Drop Python 2 super syntax (#38718) 2023-07-05 09:04:29 -05:00
Adam J. Stewart
95847a0b37
Drop Python 2 object subclassing (#38720) 2023-07-05 14:37:44 +02:00
Harmen Stoppels
fce95e2efb
license year bump (#34921)
* license bump year
* fix black issues of modified files
* mypy
* fix 2021 -> 2023
2023-01-18 14:30:17 -08:00
Harmen Stoppels
2167cbf72c
Track locks by (dev, ino); close file handlers between tests (#34122) 2022-11-25 10:57:33 +01:00
Harmen Stoppels
d67b12eb79
locks: improved errors (#33477)
Instead of showing

```
==> Error: Timed out waiting for a write lock.
```

show

```
==> Error: Timed out waiting for a write lock after 1.200ms and 4 attempts on file: /some/file
```

so that we actually get to see where acquiring a lock failed even when not
running in debug mode.

And use pretty time units everywhere, so we don't get 1.45e-9 seconds
but 1.450ns etc.
2022-10-24 11:54:49 +02:00
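A minimal sketch of the pretty time-unit formatting this commit describes; the helper name and unit thresholds are illustrative assumptions, not necessarily Spack's implementation:

```
def pretty_seconds(seconds):
    # Pick a unit that keeps the number readable, e.g. 1.45e-9 -> '1.450ns'.
    if seconds >= 1:
        multiplier, unit = 1, "s"
    elif seconds >= 1e-3:
        multiplier, unit = 1e3, "ms"
    elif seconds >= 1e-6:
        multiplier, unit = 1e6, "us"
    else:
        multiplier, unit = 1e9, "ns"
    return "%.3f%s" % (seconds * multiplier, unit)

# Reconstructing the richer timeout message shown above:
attempts, elapsed, path = 4, 1.2e-3, "/some/file"
print("==> Error: Timed out waiting for a write lock after %s and %d attempts "
      "on file: %s" % (pretty_seconds(elapsed), attempts, path))
```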
Seth R. Johnson
c7292aa4b6
Fix spack locking on some NFS systems (#32426)
Co-authored-by: Todd Gamblin <tgamblin@llnl.gov>
2022-09-06 09:50:59 -07:00
Todd Gamblin
f52f6e99db black: reformat entire repository with black 2022-07-31 13:29:20 -07:00
John Parent
4aee27816e Windows Support: Testing Suite integration
Broaden support for execution of the test suite
on Windows.
General bug and review fixups
2022-03-17 09:01:01 -07:00
John Parent
cf1349ba35 "spack commands --update-completion" 2022-03-17 09:01:01 -07:00
Todd Gamblin
93377942d1 Update copyright year to 2022 2022-01-14 22:50:21 -08:00
Todd Gamblin
1374fea5d9
locks: only open lockfiles once instead of for every lock held (#24794)
This adds lockfile tracking to Spack's lock mechanism, so that we ensure that there
is only one open file descriptor per inode.

The `fcntl` locks that Spack uses are associated with an inode and a process.
This is convenient, because if a process exits, it releases its locks.
Unfortunately, this also means that if you close a file, *all* locks associated
with that file's inode are released, regardless of whether the process has any
other open file descriptors on it.

Because of this, we need to track open lock files so that we only close them when
a process no longer needs them.  We do this by tracking each lockfile by its
inode and process id.  This has several nice properties:

1. Tracking by pid ensures that, if we fork, we don't inadvertently track the parent
   process's lockfiles. `fcntl` locks are not inherited across forks, so we'll
   just track new lockfiles in the child.
2. Tracking by inode ensures that references are counted per inode, and that we don't
   inadvertently close a file whose inode still has open locks.
3. Tracking by both pid and inode ensures that we only open lockfiles the minimum
   number of times necessary for the locks we have.

Note: as mentioned elsewhere, these locks aren't thread safe -- they're designed to
work in Python and assume the GIL.

Tasks:
- [x] Introduce an `OpenFileTracker` class to track open file descriptors by inode.
- [x] Reference-count open file descriptors and only close them if they're no longer
      needed (this avoids inadvertently releasing locks that should not be released).
2021-08-24 14:08:34 -07:00
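A hedged sketch of the reference counting this commit describes; the class name comes from the task list above, but the details are illustrative, not Spack's actual code:

```
import os


class OpenFileTracker:
    # At most one open file object per (pid, device, inode).  Closing a
    # descriptor releases *all* fcntl locks on that inode, so we only
    # close once no lock in this process needs the file anymore.
    def __init__(self):
        self._files = {}  # (pid, dev, inode) -> [file object, ref count]

    def get_fh(self, path):
        fh = open(path, "a+")  # create if missing, keep read/write
        st = os.fstat(fh.fileno())
        key = (os.getpid(), st.st_dev, st.st_ino)
        entry = self._files.get(key)
        if entry:            # inode already open in this process: reuse
            fh.close()
            entry[1] += 1
            return entry[0]
        self._files[key] = [fh, 1]
        return fh

    def release_fh(self, fh):
        st = os.fstat(fh.fileno())
        key = (os.getpid(), st.st_dev, st.st_ino)
        entry = self._files[key]
        entry[1] -= 1
        if entry[1] == 0:    # last reference: safe to close now
            del self._files[key]
            fh.close()
```

Keying on the pid means a forked child never reuses the parent's entries, matching point 1 above.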
Adam J. Stewart
b8afc0fd29 API Docs: fix broken reference targets 2021-07-16 08:30:56 -07:00
Todd Gamblin
24c01d57cf
imports: sort imports everywhere in Spack (#24695)
* fix remaining flake8 errors

* imports: sort imports everywhere in Spack

We enabled import order checking in #23947, but fixing things manually drives
people crazy. This used `spack style --fix --all` from #24071 to automatically
sort everything in Spack so PR submitters won't have to deal with it.

This should go in after #24071, as it assumes we're using `isort`, not
`flake8-import-order` to order things. `isort` seems to be more flexible and
allows `llnl` imports to be in their own group before `spack` ones, so this
seems like a good switch.
2021-07-08 22:12:30 +00:00
vsoch
613348ec90 Use gethostname() instead of getfqdn() for lock debug mode
In debug mode, processes taking an exclusive lock write out their node name to
the lock file. We were using `getfqdn()` for this, but it seems to produce
inconsistent results when used from within some GitHub Actions containers.

We get this error because `getfqdn()` seems to return a short name in one place
and a fully qualified name in another:

```
  File "/home/runner/work/spack/spack/lib/spack/spack/test/llnl/util/lock.py", line 1211, in p1
    assert lock.host == self.host
AssertionError: assert 'fv-az290-764....cloudapp.net' == 'fv-az290-764'
  - fv-az290-764.internal.cloudapp.net
  + fv-az290-764
!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!
== 1 failed, 2547 passed, 7 skipped, 22 xfailed, 2 xpassed in 1238.67 seconds ==
```

This seems to stem from https://bugs.python.org/issue5004.

We don't really need to get a fully qualified hostname for debugging, so use
`gethostname()` because its results are more consistent. This seems to fix the
issue.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
2021-04-15 00:01:41 -07:00
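The swap itself is tiny; a sketch of the debug write-out under that assumption:

```
import os
import socket

# gethostname() gives consistent results where getfqdn() may flip
# between short and fully qualified names (see bpo-5004 linked above).
host, pid = socket.gethostname(), os.getpid()
print("lock holder: %s,%d" % (host, pid))  # e.g. 'fv-az290-764,4242'
```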
Todd Gamblin
a8ccb8e116 copyrights: update all files with license headers for 2021
- [x] add `concretize.lp`, `spack.yaml`, etc. to licensed files
- [x] update all licensed files to say 2013-2021 using
      `spack license update-copyright-year`
- [x] appease mypy with some additions to package.py needed
      for oneapi.py
2021-01-02 12:12:00 -08:00
Tamara Dahlgren
605c1a76e0
Reduce output verbosity with debug levels (#17546)
* switch from bool to int debug levels

* Added debug options and changed lock logging to use more detailed values

* Limit installer and timestamp PIDs to standard debug output

* Reduced verbosity of fetch/stage/install output, changing most to debug level 1

* Combine lock log methods; change build process install to debug

* Changed binary cache install messages to extraction messages
2020-07-23 00:49:57 -07:00
Tamara Dahlgren
f2aca86502
Distributed builds (#13100)
Fixes #9394
Closes #13217.

## Background
Spack provides the ability to enable/disable parallel builds through two options: package `parallel` and configuration `build_jobs`.  This PR changes the algorithm to allow multiple, simultaneous processes to coordinate the installation of the same spec (and specs with overlapping dependencies).

The `parallel` (boolean) property sets the default for its package though the value can be overridden in the `install` method.

Spack's current parallel builds are limited to build tools supporting `jobs` arguments (e.g., `Makefiles`).  The number of jobs actually used is calculated as `min(config:build_jobs, # cores, 16)`, which can be overridden in the package or on the command line (i.e., `spack install -j <# jobs>`).

This PR adds support for distributed (single- and multi-node) parallel builds.  The goals of this work include improving the efficiency of installing packages with many dependencies and reducing the repetition associated with concurrent installations of (dependency) packages.

## Approach
### File System Locks
Coordination between concurrent installs of overlapping packages to a Spack instance is accomplished through bottom-up dependency DAG processing and file system locks.  The runs can be a combination of interactive and batch processes affecting the same file system.  Exclusive prefix locks are required to install a package while shared prefix locks are required to check if the package is installed.

Failures are communicated through a separate exclusive prefix failure lock (for concurrent processes), combined with a persistent store (for separate, related build processes).  The resulting file contains the failing spec to facilitate manual debugging.

### Priority Queue
Management of dependency builds changed from reliance on recursion to use of a priority queue where the priority of a spec is based on the number of its remaining uninstalled dependencies.  

Using a queue required a change to dependency build exception handling, with the most visible issue being that the `install` method *must* install something in the prefix.  Consequently, packages can no longer get away with an install method consisting of `pass`, for example.

## Caveats
- This still only parallelizes a single-rooted build.  Multi-rooted installs (e.g., for environments) are TBD in a future PR.

Tasks:
- [x] Adjust package lock timeout to correspond to value used in the demo
- [x] Adjust database lock timeout to reduce contention on startup of concurrent
    `spack install <spec>` calls
- [x] Replace (test) package's `install: pass` methods with file creation since post-install 
    `sanity_check_prefix` will otherwise error out with `Install failed .. Nothing was installed!`
- [x] Resolve remaining existing test failures
- [x] Respond to alalazo's initial feedback
- [x] Remove `bin/demo-locks.py`
- [x] Add new tests to address new coverage issues
- [x] Replace built-in package's `def install(..): pass` to "install" something
    (i.e., only `apple-libunwind`)
- [x] Increase code coverage
2020-02-19 00:04:22 -08:00
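A toy sketch of the ordering described under "Priority Queue" above: a spec's priority is its number of remaining uninstalled dependencies, and it installs once that count reaches zero. The package names and DAG are made up, and the real installer additionally takes prefix locks and coordinates with other processes:

```
import heapq

deps = {
    "zlib": set(),
    "bzip2": set(),
    "cmake": {"zlib", "bzip2"},
    "hdf5": {"cmake", "zlib"},
}

# Invert the DAG so an install can unblock its dependents.
dependents = {name: set() for name in deps}
for name, ds in deps.items():
    for d in ds:
        dependents[d].add(name)

remaining = {name: len(ds) for name, ds in deps.items()}
queue = [(count, name) for name, count in remaining.items()]
heapq.heapify(queue)

installed = set()
while queue:
    count, name = heapq.heappop(queue)
    if name in installed or count != remaining[name]:
        continue  # stale queue entry; a fresher one exists
    print("installing", name)  # exclusive prefix lock goes here
    installed.add(name)
    for dep in dependents[name]:
        remaining[dep] -= 1
        heapq.heappush(queue, (remaining[dep], dep))
```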
Todd Gamblin
4af6303086
copyright: update copyright dates for 2020 (#14328) 2019-12-30 22:36:56 -08:00
Todd Gamblin
6c9467e8c6 lock transactions: avoid redundant reading in write transactions
Our `LockTransaction` class was reading overly aggressively.  In cases
like this:

```
1  with spack.store.db.read_transaction():
2    with spack.store.db.write_transaction():
3      ...
```

The `ReadTransaction` on line 1 would read in the DB, but the
`WriteTransaction` on line 2 would read in the DB *again*, even though we
had a read lock the whole time.  `WriteTransaction`s were only
considering nested writes to decide when to read, but they didn't know
when we already had a read lock.

- [x] `Lock.acquire_write()` return `False` in cases where we already had
       a read lock.
2019-12-23 18:36:56 -08:00
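A toy illustration (not Spack's code) of the fix: `acquire_write()` reports whether it actually took a *new* file lock, and the write transaction re-reads only in that case:

```
class ToyLock:
    def __init__(self):
        self._reads = self._writes = 0

    def acquire_read(self):
        self._reads += 1
        # True only if this call took the underlying file lock.
        return self._reads == 1 and self._writes == 0

    def acquire_write(self):
        self._writes += 1
        # False when a read lock is already held, as in the fix above.
        return self._writes == 1 and self._reads == 0


lock = ToyLock()
if lock.acquire_read():       # line 1 in the example: reads the DB
    print("outer read transaction: read DB from disk")
if lock.acquire_write():      # line 2: must NOT read again
    print("never printed: we already held a read lock")
else:
    print("nested write transaction: reuse the in-memory DB")
```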
Todd Gamblin
bb517fdb84 lock transactions: ensure that nested write transactions write
If a write transaction was nested inside a read transaction, it would not
write properly on release, e.g., in a sequence like this, inside our
`LockTransaction` class:

```
1  with spack.store.db.read_transaction():
2    with spack.store.db.write_transaction():
3      ...
4  with spack.store.db.read_transaction():
   ...
```

The `WriteTransaction` on line 2 had no way of knowing that its
`__exit__()` call was the last *write* in the nesting, and it would skip
calling its write function.

The `__exit__()` call of the `ReadTransaction` on line 1 wouldn't know
how to write, and the file would never be written.

The DB would be correct in memory, but the `ReadTransaction` on line 4
would re-read the whole DB assuming that other processes may have
modified it.  Since the DB was never written, we got stale data.

- [x] Make `Lock.release_write()` return `True` whenever we release the
      *last write* in a nest.
2019-12-23 18:36:56 -08:00
Todd Gamblin
eb8fc4f3be lock transactions: fix non-transactional writes
Lock transactions were actually writing *after* the lock was
released. The code was looking at the result of `release_write()` before
writing, then writing based on whether the lock was released.  This is
pretty obviously wrong.

- [x] Refactor `Lock` so that a release function can be passed to the
      `Lock` and called *only* when a lock is really released.

- [x] Refactor `LockTransaction` classes to use the release function
  instead of checking the return value of `release_read()` / `release_write()`
2019-12-23 18:36:56 -08:00
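A toy sketch of the release-function pattern these three commits converge on; the structure is illustrative, not the actual `Lock` API:

```
class ToyLock:
    def __init__(self):
        self._writes = 0

    def acquire_write(self):
        self._writes += 1

    def release_write(self, release_fn=None):
        if self._writes == 1 and release_fn is not None:
            release_fn()  # called while still holding the lock
        self._writes -= 1
        return self._writes == 0  # True only for the last nested release


def write_db():
    print("writing DB to disk")


lock = ToyLock()
lock.acquire_write()          # outer write transaction
lock.acquire_write()          # nested write transaction
lock.release_write(write_db)  # nested release: nothing written
lock.release_write(write_db)  # last release: DB written, then unlocked
```

This addresses both bugs at once: the write can no longer happen after the lock is gone, and the innermost transaction no longer skips the write the outermost one was counting on.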
Todd Gamblin
6f50cd52ed copyright: update license headers for 2013-2019 copyright. 2019-01-01 00:44:28 -08:00
Todd Gamblin
eea786f4e8 relicense: replace LGPL headers with Apache-2.0/MIT SPDX headers
- remove the old LGPL license headers from all files in Spack
- add SPDX headers to all files
  - core and most packages are (Apache-2.0 OR MIT)
  - a very small number of remaining packages are LGPL-2.1-only
2018-10-17 14:42:06 -07:00
Peter Scheibel
28c0dd9148
Increase and customize lock timeouts (#9219)
Fixes #9166

This is intended to reduce errors related to lock timeouts by making
the following changes:

* Improves error reporting when acquiring a lock fails (addressing
  #9166) - there is no longer an attempt to release the lock if an
  acquire fails
* By default locks taken on individual packages no longer have a
  timeout. This allows multiple spack instances to install overlapping
  dependency DAGs. For debugging purposes, a timeout can be added by
  setting 'package_lock_timeout' in config.yaml
* Reduces the polling frequency when trying to acquire a lock, to
  reduce impact in the case where NFS is overtaxed. A simple
  adaptive strategy is implemented, which starts with a polling
  interval of 0.1 seconds and quickly increases to 0.5 seconds
  (originally it would poll up to 10^5 times per second).
  A test is added to check the polling interval generation logic.
* The timeout for Spack's whole-database lock (e.g. for managing
  information about installed packages) is increased from 60s to
  120s
* Users can configure the whole-database lock timeout using the
  'db_lock_timeout' setting in config.yaml

Generally, Spack locks (those created using `llnl.util.lock.Lock`)
now have no timeout by default

This does not address implementations of NFS that do not support file
locking, or detect cases where services that may be required
(nfslock/statd) aren't running.

Users may want to be able to release locks more aggressively when they
know they are the only one using their Spack instance, and they
encounter lock errors after a crash (e.g. a remote terminal disconnect
mentioned in #8915).
2018-09-25 18:58:51 -07:00
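A sketch of the adaptive polling strategy described above; apart from the 0.1s/0.5s bounds quoted in the message, the parameter values (notably the growth factor) are assumptions:

```
import time


def polling_intervals(start=0.1, cap=0.5, growth=1.5):
    # Begin polling every `start` seconds and back off geometrically to
    # `cap`, instead of polling up to 10^5 times per second.
    interval = start
    while True:
        yield interval
        interval = min(interval * growth, cap)


def acquire(try_lock, timeout=None):
    # timeout=None means wait forever, the new default for package locks.
    deadline = None if timeout is None else time.time() + timeout
    for wait in polling_intervals():
        if try_lock():
            return True
        if deadline is not None and time.time() >= deadline:
            return False
        time.sleep(wait)
```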
Todd Gamblin
2b0d944341 locks: fix bug when creating lockfiles in the current directory.
- Fixes a bug in `llnl.util.lock`

- Locks in the current directory would fail because the parent directory
  was the empty string.

- Fix this and return '.' for the parent of locks in the current
  directory.
2018-07-21 10:39:47 -07:00
Todd Gamblin
650786c812 locks: improve errors and permission checking
- Clean up error messages for when a lock can't be created, or when an
  exclusive (write) lock can't be taken on a file.

- Add a number of subclasses of LockError to distinguish timeouts from
  permission issues.

- Add an explicit check to prevent the user from taking a write lock on a
  read-only file.
  - We had a check for this for when we try to *upgrade* a lock on an RO
    file, but not for an initial write lock attempt.

- Add more tests for different lock permission scenarios.
2018-07-12 19:59:53 +02:00
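A sketch of the explicit up-front check; the exception name mirrors the "subclasses of LockError" mentioned above but is an assumption:

```
import os


class LockROFileError(Exception):
    """Raised when a write lock is requested on a read-only file."""


def check_write_lock_allowed(path):
    # Fail early with a clear message rather than deep inside fcntl.
    if os.path.exists(path) and not os.access(path, os.W_OK):
        raise LockROFileError(
            "Cannot take a write lock on read-only file: %s" % path)
```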
Todd Gamblin
ab794fa741 locks: llnl.util.lock now only writes host info when in debug mode
- write locks previously wrote information about the lock holder (host
  and pid), and read locks would read this in.

- This is really only for debugging, so we only enable it in debug mode

- add some tests that target debug info, and improve multiproc lock test
  output
2018-07-12 19:59:53 +02:00
Todd Gamblin
54201e3c02
locks: add configuration and command-line options to enable/disable locks (#7692)
- `spack.util.lock` behaves the same as `llnl.util.lock`, but `Lock._lock` and
  `Lock._unlock` do nothing.

- can be disabled with a control variable.

- configuration options can enable/disable locking:
  - `locks` option in spack configuration controls whether Spack will use filesystem locks or not.
  - `-l` and `-L` command-line options can force-disable or force-enable locking.

- Spack will check for group- and world-writability before disabling
  locks, and it will not allow a group- or world-writable instance to
  have locks disabled.

- update documentation
2018-05-18 14:41:03 -07:00
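A sketch of how such a switch can work; the flag and class names are illustrative, not the actual `spack.util.lock` code:

```
import fcntl

LOCKS_ENABLED = True  # flipped by the `locks` config or the -l/-L flags


class Lock:
    def __init__(self, fh):
        self._fh = fh

    def _lock(self, op):
        fcntl.lockf(self._fh, op)

    def _unlock(self):
        fcntl.lockf(self._fh, fcntl.LOCK_UN)


class SwitchableLock(Lock):
    # Same interface; _lock/_unlock become no-ops when locking is off.
    def _lock(self, op):
        if LOCKS_ENABLED:
            super()._lock(op)

    def _unlock(self):
        if LOCKS_ENABLED:
            super()._unlock()
```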
Todd Gamblin
54f97d1dec
Update copyright on LLNL files for 2018. (#7592) 2018-03-24 12:13:52 -07:00
Todd Gamblin
05fa302655
Replace github.com/llnl/spack with github.com/spack/spack (#6142)
We moved to a new GitHub org! Now make the code and docs reflect that.
2017-11-04 17:08:04 -07:00
Michael Kuhn
84ae7872d3 Update copyright notices for 2017 (#5295) 2017-09-06 17:44:16 -10:00
Todd Gamblin
b4d1654e68 Parametrized lock test and make it work with MPI
- Lock test can be run either as a node-local test or as an MPI test.

- Lock test is now parametrized by filesystem, so you can test the
  locking capabilities of your NFS, Lustre, or GPFS filesystem.  See docs
  for details.
2017-07-04 11:41:37 -07:00
Todd Gamblin
cac4362f64 Make LICENSE recognizable by GitHub. (#4598) 2017-06-24 22:22:55 -07:00
Adam J. Stewart
eaa50d3b7c Add API Docs for lib/spack/llnl (#3982)
* Add API Docs for lib/spack/llnl
* Clean up after previous builds
* Better fix for purging API docs
2017-04-25 22:24:02 -07:00
Todd Gamblin
3d8d8d3644 Fix bug with lock upgrades.
- Closing and re-opening the file to upgrade to a write lock will lose all
  existing read locks held by this process.
  - If we didn't allow ranges, sleeping until there were no reads would work.
  - With ranges, we may never be able to take some legal write locks
    without invalidating all reads. e.g., if a write lock has distinct
    range from all reads, it should just work, but we'd have to close the
    file, reopen, and re-take reads.

- It's easier to just check whether the file is writable in the first
  place and open for writing from the start.

- Lock now only opens files read-only if we *can't* write them.
2016-10-11 01:55:33 -07:00
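A sketch of the resulting open-mode decision, made once up front (helper name is illustrative):

```
import os


def open_for_locking(path):
    # Closing and reopening later to upgrade would drop every fcntl read
    # lock this process holds, so pick the widest mode we can now.
    if os.path.exists(path) and not os.access(path, os.W_OK):
        return open(path, "r")  # read-only only when we *can't* write
    return open(path, "r+" if os.path.exists(path) else "w+")
```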
Todd Gamblin
da6bbfb2d4 Add byte-range parameters to llnl.util.lock 2016-10-11 01:55:32 -07:00
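Byte ranges let independent locks coexist in one lockfile; a minimal sketch using the standard `fcntl.lockf(fd, cmd, len, start)` interface (path and offsets here are arbitrary):

```
import fcntl

with open("/tmp/example.lock", "w+") as fh:
    fcntl.lockf(fh, fcntl.LOCK_EX, 32, 64)  # exclusive on bytes [64, 96)
    # ... critical section: other ranges stay lockable by other processes
    fcntl.lockf(fh, fcntl.LOCK_UN, 32, 64)
```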
Todd Gamblin
ea10e3bab0 Remove need to touch lock files before using.
- Locks will now create enclosing directories and touch the lock file
  automatically.
2016-10-11 01:55:32 -07:00
Todd Gamblin
907fe912ef Make llnl.util.lock use file objects instead of low-level OS fds.
- Make sure we write, truncate, flush when setting PID and owning host in
  the file.
2016-10-11 01:55:32 -07:00
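A sketch of that write-truncate-flush sequence on a file object (function name is illustrative):

```
import os
import socket


def write_owner(fh):
    fh.seek(0)  # rewrite holder info in place
    fh.write("pid=%d,host=%s" % (os.getpid(), socket.gethostname()))
    fh.truncate()  # drop leftovers from a longer previous owner
    fh.flush()
    os.fsync(fh.fileno())  # make it visible to other readers
```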
alalazo
34fe51a4aa install: finer-grained locking for install command 2016-10-11 01:38:27 -07:00
Todd Gamblin
bff1656a1a Read-only locks should close fd before opening for write. (#1906)
- Fixes bad file descriptor error in lock acquire, #1904
- Fix bug introduced in previous PR #1857
- Backported fix from soon-to-be merged fine-grained DB locking branch.
2016-10-04 15:36:37 -07:00
Michael Kuhn
8d1ec0df3d Fix read locks on read-only file systems (#1857) 2016-09-30 09:45:08 -07:00
Todd Gamblin
bf1072c902 Make Spack core PEP8 compliant. 2016-08-10 16:33:37 -07:00
Todd Gamblin
9d4a36a62f Properly re-raise exceptions from lock context handler. 2016-08-09 02:25:09 -07:00
Todd Gamblin
0c75c13cc0 Flake8 fixes 2016-08-09 02:25:07 -07:00
Todd Gamblin
102ac7bcf1 Move provider cache to home directory and refactor Transactions
Major stuff:

- Created a FileCache for managing user cache files in Spack.  Currently just
  handles virtuals.

- Moved virtual cache from the repository to the home directory so that users do
  not need write access to Spack repositories to use them.

- Refactored `Transaction` class in `database.py` -- moved it to
  `LockTransaction` in `lock.py` and made it reusable by other classes.

Other additions:

- Added tests for file cache and transactions.

- Added a few more tests for database

- Fixed bug in DB where writes could happen even if exceptions were raised
  during a transaction.

- `spack uninstall` now attempts to repair the database when it discovers that a
  prefix doesn't exist but a DB record does.
2016-08-09 00:24:54 -07:00