Use gethostname() instead of getfqdn() for lock debug mode

In debug mode, processes taking an exclusive lock write their node name to
the lock file. We were using `getfqdn()` for this, but it seems to produce
inconsistent results when called from within some GitHub Actions containers.

We get this error because getfqdn() seems to return a short name in one place
and a fully qualified name in another:

```
  File "/home/runner/work/spack/spack/lib/spack/spack/test/llnl/util/lock.py", line 1211, in p1
    assert lock.host == self.host
AssertionError: assert 'fv-az290-764....cloudapp.net' == 'fv-az290-764'
  - fv-az290-764.internal.cloudapp.net
  + fv-az290-764
!!!!!!!!!!!!!!!!!!!! Interrupted: stopping after 1 failures !!!!!!!!!!!!!!!!!!!!
== 1 failed, 2547 passed, 7 skipped, 22 xfailed, 2 xpassed in 1238.67 seconds ==
```

This seems to stem from https://bugs.python.org/issue5004.

We don't really need to get a fully qualified hostname for debugging, so use
`gethostname()` because its results are more consistent. This seems to fix the
issue.
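
As a rough illustration of the difference (a sketch, not the Spack code
itself): `socket.gethostname()` returns the name the OS reports for the
machine, while `socket.getfqdn()` goes through the resolver and falls back to
the plain hostname when no qualified name is found, so its result depends on
the resolver configuration. The helper names `debug_host_name` and
`describe_host_names` are hypothetical and exist only for this example.

```python
import socket


def debug_host_name():
    """Return the node name to record alongside the pid in lock debug data.

    Hypothetical helper: it only wraps socket.gethostname(), which does not
    consult the resolver.
    """
    return socket.gethostname()


def describe_host_names():
    """Collect both names so they can be compared on a given machine."""
    return {
        "gethostname": socket.gethostname(),
        # getfqdn() may return a short or a fully qualified name depending
        # on what the resolver reports for this host.
        "getfqdn": socket.getfqdn(),
    }


if __name__ == "__main__":
    for call, value in describe_host_names().items():
        print("%s() -> %s" % (call, value))
```

Running this in two places (for example, in the parent test process and in a
child process) shows whether the two calls agree in that environment.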

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
vsoch 2021-04-14 21:23:10 -06:00 committed by Todd Gamblin
parent 393542064d
commit 613348ec90
2 changed files with 2 additions and 2 deletions

@@ -264,7 +264,7 @@ def _write_log_debug_data(self):
         self.old_host = self.host
         self.pid = os.getpid()
-        self.host = socket.getfqdn()
+        self.host = socket.gethostname()
         # write pid, host to disk to sync over FS
         self._file.seek(0)

@@ -1192,7 +1192,7 @@ def read():
 class LockDebugOutput(object):
     def __init__(self, lock_path):
         self.lock_path = lock_path
-        self.host = socket.getfqdn()
+        self.host = socket.gethostname()
     def p1(self, barrier, q1, q2):
         # exchange pids