build caches: collect files to relocate while tarballing w/o file (#48212)

A few changes to tarball creation (for build caches):
- do not run file to distinguish binary from text
- file is slow, even when running it in a batched fashion -- it usually reads all bytes and has slow logic to categorize specific types
- we don't need a highly detailed file categorization; a crude categorization of elf, mach-o, text suffices.
detecting elf and mach-o is straightforward and cheap
- detecting utf-8 (and with that ascii) is highly accurate: false positive rate decays exponentially as file size increases. Further it's not only the most common encoding, but the most common file type in package prefixes.
iso-8859-1 is cheaply (but heuristically) detected too, and sufficiently accurate after binaries and utf-8 files are classified earlier
- remove file as a dependency of Spack in general, which makes Spack itself easier to install
- detect file type and need to relocate as part of creating the tarball, which is more cache friendly and thus faster
This commit is contained in:
Harmen Stoppels
2024-12-24 18:53:13 +01:00
committed by GitHub
parent aca469b329
commit e9cdcc4af0
17 changed files with 249 additions and 411 deletions

View File

@@ -140,7 +140,7 @@ jobs:
- name: Install dependencies
run: |
dnf install -y \
bzip2 curl file gcc-c++ gcc gcc-gfortran git gnupg2 gzip \
bzip2 curl gcc-c++ gcc gcc-gfortran git gnupg2 gzip \
make patch tcl unzip which xz
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683
- name: Setup repo and non-root user