python: always use a venv (#40773)

This commit adds a layer of indirection to improve build isolation with 
and without external Python, as well as usability of environment views.

It adds `python-venv` as a dependency to all packages that `extends("python")`, 
which has the following advantages:

1. Build isolation: only `PYTHONPATH` is considered in builds, not 
   user / system packages
2. Stable install layout: fixes the problem on Debian, RHEL and Fedora where 
   external / system python produces `bin/local` subdirs in Spack install prefixes. 
3. Environment views are Python virtual environments (and if you add 
   `py-pip` things like `pip list` work)

Views work whether they're symlink, hardlink or copy type.

This commit additionally makes `spec["python"].command` return 
`spec["python-venv"].command`. The rationale is that packages in repos we do 
not own do not pass the underlying python to the build system, which could still 
result in incorrectly computed install layouts.

Other attributes like `libs`, `headers` should be on `python` anyways and need no change.
This commit is contained in:
Harmen Stoppels
2024-05-06 16:17:35 +02:00
committed by GitHub
parent a081b875b4
commit 125206d44d
55 changed files with 430 additions and 497 deletions

View File

@@ -865,7 +865,7 @@ There are several different ways to use Spack packages once you have
installed them. As you've seen, spack packages are installed into long
paths with hashes, and you need a way to get them into your path. The
easiest way is to use :ref:`spack load <cmd-spack-load>`, which is
described in the next section.
described in this section.
Some more advanced ways to use Spack packages include:
@@ -959,7 +959,86 @@ use ``spack find --loaded``.
You can also use ``spack load --list`` to get the same output, but it
does not have the full set of query options that ``spack find`` offers.
We'll learn more about Spack's spec syntax in the next section.
We'll learn more about Spack's spec syntax in :ref:`a later section <sec-specs>`.
.. _extensions:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Python packages and virtual environments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Spack can install a large number of Python packages. Their names are
typically prefixed with ``py-``. Installing and using them is no
different from any other package:
.. code-block:: console
$ spack install py-numpy
$ spack load py-numpy
$ python3
>>> import numpy
The ``spack load`` command sets the ``PATH`` variable so that the right Python
executable is used, and makes sure that ``numpy`` and its dependencies can be
located in the ``PYTHONPATH``.
Spack is different from other Python package managers in that it installs
every package into its *own* prefix. This is in contrast to ``pip``, which
installs all packages into the same prefix, be it in a virtual environment
or not.
For many users, **virtual environments** are more convenient than repeated
``spack load`` commands, particularly when working with multiple Python
packages. Fortunately Spack supports environments itself, which together
with a view are no different from Python virtual environments.
The recommended way of working with Python extensions such as ``py-numpy``
is through :ref:`Environments <environments>`. The following example creates
a Spack environment with ``numpy`` in the current working directory. It also
puts a filesystem view in ``./view``, which is a more traditional combined
prefix for all packages in the environment.
.. code-block:: console
$ spack env create --with-view view --dir .
$ spack -e . add py-numpy
$ spack -e . concretize
$ spack -e . install
Now you can activate the environment and start using the packages:
.. code-block:: console
$ spack env activate .
$ python3
>>> import numpy
The environment view is also a virtual environment, which is useful if you are
sharing the environment with others who are unfamiliar with Spack. They can
either use the Python executable directly:
.. code-block:: console
$ ./view/bin/python3
>>> import numpy
or use the activation script:
.. code-block:: console
$ source ./view/bin/activate
$ python3
>>> import numpy
In general, there should not be much difference between ``spack env activate``
and using the virtual environment. The main advantage of ``spack env activate``
is that it knows about more packages than just Python packages, and it may set
additional runtime variables that are not covered by the virtual environment
activation script.
See :ref:`environments` for a more in-depth description of Spack
environments and customizations to views.
.. _sec-specs:
@@ -1705,165 +1784,6 @@ check only local packages (as opposed to those used transparently from
``upstream`` spack instances) and the ``-j,--json`` option to output
machine-readable json data for any errors.
.. _extensions:
---------------------------
Extensions & Python support
---------------------------
Spack's installation model assumes that each package will live in its
own install prefix. However, certain packages are typically installed
*within* the directory hierarchy of other packages. For example,
`Python <https://www.python.org>`_ packages are typically installed in the
``$prefix/lib/python-2.7/site-packages`` directory.
In Spack, installation prefixes are immutable, so this type of installation
is not directly supported. However, it is possible to create views that
allow you to merge install prefixes of multiple packages into a single new prefix.
Views are a convenient way to get a more traditional filesystem structure.
Using *extensions*, you can ensure that Python packages always share the
same prefix in the view as Python itself. Suppose you have
Python installed like so:
.. code-block:: console
$ spack find python
==> 1 installed packages.
-- linux-debian7-x86_64 / gcc@4.4.7 --------------------------------
python@2.7.8
.. _cmd-spack-extensions:
^^^^^^^^^^^^^^^^^^^^
``spack extensions``
^^^^^^^^^^^^^^^^^^^^
You can find extensions for your Python installation like this:
.. code-block:: console
$ spack extensions python
==> python@2.7.8%gcc@4.4.7 arch=linux-debian7-x86_64-703c7a96
==> 36 extensions:
geos py-ipython py-pexpect py-pyside py-sip
py-basemap py-libxml2 py-pil py-pytz py-six
py-biopython py-mako py-pmw py-rpy2 py-sympy
py-cython py-matplotlib py-pychecker py-scientificpython py-virtualenv
py-dateutil py-mpi4py py-pygments py-scikit-learn
py-epydoc py-mx py-pylint py-scipy
py-gnuplot py-nose py-pyparsing py-setuptools
py-h5py py-numpy py-pyqt py-shiboken
==> 12 installed:
-- linux-debian7-x86_64 / gcc@4.4.7 --------------------------------
py-dateutil@2.4.0 py-nose@1.3.4 py-pyside@1.2.2
py-dateutil@2.4.0 py-numpy@1.9.1 py-pytz@2014.10
py-ipython@2.3.1 py-pygments@2.0.1 py-setuptools@11.3.1
py-matplotlib@1.4.2 py-pyparsing@2.0.3 py-six@1.9.0
The extensions are a subset of what's returned by ``spack list``, and
they are packages like any other. They are installed into their own
prefixes, and you can see this with ``spack find --paths``:
.. code-block:: console
$ spack find --paths py-numpy
==> 1 installed packages.
-- linux-debian7-x86_64 / gcc@4.4.7 --------------------------------
py-numpy@1.9.1 ~/spack/opt/linux-debian7-x86_64/gcc@4.4.7/py-numpy@1.9.1-66733244
However, even though this package is installed, you cannot use it
directly when you run ``python``:
.. code-block:: console
$ spack load python
$ python
Python 2.7.8 (default, Feb 17 2015, 01:35:25)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named numpy
>>>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Using Extensions in Environments
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The recommended way of working with extensions such as ``py-numpy``
above is through :ref:`Environments <environments>`. For example,
the following creates an environment in the current working directory
with a filesystem view in the ``./view`` directory:
.. code-block:: console
$ spack env create --with-view view --dir .
$ spack -e . add py-numpy
$ spack -e . concretize
$ spack -e . install
We recommend environments for two reasons. Firstly, environments
can be activated (requires :ref:`shell-support`):
.. code-block:: console
$ spack env activate .
which sets all the right environment variables such as ``PATH`` and
``PYTHONPATH``. This ensures that
.. code-block:: console
$ python
>>> import numpy
works. Secondly, even without shell support, the view ensures
that Python can locate its extensions:
.. code-block:: console
$ ./view/bin/python
>>> import numpy
See :ref:`environments` for a more in-depth description of Spack
environments and customizations to views.
^^^^^^^^^^^^^^^^^^^^
Using ``spack load``
^^^^^^^^^^^^^^^^^^^^
A more traditional way of using Spack and extensions is ``spack load``
(requires :ref:`shell-support`). This will add the extension to ``PYTHONPATH``
in your current shell, and Python itself will be available in the ``PATH``:
.. code-block:: console
$ spack load py-numpy
$ python
>>> import numpy
The loaded packages can be checked using ``spack find --loaded``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Loading Extensions via Modules
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Apart from ``spack env activate`` and ``spack load``, you can load numpy
through your environment modules (using ``environment-modules`` or
``lmod``). This will also add the extension to the ``PYTHONPATH`` in
your current shell.
.. code-block:: console
$ module load <name of numpy module>
If you do not know the name of the specific numpy module you wish to
load, you can use the ``spack module tcl|lmod loads`` command to get
the name of the module from the Spack spec.
-----------------------
Filesystem requirements
-----------------------

View File

@@ -718,23 +718,45 @@ command-line tool, or C/C++/Fortran program with optional Python
modules? The former should be prepended with ``py-``, while the
latter should not.
""""""""""""""""""""""
extends vs. depends_on
""""""""""""""""""""""
""""""""""""""""""""""""""""""
``extends`` vs. ``depends_on``
""""""""""""""""""""""""""""""
This is very similar to the naming dilemma above, with a slight twist.
As mentioned in the :ref:`Packaging Guide <packaging_extensions>`,
``extends`` and ``depends_on`` are very similar, but ``extends`` ensures
that the extension and extendee share the same prefix in views.
This allows the user to import a Python module without
having to add that module to ``PYTHONPATH``.
When deciding between ``extends`` and ``depends_on``, the best rule of
thumb is to check the installation prefix. If Python libraries are
installed to ``<prefix>/lib/pythonX.Y/site-packages``, then you
should use ``extends``. If Python libraries are installed elsewhere
or the only files that get installed reside in ``<prefix>/bin``, then
don't use ``extends``.
Additionally, ``extends("python")`` adds a dependency on the package
``python-venv``. This improves isolation from the system, whether
it's during the build or at runtime: user and system site packages
cannot accidentally be used by any package that ``extends("python")``.
As a rule of thumb: if a package does not install any Python modules
of its own, and merely puts a Python script in the ``bin`` directory,
then there is no need for ``extends``. If the package installs modules
in the ``site-packages`` directory, it requires ``extends``.
"""""""""""""""""""""""""""""""""""""
Executing ``python`` during the build
"""""""""""""""""""""""""""""""""""""
Whenever you need to execute a Python command or pass the path of the
Python interpreter to the build system, it is best to use the global
variable ``python`` directly. For example:
.. code-block:: python
@run_before("install")
def recythonize(self):
python("setup.py", "clean") # use the `python` global
As mentioned in the previous section, ``extends("python")`` adds an
automatic dependency on ``python-venv``, which is a virtual environment
that guarantees build isolation. The ``python`` global always refers to
the correct Python interpreter, whether the package uses ``extends("python")``
or ``depends_on("python")``.
^^^^^^^^^^^^^^^^^^^^^
Alternatives to Spack