Refactor Spack's URL parsing commands (#2938)

* Replace `spack urls` and `spack url-parse` with `spack url`
* Allow spack url list to only list incorrect parsings
* Add spack url test reporting
* Add unit tests for new URL commands
This commit is contained in:
Adam J. Stewart
2017-01-31 10:14:52 -06:00
committed by Todd Gamblin
parent 2e81fe4fb3
commit 123f057089
7 changed files with 796 additions and 311 deletions

View File

@@ -300,6 +300,42 @@ Stage objects
Writing commands
----------------
Adding a new command to Spack is easy. Simply add a ``<name>.py`` file to
``lib/spack/spack/cmd/``, where ``<name>`` is the name of the subcommand.
At the bare minimum, two functions are required in this file:
^^^^^^^^^^^^^^^^^^
``setup_parser()``
^^^^^^^^^^^^^^^^^^
Unless your command doesn't accept any arguments, a ``setup_parser()``
function is required to define what arguments and flags your command takes.
See the `Argparse documentation <https://docs.python.org/2.7/library/argparse.html>`_
for more details on how to add arguments.
Some commands have a set of subcommands, like ``spack compiler find`` or
``spack module refresh``. You can add subparsers to your parser to handle
this. Check out ``spack edit --command compiler`` for an example of this.
A lot of commands take the same arguments and flags. These arguments should
be defined in ``lib/spack/spack/cmd/common/arguments.py`` so that they don't
need to be redefined in multiple commands.
^^^^^^^^^^^^
``<name>()``
^^^^^^^^^^^^
In order to run your command, Spack searches for a function with the same
name as your command in ``<name>.py``. This is the main method for your
command, and can call other helper methods to handle common tasks.
Remember, before adding a new command, think to yourself whether or not this
new command is actually necessary. Sometimes, the functionality you desire
can be added to an existing command. Also remember to add unit tests for
your command. If it isn't used very frequently, changes to the rest of
Spack can cause your command to break without sufficient unit tests to
prevent this from happening.
----------
Unit tests
----------
@@ -312,14 +348,80 @@ Unit testing
Developer commands
------------------
.. _cmd-spack-doc:
^^^^^^^^^^^^^
``spack doc``
^^^^^^^^^^^^^
.. _cmd-spack-test:
^^^^^^^^^^^^^^
``spack test``
^^^^^^^^^^^^^^
.. _cmd-spack-url:
^^^^^^^^^^^^^
``spack url``
^^^^^^^^^^^^^
A package containing a single URL can be used to download several different
versions of the package. If you've ever wondered how this works, all of the
magic is in :mod:`spack.url`. This module contains methods for extracting
the name and version of a package from its URL. The name is used by
``spack create`` to guess the name of the package. By determining the version
from the URL, Spack can replace it with other versions to determine where to
download them from.
The regular expressions in ``parse_name_offset`` and ``parse_version_offset``
are used to extract the name and version, but they aren't perfect. In order
to debug Spack's URL parsing support, the ``spack url`` command can be used.
"""""""""""""""""""
``spack url parse``
"""""""""""""""""""
If you need to debug a single URL, you can use the following command:
.. command-output:: spack url parse http://cache.ruby-lang.org/pub/ruby/2.2/ruby-2.2.0.tar.gz
You'll notice that the name and version of this URL are correctly detected,
and you can even see which regular expressions it was matched to. However,
you'll notice that when it substitutes the version number in, it doesn't
replace the ``2.2`` with ``9.9`` where we would expect ``9.9.9b`` to live.
This particular package may require a ``list_url`` or ``url_for_version``
function.
This command also accepts a ``--spider`` flag. If provided, Spack searches
for other versions of the package and prints the matching URLs.
""""""""""""""""""
``spack url list``
""""""""""""""""""
This command lists every URL in every package in Spack. If given the
``--color`` and ``--extrapolation`` flags, it also colors the part of
the string that it detected to be the name and version. The
``--incorrect-name`` and ``--incorrect-version`` flags can be used to
print URLs that were not being parsed correctly.
""""""""""""""""""
``spack url test``
""""""""""""""""""
This command attempts to parse every URL for every package in Spack
and prints a summary of how many of them are being correctly parsed.
It also prints a histogram showing which regular expressions are being
matched and how frequently:
.. command-output:: spack url test
This command is essential for anyone adding or changing the regular
expressions that parse names and versions. By running this command
before and after the change, you can make sure that your regular
expression fixes more packages than it breaks.
---------
Profiling
---------

View File

@@ -712,8 +712,8 @@ is at ``http://example.com/downloads/foo-1.0.tar.gz``, Spack will look
in ``http://example.com/downloads/`` for links to additional versions.
If you need to search another path for download links, you can supply
some extra attributes that control how your package finds new
versions. See the documentation on `attribute_list_url`_ and
`attribute_list_depth`_.
versions. See the documentation on :ref:`attribute_list_url` and
:ref:`attribute_list_depth`.
.. note::
@@ -728,6 +728,102 @@ versions. See the documentation on `attribute_list_url`_ and
syntax errors, or the ``import`` will fail. Use this once you've
got your package in working order.
--------------------
Finding new versions
--------------------
You've already seen the ``homepage`` and ``url`` package attributes:
.. code-block:: python
:linenos:
from spack import *
class Mpich(Package):
"""MPICH is a high performance and widely portable implementation of
the Message Passing Interface (MPI) standard."""
homepage = "http://www.mpich.org"
url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
These are class-level attributes used by Spack to show users
information about the package, and to determine where to download its
source code.
Spack uses the tarball URL to extrapolate where to find other tarballs
of the same package (e.g. in :ref:`cmd-spack-checksum`, but
this does not always work. This section covers ways you can tell
Spack to find tarballs elsewhere.
.. _attribute_list_url:
^^^^^^^^^^^^
``list_url``
^^^^^^^^^^^^
When spack tries to find available versions of packages (e.g. with
:ref:`cmd-spack-checksum`), it spiders the parent directory
of the tarball in the ``url`` attribute. For example, for libelf, the
url is:
.. code-block:: python
url = "http://www.mr511.de/software/libelf-0.8.13.tar.gz"
Here, Spack spiders ``http://www.mr511.de/software/`` to find similar
tarball links and ultimately to make a list of available versions of
``libelf``.
For many packages, the tarball's parent directory may be unlistable,
or it may not contain any links to source code archives. In fact,
many times additional package downloads aren't even available in the
same directory as the download URL.
For these, you can specify a separate ``list_url`` indicating the page
to search for tarballs. For example, ``libdwarf`` has the homepage as
the ``list_url``, because that is where links to old versions are:
.. code-block:: python
:linenos:
class Libdwarf(Package):
homepage = "http://www.prevanders.net/dwarf.html"
url = "http://www.prevanders.net/libdwarf-20130729.tar.gz"
list_url = homepage
.. _attribute_list_depth:
^^^^^^^^^^^^^^
``list_depth``
^^^^^^^^^^^^^^
``libdwarf`` and many other packages have a listing of available
versions on a single webpage, but not all do. For example, ``mpich``
has a tarball URL that looks like this:
.. code-block:: python
url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
But its downloads are in many different subdirectories of
``http://www.mpich.org/static/downloads/``. So, we need to add a
``list_url`` *and* a ``list_depth`` attribute:
.. code-block:: python
:linenos:
class Mpich(Package):
homepage = "http://www.mpich.org"
url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
list_url = "http://www.mpich.org/static/downloads/"
list_depth = 2
By default, Spack only looks at the top-level page available at
``list_url``. ``list_depth`` tells it to follow up to 2 levels of
links from the top-level page. Note that here, this implies two
levels of subdirectories, as the ``mpich`` website is structured much
like a filesystem. But ``list_depth`` really refers to link depth
when spidering the page.
.. _vcs-fetch:
@@ -1241,103 +1337,6 @@ RPATHs in Spack are handled in one of three ways:
links. You can see this how this is used in the :ref:`PySide
example <pyside-patch>` above.
--------------------
Finding new versions
--------------------
You've already seen the ``homepage`` and ``url`` package attributes:
.. code-block:: python
:linenos:
from spack import *
class Mpich(Package):
"""MPICH is a high performance and widely portable implementation of
the Message Passing Interface (MPI) standard."""
homepage = "http://www.mpich.org"
url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
These are class-level attributes used by Spack to show users
information about the package, and to determine where to download its
source code.
Spack uses the tarball URL to extrapolate where to find other tarballs
of the same package (e.g. in :ref:`cmd-spack-checksum`, but
this does not always work. This section covers ways you can tell
Spack to find tarballs elsewhere.
.. _attribute_list_url:
^^^^^^^^^^^^
``list_url``
^^^^^^^^^^^^
When spack tries to find available versions of packages (e.g. with
:ref:`cmd-spack-checksum`), it spiders the parent directory
of the tarball in the ``url`` attribute. For example, for libelf, the
url is:
.. code-block:: python
url = "http://www.mr511.de/software/libelf-0.8.13.tar.gz"
Here, Spack spiders ``http://www.mr511.de/software/`` to find similar
tarball links and ultimately to make a list of available versions of
``libelf``.
For many packages, the tarball's parent directory may be unlistable,
or it may not contain any links to source code archives. In fact,
many times additional package downloads aren't even available in the
same directory as the download URL.
For these, you can specify a separate ``list_url`` indicating the page
to search for tarballs. For example, ``libdwarf`` has the homepage as
the ``list_url``, because that is where links to old versions are:
.. code-block:: python
:linenos:
class Libdwarf(Package):
homepage = "http://www.prevanders.net/dwarf.html"
url = "http://www.prevanders.net/libdwarf-20130729.tar.gz"
list_url = homepage
.. _attribute_list_depth:
^^^^^^^^^^^^^^
``list_depth``
^^^^^^^^^^^^^^
``libdwarf`` and many other packages have a listing of available
versions on a single webpage, but not all do. For example, ``mpich``
has a tarball URL that looks like this:
.. code-block:: python
url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
But its downloads are in many different subdirectories of
``http://www.mpich.org/static/downloads/``. So, we need to add a
``list_url`` *and* a ``list_depth`` attribute:
.. code-block:: python
:linenos:
class Mpich(Package):
homepage = "http://www.mpich.org"
url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
list_url = "http://www.mpich.org/static/downloads/"
list_depth = 2
By default, Spack only looks at the top-level page available at
``list_url``. ``list_depth`` tells it to follow up to 2 levels of
links from the top-level page. Note that here, this implies two
levels of subdirectories, as the ``mpich`` website is structured much
like a filesystem. But ``list_depth`` really refers to link depth
when spidering the page.
.. _attribute_parallel:
---------------