removing feature bloat: monitor and analyzers (#31130)

Signed-off-by: vsoch <vsoch@users.noreply.github.com>

Co-authored-by: vsoch <vsoch@users.noreply.github.com>
Vanessasaurus
2022-07-07 00:49:40 -06:00
committed by GitHub
parent 3338d536f6
commit 6b1e86aecc
21 changed files with 5 additions and 2192 deletions


@@ -107,7 +107,6 @@ with a high level view of Spack's directory structure:
llnl/ <- some general-use libraries
spack/ <- spack module; contains Python code
analyzers/ <- modules to run analysis on installed packages
build_systems/ <- modules for different build systems
cmd/ <- each file in here is a spack subcommand
compilers/ <- compiler description files
@@ -242,22 +241,6 @@ Unit tests
Implements Spack's test suite. Add a module and put its name in
the test suite in ``__init__.py`` to add more unit tests.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Research and Monitoring Modules
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
:mod:`spack.monitor`
Contains :class:`~spack.monitor.SpackMonitorClient`. This is accessed from
the ``spack install`` and ``spack analyze`` commands to send build and
package metadata up to a `Spack Monitor
<https://github.com/spack/spack-monitor>`_ server.
:mod:`spack.analyzers`
A module folder with a :class:`~spack.analyzers.analyzer_base.AnalyzerBase`
that provides base functions to run, save, and (optionally) upload analysis
results to a `Spack Monitor <https://github.com/spack/spack-monitor>`_ server.
^^^^^^^^^^^^^
Other Modules
@@ -301,240 +284,6 @@ Most spack commands look something like this:
The information in Package files is used at all stages in this
process.
Conceptually, packages are overloaded. They contain:
-------------
Stage objects
-------------
.. _writing-analyzers:
-----------------
Writing analyzers
-----------------
To write an analyzer, add a new Python file to the
analyzers module directory at ``lib/spack/spack/analyzers``.
Your analyzer should be a subclass of :class:`AnalyzerBase <spack.analyzers.analyzer_base.AnalyzerBase>`. For example, to
add an analyzer class ``Myanalyzer``, you would create
``spack/analyzers/myanalyzer.py`` and import and
use the base as follows:
.. code-block:: python

    from .analyzer_base import AnalyzerBase


    class Myanalyzer(AnalyzerBase):
Note that the class name is your module file name with the first letter
capitalized and the rest lowercase. You can look at other analyzers in
the analyzers directory for examples. The guide here will walk you through the basic functions needed.
^^^^^^^^^^^^^^^^^^^^^^^^^
Analyzer Output Directory
^^^^^^^^^^^^^^^^^^^^^^^^^
By default, when you run ``spack analyze run``, an analyzer output directory will
be created in your Spack user directory in your ``$HOME``. We write output here
because the install directory might not always be writable:
.. code-block:: console

    ~/.spack/
        analyzers
Result files will be written here, organized in subfolders in the same structure
as the package, with each analyzer owning its own subfolder. For example:
.. code-block:: console

    $ tree ~/.spack/analyzers/
    /home/spackuser/.spack/analyzers/
    └── linux-ubuntu20.04-skylake
        └── gcc-9.3.0
            └── zlib-1.2.11-sl7m27mzkbejtkrajigj3a3m37ygv4u2
                ├── environment_variables
                │   └── spack-analyzer-environment-variables.json
                ├── install_files
                │   └── spack-analyzer-install-files.json
                └── libabigail
                    └── lib
                        └── spack-analyzer-libabigail-libz.so.1.2.11.xml
Notice that for the libabigail analyzer, since results are generated per object,
we honor the object's folder in case there are equivalently named files in
different folders. The result files are typically written as JSON so they can be
easily read and uploaded in a future interaction with a monitor.
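To illustrate how this layout maps onto a spec, here is a rough sketch of how
such a path could be assembled. This is not the actual helper used by the base
class; the format strings are assumptions based on Spack's spec format language:

.. code-block:: python

    import os


    def analyzer_output_dir(spec, analyzer_name):
        """Sketch: ~/.spack/analyzers/<arch>/<compiler>/<pkg>/<analyzer>."""
        root = os.path.expanduser(os.path.join("~", ".spack", "analyzers"))
        return os.path.join(
            root,
            str(spec.architecture),                             # linux-ubuntu20.04-skylake
            spec.format("{compiler.name}-{compiler.version}"),  # gcc-9.3.0
            spec.format("{name}-{version}-{hash}"),             # zlib-1.2.11-sl7m27...
            analyzer_name,                                      # e.g., install_files
        )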
^^^^^^^^^^^^^^^^^
Analyzer Metadata
^^^^^^^^^^^^^^^^^
Your analyzer is required to have the class attributes ``name``, ``outfile``,
and ``description``. These are printed to the user when they use the subcommand
``spack analyze list-analyzers``. Here is an example; as we mentioned above, this
analyzer would live in a module named ``libabigail.py`` in the analyzers folder
so that the class can be discovered.
.. code-block:: python

    class Libabigail(AnalyzerBase):

        name = "libabigail"
        outfile = "spack-analyzer-libabigail.json"
        description = "Application Binary Interface (ABI) features for objects"
The ``name`` and ``outfile`` should be unique to your analyzer.
Note that ``all`` cannot be the name of an analyzer, as this key is used to indicate
that the user wants to run all analyzers.
.. _analyzer_run_function:
^^^^^^^^^^^^^^^^^^^^^^^^
An analyzer run Function
^^^^^^^^^^^^^^^^^^^^^^^^
The core of an analyzer is its ``run()`` function, which should accept no
arguments. You can assume your analyzer has the package spec of interest at ``self.spec``,
and it's up to the run function to generate whatever analysis data you need
and then return an object keyed by the analyzer name. The result data
should be a list of objects, each with a ``name``, an ``analyzer_name``, an ``install_file``,
and one of ``value`` or ``binary_value``. The install file should be a relative
path, not an absolute path. For example, let's say we extract a metric called
``metric`` for ``bin/wget`` using our analyzer ``thebest-analyzer``.
We might have data that looks like this:
.. code-block:: python

    result = {"name": "metric", "analyzer_name": "thebest-analyzer", "value": "1", "install_file": "bin/wget"}
We'd then return it as follows. Note that the key is the analyzer name at ``self.name``:
.. code-block:: python

    return {self.name: result}
This will save the complete result to the analyzer metadata folder, as described
previously. If you want support for adding a different kind of metadata (e.g.,
metadata not associated with an install file), the monitor server would need to be
updated to support this first.
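Putting the pieces together, here is a minimal sketch of a complete ``run()``
function. The ``file_size`` metric and the ``Myanalyzer`` class are hypothetical,
used only to illustrate the return structure described above:

.. code-block:: python

    import os

    from .analyzer_base import AnalyzerBase


    class Myanalyzer(AnalyzerBase):

        name = "myanalyzer"
        outfile = "spack-analyzer-myanalyzer.json"
        description = "example analyzer that records file sizes"

        def run(self):
            """Collect a hypothetical file_size metric for installed binaries."""
            results = []
            bindir = os.path.join(self.spec.prefix, "bin")
            if os.path.isdir(bindir):
                for filename in os.listdir(bindir):
                    results.append({
                        "name": "file_size",
                        "analyzer_name": self.name,
                        # install_file is relative to the install prefix
                        "install_file": os.path.join("bin", filename),
                        "value": str(os.path.getsize(os.path.join(bindir, filename))),
                    })
            return {self.name: results}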
^^^^^^^^^^^^^^^^^^^^^^^^^
An analyzer init Function
^^^^^^^^^^^^^^^^^^^^^^^^^
If you don't need any extra dependencies or checks, you can skip defining an analyzer
init function, as the base class will handle it. Typically, it will accept
a spec and an optional output directory (if the user does not want the default
metadata folder for analyzer results). The analyzer init function should call
its parent init, and then do any extra checks or validation that are required for it to
work. For example:
.. code-block:: python

    def __init__(self, spec, dirname=None):
        super(Myanalyzer, self).__init__(spec, dirname)

        # install extra dependencies, do extra preparation and checks here
At the end of the init, you will have available to you:

- **self.spec**: the spec object
- **self.dirname**: an optional directory name the user has provided at init to save results to
- **self.output_dir**: the analyzer metadata directory, where we save by default
- **self.meta_dir**: the path to the package metadata directory (``.spack``) if you need it

You can then proceed to write your analyzer.
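For instance, a hedged sketch of an init that performs an extra check might look
like the following; the requirement that ``abidw`` be on the ``PATH`` is just an
example of the kind of validation you might add:

.. code-block:: python

    import shutil

    from .analyzer_base import AnalyzerBase


    class Myanalyzer(AnalyzerBase):

        def __init__(self, spec, dirname=None):
            super(Myanalyzer, self).__init__(spec, dirname)

            # Example extra check: fail early if a required tool is missing
            if not shutil.which("abidw"):
                raise RuntimeError("myanalyzer requires abidw on the PATH")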
^^^^^^^^^^^^^^^^^^^^^^^
Saving Analyzer Results
^^^^^^^^^^^^^^^^^^^^^^^
The analyzer will have ``save_result`` called with the result object generated by
``run()`` to save it to the filesystem, and, if the user has added the ``--monitor`` flag,
to upload it to a monitor server. If your result follows an accepted result
format and you don't need to parse it further, you don't need to add this
function to your class. However, if your result data is large or otherwise
needs additional parsing, you can define it. If you define the function, it
is useful to know about the ``output_dir`` property, which you can join
with your output file relative path of choice:
.. code-block:: python

    outfile = os.path.join(self.output_dir, "my-output-file.txt")
The directory will be provided by the ``output_dir`` property, but it won't exist yet,
so you should create it:

.. code-block:: python

    # Create the output directory
    if not os.path.exists(self._output_dir):
        os.makedirs(self._output_dir)
If you are generating results that match specific files in the package
install directory, you should try to maintain those paths, in case there
are equivalently named files in different directories that would
overwrite one another. As an example of an analyzer with a custom save,
the Libabigail analyzer saves ``*.xml`` files to the analyzer metadata
folder in ``run()``, as they are either binaries, or, as XML (text), would
usually be too big to pass in one request. For this reason, the files
are saved during ``run()`` and the filenames added to the result object;
when the result object is passed back into ``save_result()``,
we skip saving to the filesystem and instead read each file and send
it (separately) to the monitor:
.. code-block:: python

    def save_result(self, result, monitor=None, overwrite=False):
        """ABI results are saved to individual files, so each one needs to be
        read and uploaded. Result here should be the lookup generated in run(),
        the key is the analyzer name, and each value is the result file.
        We currently upload the entire xml as text because libabigail can't
        easily read gzipped xml, but this will be updated when it can.
        """
        if not monitor:
            return

        name = self.spec.package.name

        for obj, filename in result.get(self.name, {}).items():

            # Don't include the prefix
            rel_path = obj.replace(self.spec.prefix + os.path.sep, "")

            # We've already saved the results to file during run
            content = spack.monitor.read_file(filename)

            # A result needs an analyzer, value or binary_value, and name
            data = {"value": content, "install_file": rel_path, "name": "abidw-xml"}
            tty.info("Sending result for %s %s to monitor." % (name, rel_path))
            monitor.send_analyze_metadata(self.spec.package, {"libabigail": [data]})
Notice that this function, if you define it, requires a result object (generated by
``run()``), a monitor (if you want to send results), and a boolean ``overwrite``, used
to check whether a result exists first so as not to write over it when ``overwrite``
is False. Also notice that since we already saved these files to the analyzer metadata
folder during ``run()``, we return early if a monitor isn't defined, because this
function only serves to send results to the monitor. If you haven't saved anything to
the analyzer metadata folder yet, you might want to do that here. You should also use
``tty.info`` to give the user a message of "Writing result to $DIRNAME."
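For contrast, here is a rough sketch of a simple custom ``save_result`` method
that writes the result as JSON to the analyzer metadata folder, respecting the
``overwrite`` flag; the monitor upload path is omitted, and everything beyond
``output_dir``, ``outfile``, and ``tty.info`` is an assumption for illustration:

.. code-block:: python

    import json
    import os

    import llnl.util.tty as tty


    def save_result(self, result, monitor=None, overwrite=False):
        """Sketch: save the result object as JSON, unless it already exists."""
        if not os.path.exists(self.output_dir):
            os.makedirs(self.output_dir)

        outfile = os.path.join(self.output_dir, self.outfile)
        if os.path.exists(outfile) and not overwrite:
            return

        tty.info("Writing result to %s." % self.output_dir)
        with open(outfile, "w") as fd:
            json.dump(result, fd)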
.. _writing-commands:
@@ -699,23 +448,6 @@ with a hook, and this is the purpose of this particular hook. Akin to
``on_phase_success`` we require the same variables - the package that failed,
the name of the phase, and the log file where we might find errors.
"""""""""""""""""""""""""""""""""
``on_analyzer_save(pkg, result)``
"""""""""""""""""""""""""""""""""
After an analyzer has saved some result for a package, this hook is called,
and it provides the package that we just ran the analysis for, along with
the loaded result. Typically, a result is structured to have the name
of the analyzer as a key, and the result object that is defined in detail in
:ref:`analyzer_run_function`.
.. code-block:: python

    def on_analyzer_save(pkg, result):
        """given a package and a result...
        """
        print('Do something extra with a package analysis result here')
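As a concrete (hypothetical) illustration, a hook implementation could simply
report which analyzers produced results for the package:

.. code-block:: python

    def on_analyzer_save(pkg, result):
        """Sketch: report each analyzer that saved a result for this package."""
        for analyzer_name in result:
            print("Package %s has results from %s" % (pkg.name, analyzer_name))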
^^^^^^^^^^^^^^^^^^^^^^
Adding a New Hook Type