removing feature bloat: monitor and analyzers (#31130)
Signed-off-by: vsoch <vsoch@users.noreply.github.com> Co-authored-by: vsoch <vsoch@users.noreply.github.com>
This commit is contained in:
@@ -107,7 +107,6 @@ with a high level view of Spack's directory structure:
|
||||
llnl/ <- some general-use libraries
|
||||
|
||||
spack/ <- spack module; contains Python code
|
||||
analyzers/ <- modules to run analysis on installed packages
|
||||
build_systems/ <- modules for different build systems
|
||||
cmd/ <- each file in here is a spack subcommand
|
||||
compilers/ <- compiler description files
|
||||
@@ -242,22 +241,6 @@ Unit tests
|
||||
Implements Spack's test suite. Add a module and put its name in
|
||||
the test suite in ``__init__.py`` to add more unit tests.
|
||||
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
Research and Monitoring Modules
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
:mod:`spack.monitor`
|
||||
Contains :class:`~spack.monitor.SpackMonitorClient`. This is accessed from
|
||||
the ``spack install`` and ``spack analyze`` commands to send build and
|
||||
package metadata up to a `Spack Monitor
|
||||
<https://github.com/spack/spack-monitor>`_ server.
|
||||
|
||||
|
||||
:mod:`spack.analyzers`
|
||||
A module folder with a :class:`~spack.analyzers.analyzer_base.AnalyzerBase`
|
||||
that provides base functions to run, save, and (optionally) upload analysis
|
||||
results to a `Spack Monitor <https://github.com/spack/spack-monitor>`_ server.
|
||||
|
||||
|
||||
^^^^^^^^^^^^^
|
||||
Other Modules
|
||||
@@ -301,240 +284,6 @@ Most spack commands look something like this:
|
||||
The information in Package files is used at all stages in this
|
||||
process.
|
||||
|
||||
Conceptually, packages are overloaded. They contain:
|
||||
|
||||
-------------
|
||||
Stage objects
|
||||
-------------
|
||||
|
||||
|
||||
.. _writing-analyzers:
|
||||
|
||||
-----------------
|
||||
Writing analyzers
|
||||
-----------------
|
||||
|
||||
To write an analyzer, you should add a new python file to the
|
||||
analyzers module directory at ``lib/spack/spack/analyzers`` .
|
||||
Your analyzer should be a subclass of the :class:`AnalyzerBase <spack.analyzers.analyzer_base.AnalyzerBase>`. For example, if you want
|
||||
to add an analyzer class ``Myanalyzer`` you would write to
|
||||
``spack/analyzers/myanalyzer.py`` and import and
|
||||
use the base as follows:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from .analyzer_base import AnalyzerBase
|
||||
|
||||
class Myanalyzer(AnalyzerBase):
|
||||
|
||||
|
||||
Note that the class name is your module file name, all lowercase
|
||||
except for the first capital letter. You can look at other analyzers in
|
||||
that analyzer directory for examples. The guide here will tell you about the basic functions needed.
|
||||
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
Analyzer Output Directory
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
By default, when you run ``spack analyze run`` an analyzer output directory will
|
||||
be created in your spack user directory in your ``$HOME``. The reason we output here
|
||||
is because the install directory might not always be writable.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
~/.spack/
|
||||
analyzers
|
||||
|
||||
Result files will be written here, organized in subfolders in the same structure
|
||||
as the package, with each analyzer owning it's own subfolder. for example:
|
||||
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ tree ~/.spack/analyzers/
|
||||
/home/spackuser/.spack/analyzers/
|
||||
└── linux-ubuntu20.04-skylake
|
||||
└── gcc-9.3.0
|
||||
└── zlib-1.2.11-sl7m27mzkbejtkrajigj3a3m37ygv4u2
|
||||
├── environment_variables
|
||||
│ └── spack-analyzer-environment-variables.json
|
||||
├── install_files
|
||||
│ └── spack-analyzer-install-files.json
|
||||
└── libabigail
|
||||
└── lib
|
||||
└── spack-analyzer-libabigail-libz.so.1.2.11.xml
|
||||
|
||||
|
||||
Notice that for the libabigail analyzer, since results are generated per object,
|
||||
we honor the object's folder in case there are equivalently named files in
|
||||
different folders. The result files are typically written as json so they can be easily read and uploaded in a future interaction with a monitor.
|
||||
|
||||
|
||||
^^^^^^^^^^^^^^^^^
|
||||
Analyzer Metadata
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
Your analyzer is required to have the class attributes ``name``, ``outfile``,
|
||||
and ``description``. These are printed to the user with they use the subcommand
|
||||
``spack analyze list-analyzers``. Here is an example.
|
||||
As we mentioned above, note that this analyzer would live in a module named
|
||||
``libabigail.py`` in the analyzers folder so that the class can be discovered.
|
||||
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
class Libabigail(AnalyzerBase):
|
||||
|
||||
name = "libabigail"
|
||||
outfile = "spack-analyzer-libabigail.json"
|
||||
description = "Application Binary Interface (ABI) features for objects"
|
||||
|
||||
|
||||
This means that the name and output file should be unique for your analyzer.
|
||||
Note that "all" cannot be the name of an analyzer, as this key is used to indicate
|
||||
that the user wants to run all analyzers.
|
||||
|
||||
.. _analyzer_run_function:
|
||||
|
||||
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
An analyzer run Function
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The core of an analyzer is its ``run()`` function, which should accept no
|
||||
arguments. You can assume your analyzer has the package spec of interest at ``self.spec``
|
||||
and it's up to the run function to generate whatever analysis data you need,
|
||||
and then return the object with a key as the analyzer name. The result data
|
||||
should be a list of objects, each with a name, ``analyzer_name``, ``install_file``,
|
||||
and one of ``value`` or ``binary_value``. The install file should be for a relative
|
||||
path, and not the absolute path. For example, let's say we extract a metric called
|
||||
``metric`` for ``bin/wget`` using our analyzer ``thebest-analyzer``.
|
||||
We might have data that looks like this:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
result = {"name": "metric", "analyzer_name": "thebest-analyzer", "value": "1", "install_file": "bin/wget"}
|
||||
|
||||
|
||||
We'd then return it as follows - note that they key is the analyzer name at ``self.name``.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
return {self.name: result}
|
||||
|
||||
This will save the complete result to the analyzer metadata folder, as described
|
||||
previously. If you want support for adding a different kind of metadata (e.g.,
|
||||
not associated with an install file) then the monitor server would need to be updated
|
||||
to support this first.
|
||||
|
||||
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
An analyzer init Function
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If you don't need any extra dependencies or checks, you can skip defining an analyzer
|
||||
init function, as the base class will handle it. Typically, it will accept
|
||||
a spec, and an optional output directory (if the user does not want the default
|
||||
metadata folder for analyzer results). The analyzer init function should call
|
||||
it's parent init, and then do any extra checks or validation that are required to
|
||||
work. For example:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def __init__(self, spec, dirname=None):
|
||||
super(Myanalyzer, self).__init__(spec, dirname)
|
||||
|
||||
# install extra dependencies, do extra preparation and checks here
|
||||
|
||||
|
||||
At the end of the init, you will have available to you:
|
||||
|
||||
- **self.spec**: the spec object
|
||||
- **self.dirname**: an optional directory name the user as provided at init to save
|
||||
- **self.output_dir**: the analyzer metadata directory, where we save by default
|
||||
- **self.meta_dir**: the path to the package metadata directory (.spack) if you need it
|
||||
|
||||
And can proceed to write your analyzer.
|
||||
|
||||
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
Saving Analyzer Results
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The analyzer will have ``save_result`` called, with the result object generated
|
||||
to save it to the filesystem, and if the user has added the ``--monitor`` flag
|
||||
to upload it to a monitor server. If your result follows an accepted result
|
||||
format and you don't need to parse it further, you don't need to add this
|
||||
function to your class. However, if your result data is large or otherwise
|
||||
needs additional parsing, you can define it. If you define the function, it
|
||||
is useful to know about the ``output_dir`` property, which you can join
|
||||
with your output file relative path of choice:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
outfile = os.path.join(self.output_dir, "my-output-file.txt")
|
||||
|
||||
|
||||
The directory will be provided by the ``output_dir`` property but it won't exist,
|
||||
so you should create it:
|
||||
|
||||
|
||||
.. code::block:: python
|
||||
|
||||
# Create the output directory
|
||||
if not os.path.exists(self._output_dir):
|
||||
os.makedirs(self._output_dir)
|
||||
|
||||
|
||||
If you are generating results that match to specific files in the package
|
||||
install directory, you should try to maintain those paths in the case that
|
||||
there are equivalently named files in different directories that would
|
||||
overwrite one another. As an example of an analyzer with a custom save,
|
||||
the Libabigail analyzer saves ``*.xml`` files to the analyzer metadata
|
||||
folder in ``run()``, as they are either binaries, or as xml (text) would
|
||||
usually be too big to pass in one request. For this reason, the files
|
||||
are saved during ``run()`` and the filenames added to the result object,
|
||||
and then when the result object is passed back into ``save_result()``,
|
||||
we skip saving to the filesystem, and instead read the file and send
|
||||
each one (separately) to the monitor:
|
||||
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def save_result(self, result, monitor=None, overwrite=False):
|
||||
"""ABI results are saved to individual files, so each one needs to be
|
||||
read and uploaded. Result here should be the lookup generated in run(),
|
||||
the key is the analyzer name, and each value is the result file.
|
||||
We currently upload the entire xml as text because libabigail can't
|
||||
easily read gzipped xml, but this will be updated when it can.
|
||||
"""
|
||||
if not monitor:
|
||||
return
|
||||
|
||||
name = self.spec.package.name
|
||||
|
||||
for obj, filename in result.get(self.name, {}).items():
|
||||
|
||||
# Don't include the prefix
|
||||
rel_path = obj.replace(self.spec.prefix + os.path.sep, "")
|
||||
|
||||
# We've already saved the results to file during run
|
||||
content = spack.monitor.read_file(filename)
|
||||
|
||||
# A result needs an analyzer, value or binary_value, and name
|
||||
data = {"value": content, "install_file": rel_path, "name": "abidw-xml"}
|
||||
tty.info("Sending result for %s %s to monitor." % (name, rel_path))
|
||||
monitor.send_analyze_metadata(self.spec.package, {"libabigail": [data]})
|
||||
|
||||
|
||||
|
||||
Notice that this function, if you define it, requires a result object (generated by
|
||||
``run()``, a monitor (if you want to send), and a boolean ``overwrite`` to be used
|
||||
to check if a result exists first, and not write to it if the result exists and
|
||||
overwrite is False. Also notice that since we already saved these files to the analyzer metadata folder, we return early if a monitor isn't defined, because this function serves to send results to the monitor. If you haven't saved anything to the analyzer metadata folder
|
||||
yet, you might want to do that here. You should also use ``tty.info`` to give
|
||||
the user a message of "Writing result to $DIRNAME."
|
||||
|
||||
|
||||
.. _writing-commands:
|
||||
|
||||
@@ -699,23 +448,6 @@ with a hook, and this is the purpose of this particular hook. Akin to
|
||||
``on_phase_success`` we require the same variables - the package that failed,
|
||||
the name of the phase, and the log file where we might find errors.
|
||||
|
||||
"""""""""""""""""""""""""""""""""
|
||||
``on_analyzer_save(pkg, result)``
|
||||
"""""""""""""""""""""""""""""""""
|
||||
|
||||
After an analyzer has saved some result for a package, this hook is called,
|
||||
and it provides the package that we just ran the analysis for, along with
|
||||
the loaded result. Typically, a result is structured to have the name
|
||||
of the analyzer as key, and the result object that is defined in detail in
|
||||
:ref:`analyzer_run_function`.
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
def on_analyzer_save(pkg, result):
|
||||
"""given a package and a result...
|
||||
"""
|
||||
print('Do something extra with a package analysis result here')
|
||||
|
||||
|
||||
^^^^^^^^^^^^^^^^^^^^^^
|
||||
Adding a New Hook Type
|
||||
|
Reference in New Issue
Block a user