parser: refactor with coarser token granularity (#34151)

## Motivation

Our parser grew to be quite complex, with a 2-state lexer and logic in the parser
that has up to 5 levels of nested conditionals. In the future, to turn compilers into
proper dependencies, we'll have to increase the complexity further as we foresee
the need to add:
1. Edge attributes
2. Spec nesting

to the spec syntax (see https://github.com/spack/seps/pull/5 for an initial discussion of
those changes). The main goal here is thus to _simplify the existing code_ before
we start extending it. We do that by adopting a coarser token granularity and
more complex regexes for tokenization. This allows us to have a "flatter"
encoding for the parser, i.e., one with fewer nested conditionals and a near-trivial lexer.
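
To make the coarser granularity concrete, here is a minimal sketch of the approach, using a hypothetical, simplified subset of token regexes (the real ones are in `spack/parser.py`, shown later in this diff): each token kind becomes a named group, the groups are joined into a single alternation in order of precedence, and the lexer reduces to one scan loop.

```python
import re

# Hypothetical, simplified token regexes, in order of precedence
TOKEN_KINDS = {
    "COMPILER": r"%\s*[a-zA-Z_0-9][a-zA-Z_0-9\-]*",
    "VERSION": r"@\s*[a-zA-Z0-9_][a-zA-Z_0-9\-.]*",
    "BOOL_VARIANT": r"[~+\-]\s*[a-zA-Z_0-9][a-zA-Z_0-9\-]*",
    "NAME": r"[a-zA-Z_0-9][a-zA-Z_0-9\-]*",
    "WS": r"\s+",
}
# One big alternation: the first kind whose regex matches wins
ALL_TOKENS = re.compile(
    "|".join(f"(?P<{kind}>{regex})" for kind, regex in TOKEN_KINDS.items())
)

def tokenize(text):
    # re.Pattern.scanner() yields successive, contiguous matches
    for match in iter(ALL_TOKENS.scanner(text).match, None):
        if match.lastgroup != "WS":
            yield match.lastgroup, match.group()

print(list(tokenize("trilinos @1.2.3 +shared %gcc")))
# [('NAME', 'trilinos'), ('VERSION', '@1.2.3'),
#  ('BOOL_VARIANT', '+shared'), ('COMPILER', '%gcc')]
```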

There are places, namely in `VERSION`, where we have to use negative lookahead judiciously
to avoid ambiguity. Specifically, the following input is ambiguous without `(?!\s*=)` in
`VERSION_RANGE` and an extra final `\b` in `VERSION`:

```
@ 1.2.3     :        develop  # This is a version range 1.2.3:develop
@ 1.2.3     :        develop=foo  # This is a version range 1.2.3: followed by a key-value pair
```
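
The effect of the lookahead can be checked directly with the `VERSION` and `VERSION_RANGE` regexes introduced by this PR:

```python
import re

# Copied from spack/parser.py below
VERSION = r"([a-zA-Z0-9_][a-zA-Z_0-9\-\.]*\b)"
VERSION_RANGE = rf"({VERSION}\s*:\s*{VERSION}(?!\s*=)|:\s*{VERSION}(?!\s*=)|{VERSION}\s*:|:)"

# With no key-value pair following, the whole range is consumed...
print(re.match(VERSION_RANGE, "1.2.3     :        develop").group(0))
# -> '1.2.3     :        develop'

# ...while the negative lookahead stops the range before "develop=foo"
print(re.match(VERSION_RANGE, "1.2.3     :        develop=foo").group(0))
# -> '1.2.3     :'
```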

## Differences with the previous parser

~There are currently 2 known differences with the previous parser, which have been added on purpose:~

- ~No spaces allowed after a sigil (e.g. `foo @ 1.2.3` is invalid while `foo @1.2.3` is valid)~
- ~`/<hash> @1.2.3` can be parsed as a concrete spec followed by an anonymous spec (before was invalid)~

~We can recover the previous behavior in both cases but, especially for the second one, the behavior in this PR seems more consistent.~

The parser is now 100% backward compatible.

## Error handling

Since the new parser is based on more complex regexes, we can likely improve error
handling by adding regexes for common mistakes and hinting users accordingly.
I'll leave that for a follow-up PR, but there's a stub for this approach here, sketched below.
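
Roughly, the stub re-scans the offending text with the valid token regexes plus a catch-all `UNEXPECTED` group and underlines whatever only the catch-all could match. Here is a condensed sketch with simplified regexes (the real logic lives in `SpecTokenizationError` in `spack/parser.py`; hint regexes for specific common mistakes would slot in next to `UNEXPECTED`):

```python
import re

# Simplified: one "valid" token kind plus the catch-all used only for error analysis
VALID_TOKENS = r"(?P<NAME>[a-zA-Z_0-9][a-zA-Z_0-9\-]*)|(?P<WS>\s+)"
ANALYSIS_REGEX = re.compile(VALID_TOKENS + r"|(?P<UNEXPECTED>.[\s]*)")

def underline_unexpected(text):
    matches = list(iter(ANALYSIS_REGEX.scanner(text).match, None))
    underline = "".join(
        ("^" if m.lastgroup == "UNEXPECTED" else " ") * (m.end() - m.start())
        for m in matches
    )
    return f"{text}\n{underline}"

print(underline_unexpected("foo ##bar"))
# foo ##bar
#     ^^
```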

## Performance

To be sure we don't add any performance penalty with this new encoding, I measured:
```console
$ spack python -m timeit -s "import spack.spec" "spack.spec.Spec('<spec>')"
```
for different specs on my machine:

* **Spack:** 0.20.0.dev0 (c9db4e50ba045f5697816187accaf2451cb1aae7)
* **Python:** 3.8.10
* **Platform:** linux-ubuntu20.04-icelake
* **Concretizer:** clingo
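
For instance, the first row of the table below was obtained by substituting the placeholder as follows (an illustrative invocation; timings vary across machines):

```console
$ spack python -m timeit -s "import spack.spec" "spack.spec.Spec('trilinos')"
```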

The results are:

| Spec          | develop       | this PR |
| ------------- | ------------- | ------- |
| `trilinos`  |  28.9 usec | 13.1 usec |
| `trilinos @1.2.10:1.4.20,2.0.1`  | 131 usec  | 120 usec |
| `trilinos %gcc`  | 44.9 usec  | 20.9 usec |
| `trilinos +foo`  | 44.1 usec  | 21.3 usec |
| `trilinos foo=bar`  | 59.5 usec  | 25.6 usec |
| `trilinos foo=bar ^ mpich foo=baz`  | 120 usec  | 82.1 usec |

so the new parser is consistently faster than the previous one.

## Modifications

In this PR we only substituted the Spec parser, which means:
- [x] Deleted the `SpecParser` and `SpecLexer` classes from `spec.py`, and deleted `spack/parse.py`
- [x] Added a new parser in `spack/parser.py`
- [x] Hooked the new parser into all the places the previous one was used
- [x] Adapted unit tests in `test/spec_syntax.py`


## Possible future improvements

Random thoughts while working on the PR:
- Currently we transform hashes and files into specs during parsing. I think
we might want to introduce an additional step and parse special objects like
a `FileSpec` in between parsing and concretization.
Commit ab6499ce1e (parent 412bec45aa) by Massimiliano Culpo, 2022-12-07 23:56:53 +01:00, committed by GitHub. 14 changed files with 1514 additions and 1575 deletions.

@@ -175,14 +175,11 @@ Spec-related modules
^^^^^^^^^^^^^^^^^^^^
:mod:`spack.spec`
Contains :class:`~spack.spec.Spec` and :class:`~spack.spec.SpecParser`.
Also implements most of the logic for normalization and concretization
Contains :class:`~spack.spec.Spec`. Also implements most of the logic for concretization
of specs.
:mod:`spack.parse`
Contains some base classes for implementing simple recursive descent
parsers: :class:`~spack.parse.Parser` and :class:`~spack.parse.Lexer`.
Used by :class:`~spack.spec.SpecParser`.
:mod:`spack.parser`
Contains :class:`~spack.parser.SpecParser` and functions related to parsing specs.
:mod:`spack.concretize`
Contains :class:`~spack.concretize.Concretizer` implementation,

@@ -26,6 +26,7 @@
import spack.environment as ev
import spack.error
import spack.extensions
import spack.parser
import spack.paths
import spack.spec
import spack.store
@@ -217,7 +218,7 @@ def parse_specs(args, **kwargs):
unquoted_flags = _UnquotedFlags.extract(sargs)
try:
specs = spack.spec.parse(sargs)
specs = spack.parser.parse(sargs)
for spec in specs:
if concretize:
spec.concretize(tests=tests) # implies normalize

@@ -495,6 +495,8 @@ def provides(*specs, **kwargs):
"""
def _execute_provides(pkg):
import spack.parser # Avoid circular dependency
when = kwargs.get("when")
when_spec = make_when_spec(when)
if not when_spec:
@@ -505,7 +507,7 @@ def _execute_provides(pkg):
when_spec.name = pkg.name
for string in specs:
for provided_spec in spack.spec.parse(string):
for provided_spec in spack.parser.parse(string):
if pkg.name == provided_spec.name:
raise CircularReferenceError("Package '%s' cannot provide itself." % pkg.name)

@@ -1,174 +0,0 @@
# Copyright 2013-2022 Lawrence Livermore National Security, LLC and other
# Spack Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: (Apache-2.0 OR MIT)
import itertools
import re
import shlex
import sys
import spack.error
import spack.util.path as sp
class Token(object):
"""Represents tokens; generated from input by lexer and fed to parse()."""
__slots__ = "type", "value", "start", "end"
def __init__(self, type, value="", start=0, end=0):
self.type = type
self.value = value
self.start = start
self.end = end
def __repr__(self):
return str(self)
def __str__(self):
return "<%d: '%s'>" % (self.type, self.value)
def is_a(self, type):
return self.type == type
def __eq__(self, other):
return (self.type == other.type) and (self.value == other.value)
class Lexer(object):
"""Base class for Lexers that keep track of line numbers."""
__slots__ = "scanner0", "scanner1", "mode", "mode_switches_01", "mode_switches_10"
def __init__(self, lexicon0, mode_switches_01=[], lexicon1=[], mode_switches_10=[]):
self.scanner0 = re.Scanner(lexicon0)
self.mode_switches_01 = mode_switches_01
self.scanner1 = re.Scanner(lexicon1)
self.mode_switches_10 = mode_switches_10
self.mode = 0
def token(self, type, value=""):
if self.mode == 0:
return Token(type, value, self.scanner0.match.start(0), self.scanner0.match.end(0))
else:
return Token(type, value, self.scanner1.match.start(0), self.scanner1.match.end(0))
def lex_word(self, word):
scanner = self.scanner0
mode_switches = self.mode_switches_01
if self.mode == 1:
scanner = self.scanner1
mode_switches = self.mode_switches_10
tokens, remainder = scanner.scan(word)
remainder_used = 0
for i, t in enumerate(tokens):
if t.type in mode_switches:
# Combine post-switch tokens with remainder and
# scan in other mode
self.mode = 1 - self.mode # swap 0/1
remainder_used = 1
tokens = tokens[: i + 1] + self.lex_word(
word[word.index(t.value) + len(t.value) :]
)
break
if remainder and not remainder_used:
msg = "Invalid character, '{0}',".format(remainder[0])
msg += " in '{0}' at index {1}".format(word, word.index(remainder))
raise LexError(msg, word, word.index(remainder))
return tokens
def lex(self, text):
lexed = []
for word in text:
tokens = self.lex_word(word)
lexed.extend(tokens)
return lexed
class Parser(object):
"""Base class for simple recursive descent parsers."""
__slots__ = "tokens", "token", "next", "lexer", "text"
def __init__(self, lexer):
self.tokens = iter([]) # iterators over tokens, handled in order.
self.token = Token(None) # last accepted token
self.next = None # next token
self.lexer = lexer
self.text = None
def gettok(self):
"""Puts the next token in the input stream into self.next."""
try:
self.next = next(self.tokens)
except StopIteration:
self.next = None
def push_tokens(self, iterable):
"""Adds all tokens in some iterable to the token stream."""
self.tokens = itertools.chain(iter(iterable), iter([self.next]), self.tokens)
self.gettok()
def accept(self, id):
"""Put the next symbol in self.token if accepted, then call gettok()"""
if self.next and self.next.is_a(id):
self.token = self.next
self.gettok()
return True
return False
def next_token_error(self, message):
"""Raise an error about the next token in the stream."""
raise ParseError(message, self.text[0], self.token.end)
def last_token_error(self, message):
"""Raise an error about the previous token in the stream."""
raise ParseError(message, self.text[0], self.token.start)
def unexpected_token(self):
self.next_token_error("Unexpected token: '%s'" % self.next.value)
def expect(self, id):
"""Like accept(), but fails if we don't like the next token."""
if self.accept(id):
return True
else:
if self.next:
self.unexpected_token()
else:
self.next_token_error("Unexpected end of input")
sys.exit(1)
def setup(self, text):
if isinstance(text, str):
# shlex does not handle Windows path
# separators, so we must normalize to posix
text = sp.convert_to_posix_path(text)
text = shlex.split(str(text))
self.text = text
self.push_tokens(self.lexer.lex(text))
def parse(self, text):
self.setup(text)
return self.do_parse()
class ParseError(spack.error.SpackError):
"""Raised when we don't hit an error while parsing."""
def __init__(self, message, string, pos):
super(ParseError, self).__init__(message)
self.string = string
self.pos = pos
class LexError(ParseError):
"""Raised when we don't know how to lex something."""
def __init__(self, message, string, pos):
super(LexError, self).__init__(message, string, pos)

lib/spack/spack/parser.py (new file, 522 lines)

@@ -0,0 +1,522 @@
# Copyright 2013-2022 Lawrence Livermore National Security, LLC and other
# Spack Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: (Apache-2.0 OR MIT)
"""Parser for spec literals
Here is the EBNF grammar for a spec::
spec = [name] [node_options] { ^ node } |
[name] [node_options] hash |
filename
node = name [node_options] |
[name] [node_options] hash |
filename
node_options = [@(version_list|version_pair)] [%compiler] { variant }
hash = / id
filename = (.|/|[a-zA-Z0-9-_]*/)([a-zA-Z0-9-_./]*)(.json|.yaml)
name = id | namespace id
namespace = { id . }
variant = bool_variant | key_value | propagated_bv | propagated_kv
bool_variant = +id | ~id | -id
propagated_bv = ++id | ~~id | --id
key_value = id=id | id=quoted_id
propagated_kv = id==id | id==quoted_id
compiler = id [@version_list]
version_pair = git_version=vid
version_list = (version|version_range) [ { , (version|version_range)} ]
version_range = vid:vid | vid: | :vid | :
version = vid
git_version = git.(vid) | git_hash
git_hash = [A-Fa-f0-9]{40}
quoted_id = " id_with_ws " | ' id_with_ws '
id_with_ws = [a-zA-Z0-9_][a-zA-Z_0-9-.\\s]*
vid = [a-zA-Z0-9_][a-zA-Z_0-9-.]*
id = [a-zA-Z0-9_][a-zA-Z_0-9-]*
Identifiers using the <name>=<value> syntax, such as architectures and
compiler flags, require a space before the name.
There is one context-sensitive part: ids in versions may contain '.', while
other ids may not.
There is one ambiguity: since '-' is allowed in an id, you need to put
whitespace before -variant for it to be tokenized properly. You can
either use whitespace, or you can just use ~variant since it means the same
thing. Spack uses ~variant in directory names and in the canonical form of
specs to avoid ambiguity. Both are provided because ~ can cause shell
expansion when it is the first character in an id typed on the command line.
"""
import enum
import pathlib
import re
from typing import Iterator, List, Match, Optional
from llnl.util.tty import color
import spack.error
import spack.spec
import spack.variant
import spack.version
#: Valid name for specs and variants. Here we are not using
#: the previous "\w[\w.-]*" since that would match most
#: characters that can be part of a word in any language
IDENTIFIER = r"([a-zA-Z_0-9][a-zA-Z_0-9\-]*)"
DOTTED_IDENTIFIER = rf"({IDENTIFIER}(\.{IDENTIFIER})+)"
GIT_HASH = r"([A-Fa-f0-9]{40})"
GIT_VERSION = rf"((git\.({DOTTED_IDENTIFIER}|{IDENTIFIER}))|({GIT_HASH}))"
NAME = r"[a-zA-Z_0-9][a-zA-Z_0-9\-.]*"
HASH = r"[a-zA-Z_0-9]+"
#: A filename starts either with a "." or a "/" or a "{name}/"
FILENAME = r"(\.|\/|[a-zA-Z0-9-_]*\/)([a-zA-Z0-9-_\.\/]*)(\.json|\.yaml)"
VALUE = r"([a-zA-Z_0-9\-+\*.,:=\~\/\\]+)"
QUOTED_VALUE = r"[\"']+([a-zA-Z_0-9\-+\*.,:=\~\/\\\s]+)[\"']+"
VERSION = r"([a-zA-Z0-9_][a-zA-Z_0-9\-\.]*\b)"
VERSION_RANGE = rf"({VERSION}\s*:\s*{VERSION}(?!\s*=)|:\s*{VERSION}(?!\s*=)|{VERSION}\s*:|:)"
VERSION_LIST = rf"({VERSION_RANGE}|{VERSION})(\s*[,]\s*({VERSION_RANGE}|{VERSION}))*"
class TokenBase(enum.Enum):
"""Base class for an enum type with a regex value"""
def __new__(cls, *args, **kwargs):
# See
value = len(cls.__members__) + 1
obj = object.__new__(cls)
obj._value_ = value
return obj
def __init__(self, regex):
self.regex = regex
def __str__(self):
return f"{self._name_}"
class TokenType(TokenBase):
"""Enumeration of the different token kinds in the spec grammar.
Order of declaration is extremely important, since text containing specs is parsed with a
single regex obtained by ``"|".join(...)`` of all the regex in the order of declaration.
"""
# Dependency
DEPENDENCY = r"(\^)"
# Version
VERSION_HASH_PAIR = rf"(@({GIT_VERSION})=({VERSION}))"
VERSION = rf"(@\s*({VERSION_LIST}))"
# Variants
PROPAGATED_BOOL_VARIANT = rf"((\+\+|~~|--)\s*{NAME})"
BOOL_VARIANT = rf"([~+-]\s*{NAME})"
PROPAGATED_KEY_VALUE_PAIR = rf"({NAME}\s*==\s*({VALUE}|{QUOTED_VALUE}))"
KEY_VALUE_PAIR = rf"({NAME}\s*=\s*({VALUE}|{QUOTED_VALUE}))"
# Compilers
COMPILER_AND_VERSION = rf"(%\s*({NAME})([\s]*)@\s*({VERSION_LIST}))"
COMPILER = rf"(%\s*({NAME}))"
# FILENAME
FILENAME = rf"({FILENAME})"
# Package name
FULLY_QUALIFIED_PACKAGE_NAME = rf"({DOTTED_IDENTIFIER})"
UNQUALIFIED_PACKAGE_NAME = rf"({IDENTIFIER})"
# DAG hash
DAG_HASH = rf"(/({HASH}))"
# White spaces
WS = r"(\s+)"
class ErrorTokenType(TokenBase):
"""Enum with regexes for error analysis"""
# Unexpected character
UNEXPECTED = r"(.[\s]*)"
class Token:
"""Represents tokens; generated from input by lexer and fed to parse()."""
__slots__ = "kind", "value", "start", "end"
def __init__(
self, kind: TokenType, value: str, start: Optional[int] = None, end: Optional[int] = None
):
self.kind = kind
self.value = value
self.start = start
self.end = end
def __repr__(self):
return str(self)
def __str__(self):
return f"({self.kind}, {self.value})"
def __eq__(self, other):
return (self.kind == other.kind) and (self.value == other.value)
#: List of all the regexes used to match spec parts, in order of precedence
TOKEN_REGEXES = [rf"(?P<{token}>{token.regex})" for token in TokenType]
#: List of all valid regexes followed by error analysis regexes
ERROR_HANDLING_REGEXES = TOKEN_REGEXES + [
rf"(?P<{token}>{token.regex})" for token in ErrorTokenType
]
#: Regex to scan a valid text
ALL_TOKENS = re.compile("|".join(TOKEN_REGEXES))
#: Regex to analyze an invalid text
ANALYSIS_REGEX = re.compile("|".join(ERROR_HANDLING_REGEXES))
def tokenize(text: str) -> Iterator[Token]:
"""Return a token generator from the text passed as input.
Raises:
SpecTokenizationError: if we can't tokenize anymore, but didn't reach the
end of the input text.
"""
scanner = ALL_TOKENS.scanner(text) # type: ignore[attr-defined]
match: Optional[Match] = None
for match in iter(scanner.match, None):
yield Token(
TokenType.__members__[match.lastgroup], # type: ignore[attr-defined]
match.group(), # type: ignore[attr-defined]
match.start(), # type: ignore[attr-defined]
match.end(), # type: ignore[attr-defined]
)
if match is None and not text:
# We just got an empty string
return
if match is None or match.end() != len(text):
scanner = ANALYSIS_REGEX.scanner(text) # type: ignore[attr-defined]
matches = [m for m in iter(scanner.match, None)] # type: ignore[var-annotated]
raise SpecTokenizationError(matches, text)
class TokenContext:
"""Token context passed around by parsers"""
__slots__ = "token_stream", "current_token", "next_token"
def __init__(self, token_stream: Iterator[Token]):
self.token_stream = token_stream
self.current_token = None
self.next_token = None
self.advance()
def advance(self):
"""Advance one token"""
self.current_token, self.next_token = self.next_token, next(self.token_stream, None)
def accept(self, kind: TokenType):
"""If the next token is of the specified kind, advance the stream and return True.
Otherwise return False.
"""
if self.next_token and self.next_token.kind == kind:
self.advance()
return True
return False
class SpecParser:
"""Parse text into specs"""
__slots__ = "literal_str", "ctx"
def __init__(self, literal_str: str):
self.literal_str = literal_str
self.ctx = TokenContext(filter(lambda x: x.kind != TokenType.WS, tokenize(literal_str)))
def tokens(self) -> List[Token]:
"""Return the entire list of token from the initial text. White spaces are
filtered out.
"""
return list(filter(lambda x: x.kind != TokenType.WS, tokenize(self.literal_str)))
def next_spec(self, initial_spec: Optional[spack.spec.Spec] = None) -> spack.spec.Spec:
"""Return the next spec parsed from text.
Args:
initial_spec: object where to parse the spec. If None a new one
will be created.
Return
The spec that was parsed
"""
initial_spec = initial_spec or spack.spec.Spec()
root_spec = SpecNodeParser(self.ctx).parse(initial_spec)
while True:
if self.ctx.accept(TokenType.DEPENDENCY):
dependency = SpecNodeParser(self.ctx).parse(spack.spec.Spec())
if dependency == spack.spec.Spec():
msg = (
"this dependency sigil needs to be followed by a package name "
"or a node attribute (version, variant, etc.)"
)
raise SpecParsingError(msg, self.ctx.current_token, self.literal_str)
if root_spec.concrete:
raise spack.spec.RedundantSpecError(root_spec, "^" + str(dependency))
root_spec._add_dependency(dependency, ())
else:
break
return root_spec
def all_specs(self) -> List[spack.spec.Spec]:
"""Return all the specs that remain to be parsed"""
return list(iter(self.next_spec, spack.spec.Spec()))
class SpecNodeParser:
"""Parse a single spec node from a stream of tokens"""
__slots__ = "ctx", "has_compiler", "has_version", "has_hash"
def __init__(self, ctx):
self.ctx = ctx
self.has_compiler = False
self.has_version = False
self.has_hash = False
def parse(self, initial_spec: spack.spec.Spec) -> spack.spec.Spec:
"""Parse a single spec node from a stream of tokens
Args:
initial_spec: object to be constructed
Return
The object passed as argument
"""
import spack.environment # Needed to retrieve by hash
# If we start with a package name we have a named spec, we cannot
# accept another package name afterwards in a node
if self.ctx.accept(TokenType.UNQUALIFIED_PACKAGE_NAME):
initial_spec.name = self.ctx.current_token.value
elif self.ctx.accept(TokenType.FULLY_QUALIFIED_PACKAGE_NAME):
parts = self.ctx.current_token.value.split(".")
name = parts[-1]
namespace = ".".join(parts[:-1])
initial_spec.name = name
initial_spec.namespace = namespace
elif self.ctx.accept(TokenType.FILENAME):
return FileParser(self.ctx).parse(initial_spec)
while True:
if self.ctx.accept(TokenType.COMPILER):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
if self.has_compiler:
raise spack.spec.DuplicateCompilerSpecError(
f"{initial_spec} cannot have multiple compilers"
)
compiler_name = self.ctx.current_token.value[1:]
initial_spec.compiler = spack.spec.CompilerSpec(compiler_name.strip(), ":")
self.has_compiler = True
elif self.ctx.accept(TokenType.COMPILER_AND_VERSION):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
if self.has_compiler:
raise spack.spec.DuplicateCompilerSpecError(
f"{initial_spec} cannot have multiple compilers"
)
compiler_name, compiler_version = self.ctx.current_token.value[1:].split("@")
initial_spec.compiler = spack.spec.CompilerSpec(
compiler_name.strip(), compiler_version
)
self.has_compiler = True
elif self.ctx.accept(TokenType.VERSION) or self.ctx.accept(
TokenType.VERSION_HASH_PAIR
):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
if self.has_version:
raise spack.spec.MultipleVersionError(
f"{initial_spec} cannot have multiple versions"
)
version_list = spack.version.VersionList()
version_list.add(spack.version.from_string(self.ctx.current_token.value[1:]))
initial_spec.versions = version_list
# Add a git lookup method for GitVersions
if (
initial_spec.name
and initial_spec.versions.concrete
and isinstance(initial_spec.version, spack.version.GitVersion)
):
initial_spec.version.generate_git_lookup(initial_spec.fullname)
self.has_version = True
elif self.ctx.accept(TokenType.BOOL_VARIANT):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
variant_value = self.ctx.current_token.value[0] == "+"
initial_spec._add_flag(
self.ctx.current_token.value[1:].strip(), variant_value, propagate=False
)
elif self.ctx.accept(TokenType.PROPAGATED_BOOL_VARIANT):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
variant_value = self.ctx.current_token.value[0:2] == "++"
initial_spec._add_flag(
self.ctx.current_token.value[2:].strip(), variant_value, propagate=True
)
elif self.ctx.accept(TokenType.KEY_VALUE_PAIR):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
name, value = self.ctx.current_token.value.split("=", maxsplit=1)
name = name.strip("'\" ")
value = value.strip("'\" ")
initial_spec._add_flag(name, value, propagate=False)
elif self.ctx.accept(TokenType.PROPAGATED_KEY_VALUE_PAIR):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
name, value = self.ctx.current_token.value.split("==", maxsplit=1)
name = name.strip("'\" ")
value = value.strip("'\" ")
initial_spec._add_flag(name, value, propagate=True)
elif not self.has_hash and self.ctx.accept(TokenType.DAG_HASH):
dag_hash = self.ctx.current_token.value[1:]
matches = []
if spack.environment.active_environment():
matches = spack.environment.active_environment().get_by_hash(dag_hash)
if not matches:
matches = spack.store.db.get_by_hash(dag_hash)
if not matches:
raise spack.spec.NoSuchHashError(dag_hash)
if len(matches) != 1:
raise spack.spec.AmbiguousHashError(
f"Multiple packages specify hash beginning '{dag_hash}'.", *matches
)
spec_by_hash = matches[0]
if not spec_by_hash.satisfies(initial_spec):
raise spack.spec.InvalidHashError(initial_spec, spec_by_hash.dag_hash())
initial_spec._dup(spec_by_hash)
self.has_hash = True
else:
break
return initial_spec
def hash_not_parsed_or_raise(self, spec, addition):
if not self.has_hash:
return
raise spack.spec.RedundantSpecError(spec, addition)
class FileParser:
"""Parse a single spec from a JSON or YAML file"""
__slots__ = ("ctx",)
def __init__(self, ctx):
self.ctx = ctx
def parse(self, initial_spec: spack.spec.Spec) -> spack.spec.Spec:
"""Parse a spec tree from a specfile.
Args:
initial_spec: object where to parse the spec
Return
The initial_spec passed as argument, once constructed
"""
file = pathlib.Path(self.ctx.current_token.value)
if not file.exists():
raise spack.spec.NoSuchSpecFileError(f"No such spec file: '{file}'")
with file.open("r", encoding="utf-8") as stream:
if str(file).endswith(".json"):
spec_from_file = spack.spec.Spec.from_json(stream)
else:
spec_from_file = spack.spec.Spec.from_yaml(stream)
initial_spec._dup(spec_from_file)
return initial_spec
def parse(text: str) -> List[spack.spec.Spec]:
"""Parse text into a list of strings
Args:
text (str): text to be parsed
Return:
List of specs
"""
return SpecParser(text).all_specs()
def parse_one_or_raise(
text: str, initial_spec: Optional[spack.spec.Spec] = None
) -> spack.spec.Spec:
"""Parse exactly one spec from text and return it, or raise
Args:
text (str): text to be parsed
initial_spec: buffer where to parse the spec. If None a new one will be created.
"""
stripped_text = text.strip()
parser = SpecParser(stripped_text)
result = parser.next_spec(initial_spec)
last_token = parser.ctx.current_token
if last_token is not None and last_token.end != len(stripped_text):
message = "a single spec was requested, but parsed more than one:"
message += f"\n{text}"
if last_token is not None:
underline = f"\n{' ' * last_token.end}{'^' * (len(text) - last_token.end)}"
message += color.colorize(f"@*r{{{underline}}}")
raise ValueError(message)
return result
class SpecSyntaxError(Exception):
"""Base class for Spec syntax errors"""
class SpecTokenizationError(SpecSyntaxError):
"""Syntax error in a spec string"""
def __init__(self, matches, text):
message = "unexpected tokens in the spec string\n"
message += f"{text}"
underline = "\n"
for match in matches:
if match.lastgroup == str(ErrorTokenType.UNEXPECTED):
underline += f"{'^' * (match.end() - match.start())}"
continue
underline += f"{' ' * (match.end() - match.start())}"
message += color.colorize(f"@*r{{{underline}}}")
super().__init__(message)
class SpecParsingError(SpecSyntaxError):
"""Error when parsing tokens"""
def __init__(self, message, token, text):
message += f"\n{text}"
underline = f"\n{' '*token.start}{'^'*(token.end - token.start)}"
message += color.colorize(f"@*r{{{underline}}}")
super().__init__(message)

@@ -8,14 +8,14 @@
import llnl.util.lang
import llnl.util.tty
import spack.spec
# jsonschema is imported lazily as it is heavy to import
# and increases the start-up time
def _make_validator():
import jsonschema
import spack.parser
def _validate_spec(validator, is_spec, instance, schema):
"""Check if the attributes on instance are valid specs."""
import jsonschema
@@ -25,11 +25,9 @@ def _validate_spec(validator, is_spec, instance, schema):
for spec_str in instance:
try:
spack.spec.parse(spec_str)
except spack.spec.SpecParseError as e:
yield jsonschema.ValidationError(
'"{0}" is an invalid spec [{1}]'.format(spec_str, str(e))
)
spack.parser.parse(spec_str)
except spack.parser.SpecSyntaxError as e:
yield jsonschema.ValidationError(str(e))
def _deprecated_properties(validator, deprecated, instance, schema):
if not (validator.is_type(instance, "object") or validator.is_type(instance, "array")):

@@ -47,37 +47,6 @@
6. The architecture to build with. This is needed on machines where
cross-compilation is required
Here is the EBNF grammar for a spec::
spec-list = { spec [ dep-list ] }
dep_list = { ^ spec }
spec = id [ options ]
options = { @version-list | ++variant | +variant |
--variant | -variant | ~~variant | ~variant |
variant=value | variant==value | %compiler |
arch=architecture | [ flag ]==value | [ flag ]=value}
flag = { cflags | cxxflags | fcflags | fflags | cppflags |
ldflags | ldlibs }
variant = id
architecture = id
compiler = id [ version-list ]
version-list = version [ { , version } ]
version = id | id: | :id | id:id
id = [A-Za-z0-9_][A-Za-z0-9_.-]*
Identifiers using the <name>=<value> command, such as architectures and
compiler flags, require a space before the name.
There is one context-sensitive part: ids in versions may contain '.', while
other ids may not.
There is one ambiguity: since '-' is allowed in an id, you need to put
whitespace space before -variant for it to be tokenized properly. You can
either use whitespace, or you can just use ~variant since it means the same
thing. Spack uses ~variant in directory names and in the canonical form of
specs to avoid ambiguity. Both are provided because ~ can cause shell
expansion when it is the first character in an id typed on the command line.
"""
import collections
import collections.abc
@@ -101,7 +70,6 @@
import spack.dependency as dp
import spack.error
import spack.hash_types as ht
import spack.parse
import spack.paths
import spack.platforms
import spack.provider_index
@@ -125,8 +93,6 @@
__all__ = [
"CompilerSpec",
"Spec",
"SpecParser",
"parse",
"SpecParseError",
"ArchitecturePropagationError",
"DuplicateDependencyError",
@@ -584,9 +550,9 @@ def __init__(self, *args):
# If there is one argument, it's either another CompilerSpec
# to copy or a string to parse
if isinstance(arg, str):
c = SpecParser().parse_compiler(arg)
self.name = c.name
self.versions = c.versions
spec = spack.parser.parse_one_or_raise(f"%{arg}")
self.name = spec.compiler.name
self.versions = spec.compiler.versions
elif isinstance(arg, CompilerSpec):
self.name = arg.name
@@ -602,7 +568,8 @@ def __init__(self, *args):
name, version = args
self.name = name
self.versions = vn.VersionList()
self.versions.add(vn.ver(version))
versions = vn.ver(version)
self.versions.add(versions)
else:
raise TypeError("__init__ takes 1 or 2 arguments. (%d given)" % nargs)
@@ -1285,6 +1252,7 @@ def __init__(
self.external_path = external_path
self.external_module = external_module
"""
import spack.parser
# Copy if spec_like is a Spec.
if isinstance(spec_like, Spec):
@@ -1335,11 +1303,7 @@ def __init__(
self._build_spec = None
if isinstance(spec_like, str):
spec_list = SpecParser(self).parse(spec_like)
if len(spec_list) > 1:
raise ValueError("More than one spec in string: " + spec_like)
if len(spec_list) < 1:
raise ValueError("String contains no specs: " + spec_like)
spack.parser.parse_one_or_raise(spec_like, self)
elif spec_like is not None:
raise TypeError("Can't make spec out of %s" % type(spec_like))
@@ -4974,421 +4938,6 @@ def __missing__(self, key):
spec_id_re = r"\w[\w.-]*"
class SpecLexer(spack.parse.Lexer):
"""Parses tokens that make up spack specs."""
def __init__(self):
# Spec strings require posix-style paths on Windows
# because the result is later passed to shlex
filename_reg = (
r"[/\w.-]*/[/\w/-]+\.(yaml|json)[^\b]*"
if not is_windows
else r"([A-Za-z]:)*?[/\w.-]*/[/\w/-]+\.(yaml|json)[^\b]*"
)
super(SpecLexer, self).__init__(
[
(
r"\@([\w.\-]*\s*)*(\s*\=\s*\w[\w.\-]*)?",
lambda scanner, val: self.token(VER, val),
),
(r"\:", lambda scanner, val: self.token(COLON, val)),
(r"\,", lambda scanner, val: self.token(COMMA, val)),
(r"\^", lambda scanner, val: self.token(DEP, val)),
(r"\+\+", lambda scanner, val: self.token(D_ON, val)),
(r"\+", lambda scanner, val: self.token(ON, val)),
(r"\-\-", lambda scanner, val: self.token(D_OFF, val)),
(r"\-", lambda scanner, val: self.token(OFF, val)),
(r"\~\~", lambda scanner, val: self.token(D_OFF, val)),
(r"\~", lambda scanner, val: self.token(OFF, val)),
(r"\%", lambda scanner, val: self.token(PCT, val)),
(r"\=\=", lambda scanner, val: self.token(D_EQ, val)),
(r"\=", lambda scanner, val: self.token(EQ, val)),
# Filenames match before identifiers, so no initial filename
# component is parsed as a spec (e.g., in subdir/spec.yaml/json)
(filename_reg, lambda scanner, v: self.token(FILE, v)),
# Hash match after filename. No valid filename can be a hash
# (files end w/.yaml), but a hash can match a filename prefix.
(r"/", lambda scanner, val: self.token(HASH, val)),
# Identifiers match after filenames and hashes.
(spec_id_re, lambda scanner, val: self.token(ID, val)),
(r"\s+", lambda scanner, val: None),
],
[D_EQ, EQ],
[
(r"[\S].*", lambda scanner, val: self.token(VAL, val)),
(r"\s+", lambda scanner, val: None),
],
[VAL],
)
# Lexer is always the same for every parser.
_lexer = SpecLexer()
class SpecParser(spack.parse.Parser):
"""Parses specs."""
__slots__ = "previous", "_initial"
def __init__(self, initial_spec=None):
"""Construct a new SpecParser.
Args:
initial_spec (Spec, optional): provide a Spec that we'll parse
directly into. This is used to avoid construction of a
superfluous Spec object in the Spec constructor.
"""
super(SpecParser, self).__init__(_lexer)
self.previous = None
self._initial = initial_spec
def do_parse(self):
specs = []
try:
while self.next:
# Try a file first, but if it doesn't succeed, keep parsing
# as from_file may backtrack and try an id.
if self.accept(FILE):
spec = self.spec_from_file()
if spec:
specs.append(spec)
continue
if self.accept(ID):
self.previous = self.token
if self.accept(EQ) or self.accept(D_EQ):
# We're parsing an anonymous spec beginning with a
# key-value pair.
if not specs:
self.push_tokens([self.previous, self.token])
self.previous = None
specs.append(self.spec(None))
else:
if specs[-1].concrete:
# Trying to add k-v pair to spec from hash
raise RedundantSpecError(specs[-1], "key-value pair")
# We should never end up here.
# This requires starting a new spec with ID, EQ
# After another spec that is not concrete
# If the previous spec is not concrete, this is
# handled in the spec parsing loop
# If it is concrete, see the if statement above
# If there is no previous spec, we don't land in
# this else case.
self.unexpected_token()
else:
# We're parsing a new spec by name
self.previous = None
specs.append(self.spec(self.token.value))
elif self.accept(HASH):
# We're finding a spec by hash
specs.append(self.spec_by_hash())
elif self.accept(DEP):
if not specs:
# We're parsing an anonymous spec beginning with a
# dependency. Push the token to recover after creating
# anonymous spec
self.push_tokens([self.token])
specs.append(self.spec(None))
else:
dep = None
if self.accept(FILE):
# this may return None, in which case we backtrack
dep = self.spec_from_file()
if not dep and self.accept(HASH):
# We're finding a dependency by hash for an
# anonymous spec
dep = self.spec_by_hash()
dep = dep.copy(deps=("link", "run"))
if not dep:
# We're adding a dependency to the last spec
if self.accept(ID):
self.previous = self.token
if self.accept(EQ):
# This is an anonymous dep with a key=value
# push tokens to be parsed as part of the
# dep spec
self.push_tokens([self.previous, self.token])
dep_name = None
else:
# named dep (standard)
dep_name = self.token.value
self.previous = None
else:
# anonymous dep
dep_name = None
dep = self.spec(dep_name)
# Raise an error if the previous spec is already
# concrete (assigned by hash)
if specs[-1].concrete:
raise RedundantSpecError(specs[-1], "dependency")
# command line deps get empty deptypes now.
# Real deptypes are assigned later per packages.
specs[-1]._add_dependency(dep, ())
else:
# If the next token can be part of a valid anonymous spec,
# create the anonymous spec
if self.next.type in (VER, ON, D_ON, OFF, D_OFF, PCT):
# Raise an error if the previous spec is already
# concrete (assigned by hash)
if specs and specs[-1]._hash:
raise RedundantSpecError(specs[-1], "compiler, version, " "or variant")
specs.append(self.spec(None))
else:
self.unexpected_token()
except spack.parse.ParseError as e:
raise SpecParseError(e) from e
# Generate lookups for git-commit-based versions
for spec in specs:
# Cannot do lookups for versions in anonymous specs
# Only allow Version objects to use git for now
# Note: VersionRange(x, x) is currently concrete, hence isinstance(...).
if spec.name and spec.versions.concrete and isinstance(spec.version, vn.GitVersion):
spec.version.generate_git_lookup(spec.fullname)
return specs
def spec_from_file(self):
"""Read a spec from a filename parsed on the input stream.
There is some care taken here to ensure that filenames are a last
resort, and that any valid package name is parsed as a name
before we consider it as a file. Specs are used in lots of places;
we don't want the parser touching the filesystem unnecessarily.
The parse logic is as follows:
1. We require that filenames end in .yaml, which means that no valid
filename can be interpreted as a hash (hashes can't have '.')
2. We avoid treating paths like /path/to/spec.json as hashes, or paths
like subdir/spec.json as ids by lexing filenames before hashes.
3. For spec names that match file and id regexes, like 'builtin.yaml',
we backtrack from spec_from_file() and treat them as spec names.
"""
path = self.token.value
# Special case where someone omits a space after a filename. Consider:
#
# libdwarf^/some/path/to/libelf.yamllibdwarf ^../../libelf.yaml
#
# The error is clearly an omitted space. To handle this, the FILE
# regex admits text *beyond* .yaml, and we raise a nice error for
# file names that don't end in .yaml.
if not (path.endswith(".yaml") or path.endswith(".json")):
raise SpecFilenameError("Spec filename must end in .yaml or .json: '{0}'".format(path))
if not os.path.exists(path):
raise NoSuchSpecFileError("No such spec file: '{0}'".format(path))
with open(path) as f:
if path.endswith(".json"):
return Spec.from_json(f)
return Spec.from_yaml(f)
def parse_compiler(self, text):
self.setup(text)
return self.compiler()
def spec_by_hash(self):
# TODO: Remove parser dependency on active environment and database.
import spack.environment
self.expect(ID)
dag_hash = self.token.value
matches = []
if spack.environment.active_environment():
matches = spack.environment.active_environment().get_by_hash(dag_hash)
if not matches:
matches = spack.store.db.get_by_hash(dag_hash)
if not matches:
raise NoSuchHashError(dag_hash)
if len(matches) != 1:
raise AmbiguousHashError(
"Multiple packages specify hash beginning '%s'." % dag_hash, *matches
)
return matches[0]
def spec(self, name):
"""Parse a spec out of the input. If a spec is supplied, initialize
and return it instead of creating a new one."""
spec_namespace = None
spec_name = None
if name:
spec_namespace, dot, spec_name = name.rpartition(".")
if not spec_namespace:
spec_namespace = None
self.check_identifier(spec_name)
if self._initial is None:
spec = Spec()
else:
# this is used by Spec.__init__
spec = self._initial
self._initial = None
spec.namespace = spec_namespace
spec.name = spec_name
while self.next:
if self.accept(VER):
vlist = self.version_list()
spec._add_versions(vlist)
elif self.accept(D_ON):
name = self.variant()
spec.variants[name] = vt.BoolValuedVariant(name, True, propagate=True)
elif self.accept(ON):
name = self.variant()
spec.variants[name] = vt.BoolValuedVariant(name, True, propagate=False)
elif self.accept(D_OFF):
name = self.variant()
spec.variants[name] = vt.BoolValuedVariant(name, False, propagate=True)
elif self.accept(OFF):
name = self.variant()
spec.variants[name] = vt.BoolValuedVariant(name, False, propagate=False)
elif self.accept(PCT):
spec._set_compiler(self.compiler())
elif self.accept(ID):
self.previous = self.token
if self.accept(D_EQ):
# We're adding a key-value pair to the spec
self.expect(VAL)
spec._add_flag(self.previous.value, self.token.value, propagate=True)
self.previous = None
elif self.accept(EQ):
# We're adding a key-value pair to the spec
self.expect(VAL)
spec._add_flag(self.previous.value, self.token.value, propagate=False)
self.previous = None
else:
# We've found the start of a new spec. Go back to do_parse
# and read this token again.
self.push_tokens([self.token])
self.previous = None
break
elif self.accept(HASH):
# Get spec by hash and confirm it matches any constraints we
# already read in
hash_spec = self.spec_by_hash()
if hash_spec.satisfies(spec):
spec._dup(hash_spec)
break
else:
raise InvalidHashError(spec, hash_spec.dag_hash())
else:
break
return spec
def variant(self, name=None):
if name:
return name
else:
self.expect(ID)
self.check_identifier()
return self.token.value
def version(self):
start = None
end = None
def str_translate(value):
# return None for empty strings since we can end up with `'@'.strip('@')`
if not (value and value.strip()):
return None
else:
return value
if self.token.type is COMMA:
# need to increment commas, could be ID or COLON
self.accept(ID)
if self.token.type in (VER, ID):
version_spec = self.token.value.lstrip("@")
start = str_translate(version_spec)
if self.accept(COLON):
if self.accept(ID):
if self.next and self.next.type is EQ:
# This is a start: range followed by a key=value pair
self.push_tokens([self.token])
else:
end = self.token.value
elif start:
# No colon, but there was a version
return vn.Version(start)
else:
# No colon and no id: invalid version
self.next_token_error("Invalid version specifier")
if start:
start = vn.Version(start)
if end:
end = vn.Version(end)
return vn.VersionRange(start, end)
def version_list(self):
vlist = []
vlist.append(self.version())
while self.accept(COMMA):
vlist.append(self.version())
return vlist
def compiler(self):
self.expect(ID)
self.check_identifier()
compiler = CompilerSpec.__new__(CompilerSpec)
compiler.name = self.token.value
compiler.versions = vn.VersionList()
if self.accept(VER):
vlist = self.version_list()
compiler._add_versions(vlist)
else:
compiler.versions = vn.VersionList(":")
return compiler
def check_identifier(self, id=None):
"""The only identifiers that can contain '.' are versions, but version
ids are context-sensitive so we have to check on a case-by-case
basis. Call this if we detect a version id where it shouldn't be.
"""
if not id:
id = self.token.value
if "." in id:
self.last_token_error("{0}: Identifier cannot contain '.'".format(id))
def parse(string):
"""Returns a list of specs from an input string.
For creating one spec, see Spec() constructor.
"""
return SpecParser().parse(string)
def save_dependency_specfiles(
root_spec_info, output_directory, dependencies=None, spec_format="json"
):

@@ -26,6 +26,7 @@
import spack.util.executable
from spack.error import SpackError
from spack.main import SpackCommand
from spack.parser import SpecSyntaxError
from spack.spec import CompilerSpec, Spec
install = SpackCommand("install")
@@ -362,7 +363,7 @@ def test_install_conflicts(conflict_spec):
)
def test_install_invalid_spec(invalid_spec):
# Make sure that invalid specs raise a SpackError
with pytest.raises(SpackError, match="Unexpected token"):
with pytest.raises(SpecSyntaxError, match="unexpected tokens"):
install(invalid_spec)

@@ -10,6 +10,7 @@
import spack.environment as ev
import spack.error
import spack.parser
import spack.spec
import spack.store
from spack.main import SpackCommand, SpackCommandError
@@ -181,13 +182,11 @@ def test_spec_returncode():
def test_spec_parse_error():
with pytest.raises(spack.error.SpackError) as e:
with pytest.raises(spack.parser.SpecSyntaxError) as e:
spec("1.15:")
# make sure the error is formatted properly
error_msg = """\
1.15:
^"""
error_msg = "unexpected tokens in the spec string\n1.15:\n ^"
assert error_msg in str(e.value)

@@ -68,22 +68,18 @@ def test_validate_spec(validate_spec_schema):
# Check that invalid data throws
data["^python@3.7@"] = "baz"
with pytest.raises(jsonschema.ValidationError) as exc_err:
with pytest.raises(jsonschema.ValidationError, match="unexpected tokens"):
v.validate(data)
assert "is an invalid spec" in str(exc_err.value)
@pytest.mark.regression("9857")
def test_module_suffixes(module_suffixes_schema):
v = spack.schema.Validator(module_suffixes_schema)
data = {"tcl": {"all": {"suffixes": {"^python@2.7@": "py2.7"}}}}
with pytest.raises(jsonschema.ValidationError) as exc_err:
with pytest.raises(jsonschema.ValidationError, match="unexpected tokens"):
v.validate(data)
assert "is an invalid spec" in str(exc_err.value)
@pytest.mark.regression("10246")
@pytest.mark.parametrize(

@@ -9,6 +9,7 @@
import spack.error
import spack.package_base
import spack.parser
import spack.repo
import spack.util.hash as hashutil
from spack.dependency import Dependency, all_deptypes, canonical_deptype
@@ -961,7 +962,7 @@ def test_canonical_deptype(self):
def test_invalid_literal_spec(self):
# Can't give type 'build' to a top-level spec
with pytest.raises(spack.spec.SpecParseError):
with pytest.raises(spack.parser.SpecSyntaxError):
Spec.from_literal({"foo:build": None})
# Can't use more than one ':' separator

@@ -707,13 +707,9 @@ def test_constrain_dependency_not_changed(self):
)
def test_exceptional_paths_for_constructor(self):
with pytest.raises(TypeError):
Spec((1, 2))
with pytest.raises(ValueError):
Spec("")
with pytest.raises(ValueError):
Spec("libelf foo")

File diff suppressed because it is too large.

@@ -937,7 +937,7 @@ def __init__(self, vlist=None):
self.versions = []
if vlist is not None:
if isinstance(vlist, str):
vlist = _string_to_version(vlist)
vlist = from_string(vlist)
if type(vlist) == VersionList:
self.versions = vlist.versions
else:
@@ -1165,7 +1165,7 @@ def __repr__(self):
return str(self.versions)
def _string_to_version(string):
def from_string(string):
"""Converts a string to a Version, VersionList, or VersionRange.
This is private. Client code should use ver().
"""
@@ -1191,9 +1191,9 @@ def ver(obj):
if isinstance(obj, (list, tuple)):
return VersionList(obj)
elif isinstance(obj, str):
return _string_to_version(obj)
return from_string(obj)
elif isinstance(obj, (int, float)):
return _string_to_version(str(obj))
return from_string(str(obj))
elif type(obj) in (VersionBase, GitVersion, VersionRange, VersionList):
return obj
else: