parser: refactor with coarser token granularity (#34151)

## Motivation

Our parser grew to be quite complex, with a 2-state lexer and logic in the parser
that has up to 5 levels of nested conditionals. In the future, to turn compilers into
proper dependencies, we'll have to increase the complexity further as we foresee
the need to add:
1. Edge attributes
2. Spec nesting

to the spec syntax (see https://github.com/spack/seps/pull/5 for an initial discussion of
those changes). The main goal here is thus to _simplify the existing code_ before
we start extending it. We do that by adopting a coarser token granularity and
more complex regexes for tokenization. This allows us to have a "flatter"
encoding for the parser, i.e., one with fewer nested conditionals and a near-trivial lexer.
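
To make the coarser granularity concrete, here is a minimal sketch of the approach, using a hypothetical, simplified subset of token regexes (the real ones are in `spack/parser.py`, shown later in this diff): each token kind becomes a named group, the groups are joined into a single alternation in order of precedence, and the lexer reduces to one scan loop.

```python
import re

# Hypothetical, simplified token regexes, in order of precedence
TOKEN_KINDS = {
    "COMPILER": r"%\s*[a-zA-Z_0-9][a-zA-Z_0-9\-]*",
    "VERSION": r"@\s*[a-zA-Z0-9_][a-zA-Z_0-9\-.]*",
    "BOOL_VARIANT": r"[~+\-]\s*[a-zA-Z_0-9][a-zA-Z_0-9\-]*",
    "NAME": r"[a-zA-Z_0-9][a-zA-Z_0-9\-]*",
    "WS": r"\s+",
}
# One big alternation: the first kind whose regex matches wins
ALL_TOKENS = re.compile(
    "|".join(f"(?P<{kind}>{regex})" for kind, regex in TOKEN_KINDS.items())
)

def tokenize(text):
    # re.Pattern.scanner() yields successive, contiguous matches
    for match in iter(ALL_TOKENS.scanner(text).match, None):
        if match.lastgroup != "WS":
            yield match.lastgroup, match.group()

print(list(tokenize("trilinos @1.2.3 +shared %gcc")))
# [('NAME', 'trilinos'), ('VERSION', '@1.2.3'),
#  ('BOOL_VARIANT', '+shared'), ('COMPILER', '%gcc')]
```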

There are places, namely in `VERSION`, where we have to use negative lookahead judiciously
to avoid ambiguity. Specifically, the following input is ambiguous without `(?!\s*=)` in
`VERSION_RANGE` and an extra final `\b` in `VERSION`:

```
@ 1.2.3     :        develop  # This is a version range 1.2.3:develop
@ 1.2.3     :        develop=foo  # This is a version range 1.2.3: followed by a key-value pair
```
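
The effect of the lookahead can be checked directly with the `VERSION` and `VERSION_RANGE` regexes introduced by this PR:

```python
import re

# Copied from spack/parser.py below
VERSION = r"([a-zA-Z0-9_][a-zA-Z_0-9\-\.]*\b)"
VERSION_RANGE = rf"({VERSION}\s*:\s*{VERSION}(?!\s*=)|:\s*{VERSION}(?!\s*=)|{VERSION}\s*:|:)"

# With no key-value pair following, the whole range is consumed...
print(re.match(VERSION_RANGE, "1.2.3     :        develop").group(0))
# -> '1.2.3     :        develop'

# ...while the negative lookahead stops the range before "develop=foo"
print(re.match(VERSION_RANGE, "1.2.3     :        develop=foo").group(0))
# -> '1.2.3     :'
```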

## Differences with the previous parser

~There are currently 2 known differences with the previous parser, which have been added on purpose:~

- ~No spaces allowed after a sigil (e.g. `foo @ 1.2.3` is invalid while `foo @1.2.3` is valid)~
- ~`/<hash> @1.2.3` can be parsed as a concrete spec followed by an anonymous spec (before was invalid)~

~We can recover the previous behavior in both cases but, especially for the second one, the behavior in this PR seems more consistent.~

The parser is now 100% backward compatible.

## Error handling

Since the new parser is based on more complex regexes, we can likely improve error
handling by adding regexes for common mistakes and hinting users accordingly.
I'll leave that for a follow-up PR, but there's a stub for this approach here, sketched below.
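
Roughly, the stub re-scans the offending text with the valid token regexes plus a catch-all `UNEXPECTED` group and underlines whatever only the catch-all could match. Here is a condensed sketch with simplified regexes (the real logic lives in `SpecTokenizationError` in `spack/parser.py`; hint regexes for specific common mistakes would slot in next to `UNEXPECTED`):

```python
import re

# Simplified: one "valid" token kind plus the catch-all used only for error analysis
VALID_TOKENS = r"(?P<NAME>[a-zA-Z_0-9][a-zA-Z_0-9\-]*)|(?P<WS>\s+)"
ANALYSIS_REGEX = re.compile(VALID_TOKENS + r"|(?P<UNEXPECTED>.[\s]*)")

def underline_unexpected(text):
    matches = list(iter(ANALYSIS_REGEX.scanner(text).match, None))
    underline = "".join(
        ("^" if m.lastgroup == "UNEXPECTED" else " ") * (m.end() - m.start())
        for m in matches
    )
    return f"{text}\n{underline}"

print(underline_unexpected("foo ##bar"))
# foo ##bar
#     ^^
```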

## Performance

To be sure we don't add any performance penalty with this new encoding, I measured:
```console
$ spack python -m timeit -s "import spack.spec" "spack.spec.Spec('<spec>')"
```
for different specs on my machine:

* **Spack:** 0.20.0.dev0 (c9db4e50ba045f5697816187accaf2451cb1aae7)
* **Python:** 3.8.10
* **Platform:** linux-ubuntu20.04-icelake
* **Concretizer:** clingo
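
For instance, the first row of the table below was obtained by substituting the placeholder as follows (an illustrative invocation; timings vary across machines):

```console
$ spack python -m timeit -s "import spack.spec" "spack.spec.Spec('trilinos')"
```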

The results are:

| Spec          | develop       | this PR |
| ------------- | ------------- | ------- |
| `trilinos`  |  28.9 usec | 13.1 usec |
| `trilinos @1.2.10:1.4.20,2.0.1`  | 131 usec  | 120 usec |
| `trilinos %gcc`  | 44.9 usec  | 20.9 usec |
| `trilinos +foo`  | 44.1 usec  | 21.3 usec |
| `trilinos foo=bar`  | 59.5 usec  | 25.6 usec |
| `trilinos foo=bar ^ mpich foo=baz`  | 120 usec  | 82.1 usec |

so the new parser is consistently faster than the previous one.

## Modifications

In this PR we only substituted the Spec parser, which means:
- [x] Deleted the `SpecParser` and `SpecLexer` classes from `spec.py`, and deleted `spack/parse.py`
- [x] Added a new parser in `spack/parser.py`
- [x] Hooked the new parser into all the places the previous one was used
- [x] Adapted unit tests in `test/spec_syntax.py`


## Possible future improvements

Random thoughts while working on the PR:
- Currently we transform hashes and files into specs during parsing. I think
we might want to introduce an additional step and parse special objects like
a `FileSpec` in between parsing and concretization.
Commit ab6499ce1e (parent 412bec45aa) by Massimiliano Culpo, 2022-12-07 23:56:53 +01:00, committed by GitHub. 14 changed files with 1514 additions and 1575 deletions.

@@ -175,14 +175,11 @@ Spec-related modules
^^^^^^^^^^^^^^^^^^^^
:mod:`spack.spec`
Contains :class:`~spack.spec.Spec` and :class:`~spack.spec.SpecParser`.
Also implements most of the logic for normalization and concretization
Contains :class:`~spack.spec.Spec`. Also implements most of the logic for concretization
of specs.
:mod:`spack.parse`
Contains some base classes for implementing simple recursive descent
parsers: :class:`~spack.parse.Parser` and :class:`~spack.parse.Lexer`.
Used by :class:`~spack.spec.SpecParser`.
:mod:`spack.parser`
Contains :class:`~spack.parser.SpecParser` and functions related to parsing specs.
:mod:`spack.concretize`
Contains :class:`~spack.concretize.Concretizer` implementation,

@@ -26,6 +26,7 @@
import spack.environment as ev
import spack.error
import spack.extensions
import spack.parser
import spack.paths
import spack.spec
import spack.store
@@ -217,7 +218,7 @@ def parse_specs(args, **kwargs):
unquoted_flags = _UnquotedFlags.extract(sargs)
try:
specs = spack.spec.parse(sargs)
specs = spack.parser.parse(sargs)
for spec in specs:
if concretize:
spec.concretize(tests=tests) # implies normalize

@@ -495,6 +495,8 @@ def provides(*specs, **kwargs):
"""
def _execute_provides(pkg):
import spack.parser # Avoid circular dependency
when = kwargs.get("when")
when_spec = make_when_spec(when)
if not when_spec:
@@ -505,7 +507,7 @@ def _execute_provides(pkg):
when_spec.name = pkg.name
for string in specs:
for provided_spec in spack.spec.parse(string):
for provided_spec in spack.parser.parse(string):
if pkg.name == provided_spec.name:
raise CircularReferenceError("Package '%s' cannot provide itself." % pkg.name)

@@ -1,174 +0,0 @@
# Copyright 2013-2022 Lawrence Livermore National Security, LLC and other
# Spack Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: (Apache-2.0 OR MIT)
import itertools
import re
import shlex
import sys
import spack.error
import spack.util.path as sp
class Token(object):
"""Represents tokens; generated from input by lexer and fed to parse()."""
__slots__ = "type", "value", "start", "end"
def __init__(self, type, value="", start=0, end=0):
self.type = type
self.value = value
self.start = start
self.end = end
def __repr__(self):
return str(self)
def __str__(self):
return "<%d: '%s'>" % (self.type, self.value)
def is_a(self, type):
return self.type == type
def __eq__(self, other):
return (self.type == other.type) and (self.value == other.value)
class Lexer(object):
"""Base class for Lexers that keep track of line numbers."""
__slots__ = "scanner0", "scanner1", "mode", "mode_switches_01", "mode_switches_10"
def __init__(self, lexicon0, mode_switches_01=[], lexicon1=[], mode_switches_10=[]):
self.scanner0 = re.Scanner(lexicon0)
self.mode_switches_01 = mode_switches_01
self.scanner1 = re.Scanner(lexicon1)
self.mode_switches_10 = mode_switches_10
self.mode = 0
def token(self, type, value=""):
if self.mode == 0:
return Token(type, value, self.scanner0.match.start(0), self.scanner0.match.end(0))
else:
return Token(type, value, self.scanner1.match.start(0), self.scanner1.match.end(0))
def lex_word(self, word):
scanner = self.scanner0
mode_switches = self.mode_switches_01
if self.mode == 1:
scanner = self.scanner1
mode_switches = self.mode_switches_10
tokens, remainder = scanner.scan(word)
remainder_used = 0
for i, t in enumerate(tokens):
if t.type in mode_switches:
# Combine post-switch tokens with remainder and
# scan in other mode
self.mode = 1 - self.mode # swap 0/1
remainder_used = 1
tokens = tokens[: i + 1] + self.lex_word(
word[word.index(t.value) + len(t.value) :]
)
break
if remainder and not remainder_used:
msg = "Invalid character, '{0}',".format(remainder[0])
msg += " in '{0}' at index {1}".format(word, word.index(remainder))
raise LexError(msg, word, word.index(remainder))
return tokens
def lex(self, text):
lexed = []
for word in text:
tokens = self.lex_word(word)
lexed.extend(tokens)
return lexed
class Parser(object):
"""Base class for simple recursive descent parsers."""
__slots__ = "tokens", "token", "next", "lexer", "text"
def __init__(self, lexer):
self.tokens = iter([]) # iterators over tokens, handled in order.
self.token = Token(None) # last accepted token
self.next = None # next token
self.lexer = lexer
self.text = None
def gettok(self):
"""Puts the next token in the input stream into self.next."""
try:
self.next = next(self.tokens)
except StopIteration:
self.next = None
def push_tokens(self, iterable):
"""Adds all tokens in some iterable to the token stream."""
self.tokens = itertools.chain(iter(iterable), iter([self.next]), self.tokens)
self.gettok()
def accept(self, id):
"""Put the next symbol in self.token if accepted, then call gettok()"""
if self.next and self.next.is_a(id):
self.token = self.next
self.gettok()
return True
return False
def next_token_error(self, message):
"""Raise an error about the next token in the stream."""
raise ParseError(message, self.text[0], self.token.end)
def last_token_error(self, message):
"""Raise an error about the previous token in the stream."""
raise ParseError(message, self.text[0], self.token.start)
def unexpected_token(self):
self.next_token_error("Unexpected token: '%s'" % self.next.value)
def expect(self, id):
"""Like accept(), but fails if we don't like the next token."""
if self.accept(id):
return True
else:
if self.next:
self.unexpected_token()
else:
self.next_token_error("Unexpected end of input")
sys.exit(1)
def setup(self, text):
if isinstance(text, str):
# shlex does not handle Windows path
# separators, so we must normalize to posix
text = sp.convert_to_posix_path(text)
text = shlex.split(str(text))
self.text = text
self.push_tokens(self.lexer.lex(text))
def parse(self, text):
self.setup(text)
return self.do_parse()
class ParseError(spack.error.SpackError):
"""Raised when we don't hit an error while parsing."""
def __init__(self, message, string, pos):
super(ParseError, self).__init__(message)
self.string = string
self.pos = pos
class LexError(ParseError):
"""Raised when we don't know how to lex something."""
def __init__(self, message, string, pos):
super(LexError, self).__init__(message, string, pos)

lib/spack/spack/parser.py (new file, 522 lines)

@@ -0,0 +1,522 @@
# Copyright 2013-2022 Lawrence Livermore National Security, LLC and other
# Spack Project Developers. See the top-level COPYRIGHT file for details.
#
# SPDX-License-Identifier: (Apache-2.0 OR MIT)
"""Parser for spec literals
Here is the EBNF grammar for a spec::
spec = [name] [node_options] { ^ node } |
[name] [node_options] hash |
filename
node = name [node_options] |
[name] [node_options] hash |
filename
node_options = [@(version_list|version_pair)] [%compiler] { variant }
hash = / id
filename = (.|/|[a-zA-Z0-9-_]*/)([a-zA-Z0-9-_./]*)(.json|.yaml)
name = id | namespace id
namespace = { id . }
variant = bool_variant | key_value | propagated_bv | propagated_kv
bool_variant = +id | ~id | -id
propagated_bv = ++id | ~~id | --id
key_value = id=id | id=quoted_id
propagated_kv = id==id | id==quoted_id
compiler = id [@version_list]
version_pair = git_version=vid
version_list = (version|version_range) [ { , (version|version_range)} ]
version_range = vid:vid | vid: | :vid | :
version = vid
git_version = git.(vid) | git_hash
git_hash = [A-Fa-f0-9]{40}
quoted_id = " id_with_ws " | ' id_with_ws '
id_with_ws = [a-zA-Z0-9_][a-zA-Z_0-9-.\\s]*
vid = [a-zA-Z0-9_][a-zA-Z_0-9-.]*
id = [a-zA-Z0-9_][a-zA-Z_0-9-]*
Identifiers using the <name>=<value> syntax, such as architectures and
compiler flags, require a space before the name.
There is one context-sensitive part: ids in versions may contain '.', while
other ids may not.
There is one ambiguity: since '-' is allowed in an id, you need to put
whitespace before -variant for it to be tokenized properly. You can
either use whitespace, or you can just use ~variant since it means the same
thing. Spack uses ~variant in directory names and in the canonical form of
specs to avoid ambiguity. Both are provided because ~ can cause shell
expansion when it is the first character in an id typed on the command line.
"""
import enum
import pathlib
import re
from typing import Iterator, List, Match, Optional
from llnl.util.tty import color
import spack.error
import spack.spec
import spack.variant
import spack.version
#: Valid name for specs and variants. Here we are not using
#: the previous "\w[\w.-]*" since that would match most
#: characters that can be part of a word in any language
IDENTIFIER = r"([a-zA-Z_0-9][a-zA-Z_0-9\-]*)"
DOTTED_IDENTIFIER = rf"({IDENTIFIER}(\.{IDENTIFIER})+)"
GIT_HASH = r"([A-Fa-f0-9]{40})"
GIT_VERSION = rf"((git\.({DOTTED_IDENTIFIER}|{IDENTIFIER}))|({GIT_HASH}))"
NAME = r"[a-zA-Z_0-9][a-zA-Z_0-9\-.]*"
HASH = r"[a-zA-Z_0-9]+"
#: A filename starts either with a "." or a "/" or a "{name}/"
FILENAME = r"(\.|\/|[a-zA-Z0-9-_]*\/)([a-zA-Z0-9-_\.\/]*)(\.json|\.yaml)"
VALUE = r"([a-zA-Z_0-9\-+\*.,:=\~\/\\]+)"
QUOTED_VALUE = r"[\"']+([a-zA-Z_0-9\-+\*.,:=\~\/\\\s]+)[\"']+"
VERSION = r"([a-zA-Z0-9_][a-zA-Z_0-9\-\.]*\b)"
VERSION_RANGE = rf"({VERSION}\s*:\s*{VERSION}(?!\s*=)|:\s*{VERSION}(?!\s*=)|{VERSION}\s*:|:)"
VERSION_LIST = rf"({VERSION_RANGE}|{VERSION})(\s*[,]\s*({VERSION_RANGE}|{VERSION}))*"
class TokenBase(enum.Enum):
"""Base class for an enum type with a regex value"""
def __new__(cls, *args, **kwargs):
# See
value = len(cls.__members__) + 1
obj = object.__new__(cls)
obj._value_ = value
return obj
def __init__(self, regex):
self.regex = regex
def __str__(self):
return f"{self._name_}"
class TokenType(TokenBase):
"""Enumeration of the different token kinds in the spec grammar.
Order of declaration is extremely important, since text containing specs is parsed with a
single regex obtained by ``"|".join(...)`` of all the regex in the order of declaration.
"""
# Dependency
DEPENDENCY = r"(\^)"
# Version
VERSION_HASH_PAIR = rf"(@({GIT_VERSION})=({VERSION}))"
VERSION = rf"(@\s*({VERSION_LIST}))"
# Variants
PROPAGATED_BOOL_VARIANT = rf"((\+\+|~~|--)\s*{NAME})"
BOOL_VARIANT = rf"([~+-]\s*{NAME})"
PROPAGATED_KEY_VALUE_PAIR = rf"({NAME}\s*==\s*({VALUE}|{QUOTED_VALUE}))"
KEY_VALUE_PAIR = rf"({NAME}\s*=\s*({VALUE}|{QUOTED_VALUE}))"
# Compilers
COMPILER_AND_VERSION = rf"(%\s*({NAME})([\s]*)@\s*({VERSION_LIST}))"
COMPILER = rf"(%\s*({NAME}))"
# FILENAME
FILENAME = rf"({FILENAME})"
# Package name
FULLY_QUALIFIED_PACKAGE_NAME = rf"({DOTTED_IDENTIFIER})"
UNQUALIFIED_PACKAGE_NAME = rf"({IDENTIFIER})"
# DAG hash
DAG_HASH = rf"(/({HASH}))"
# White spaces
WS = r"(\s+)"
class ErrorTokenType(TokenBase):
"""Enum with regexes for error analysis"""
# Unexpected character
UNEXPECTED = r"(.[\s]*)"
class Token:
"""Represents tokens; generated from input by lexer and fed to parse()."""
__slots__ = "kind", "value", "start", "end"
def __init__(
self, kind: TokenType, value: str, start: Optional[int] = None, end: Optional[int] = None
):
self.kind = kind
self.value = value
self.start = start
self.end = end
def __repr__(self):
return str(self)
def __str__(self):
return f"({self.kind}, {self.value})"
def __eq__(self, other):
return (self.kind == other.kind) and (self.value == other.value)
#: List of all the regexes used to match spec parts, in order of precedence
TOKEN_REGEXES = [rf"(?P<{token}>{token.regex})" for token in TokenType]
#: List of all valid regexes followed by error analysis regexes
ERROR_HANDLING_REGEXES = TOKEN_REGEXES + [
rf"(?P<{token}>{token.regex})" for token in ErrorTokenType
]
#: Regex to scan a valid text
ALL_TOKENS = re.compile("|".join(TOKEN_REGEXES))
#: Regex to analyze an invalid text
ANALYSIS_REGEX = re.compile("|".join(ERROR_HANDLING_REGEXES))
def tokenize(text: str) -> Iterator[Token]:
"""Return a token generator from the text passed as input.
Raises:
SpecTokenizationError: if we can't tokenize anymore, but didn't reach the
end of the input text.
"""
scanner = ALL_TOKENS.scanner(text) # type: ignore[attr-defined]
match: Optional[Match] = None
for match in iter(scanner.match, None):
yield Token(
TokenType.__members__[match.lastgroup], # type: ignore[attr-defined]
match.group(), # type: ignore[attr-defined]
match.start(), # type: ignore[attr-defined]
match.end(), # type: ignore[attr-defined]
)
if match is None and not text:
# We just got an empty string
return
if match is None or match.end() != len(text):
scanner = ANALYSIS_REGEX.scanner(text) # type: ignore[attr-defined]
matches = [m for m in iter(scanner.match, None)] # type: ignore[var-annotated]
raise SpecTokenizationError(matches, text)
class TokenContext:
"""Token context passed around by parsers"""
__slots__ = "token_stream", "current_token", "next_token"
def __init__(self, token_stream: Iterator[Token]):
self.token_stream = token_stream
self.current_token = None
self.next_token = None
self.advance()
def advance(self):
"""Advance one token"""
self.current_token, self.next_token = self.next_token, next(self.token_stream, None)
def accept(self, kind: TokenType):
"""If the next token is of the specified kind, advance the stream and return True.
Otherwise return False.
"""
if self.next_token and self.next_token.kind == kind:
self.advance()
return True
return False
class SpecParser:
"""Parse text into specs"""
__slots__ = "literal_str", "ctx"
def __init__(self, literal_str: str):
self.literal_str = literal_str
self.ctx = TokenContext(filter(lambda x: x.kind != TokenType.WS, tokenize(literal_str)))
def tokens(self) -> List[Token]:
"""Return the entire list of token from the initial text. White spaces are
filtered out.
"""
return list(filter(lambda x: x.kind != TokenType.WS, tokenize(self.literal_str)))
def next_spec(self, initial_spec: Optional[spack.spec.Spec] = None) -> spack.spec.Spec:
"""Return the next spec parsed from text.
Args:
initial_spec: object where to parse the spec. If None a new one
will be created.
Return
The spec that was parsed
"""
initial_spec = initial_spec or spack.spec.Spec()
root_spec = SpecNodeParser(self.ctx).parse(initial_spec)
while True:
if self.ctx.accept(TokenType.DEPENDENCY):
dependency = SpecNodeParser(self.ctx).parse(spack.spec.Spec())
if dependency == spack.spec.Spec():
msg = (
"this dependency sigil needs to be followed by a package name "
"or a node attribute (version, variant, etc.)"
)
raise SpecParsingError(msg, self.ctx.current_token, self.literal_str)
if root_spec.concrete:
raise spack.spec.RedundantSpecError(root_spec, "^" + str(dependency))
root_spec._add_dependency(dependency, ())
else:
break
return root_spec
def all_specs(self) -> List[spack.spec.Spec]:
"""Return all the specs that remain to be parsed"""
return list(iter(self.next_spec, spack.spec.Spec()))
class SpecNodeParser:
"""Parse a single spec node from a stream of tokens"""
__slots__ = "ctx", "has_compiler", "has_version", "has_hash"
def __init__(self, ctx):
self.ctx = ctx
self.has_compiler = False
self.has_version = False
self.has_hash = False
def parse(self, initial_spec: spack.spec.Spec) -> spack.spec.Spec:
"""Parse a single spec node from a stream of tokens
Args:
initial_spec: object to be constructed
Return
The object passed as argument
"""
import spack.environment # Needed to retrieve by hash
# If we start with a package name we have a named spec, we cannot
# accept another package name afterwards in a node
if self.ctx.accept(TokenType.UNQUALIFIED_PACKAGE_NAME):
initial_spec.name = self.ctx.current_token.value
elif self.ctx.accept(TokenType.FULLY_QUALIFIED_PACKAGE_NAME):
parts = self.ctx.current_token.value.split(".")
name = parts[-1]
namespace = ".".join(parts[:-1])
initial_spec.name = name
initial_spec.namespace = namespace
elif self.ctx.accept(TokenType.FILENAME):
return FileParser(self.ctx).parse(initial_spec)
while True:
if self.ctx.accept(TokenType.COMPILER):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
if self.has_compiler:
raise spack.spec.DuplicateCompilerSpecError(
f"{initial_spec} cannot have multiple compilers"
)
compiler_name = self.ctx.current_token.value[1:]
initial_spec.compiler = spack.spec.CompilerSpec(compiler_name.strip(), ":")
self.has_compiler = True
elif self.ctx.accept(TokenType.COMPILER_AND_VERSION):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
if self.has_compiler:
raise spack.spec.DuplicateCompilerSpecError(
f"{initial_spec} cannot have multiple compilers"
)
compiler_name, compiler_version = self.ctx.current_token.value[1:].split("@")
initial_spec.compiler = spack.spec.CompilerSpec(
compiler_name.strip(), compiler_version
)
self.has_compiler = True
elif self.ctx.accept(TokenType.VERSION) or self.ctx.accept(
TokenType.VERSION_HASH_PAIR
):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
if self.has_version:
raise spack.spec.MultipleVersionError(
f"{initial_spec} cannot have multiple versions"
)
version_list = spack.version.VersionList()
version_list.add(spack.version.from_string(self.ctx.current_token.value[1:]))
initial_spec.versions = version_list
# Add a git lookup method for GitVersions
if (
initial_spec.name
and initial_spec.versions.concrete
and isinstance(initial_spec.version, spack.version.GitVersion)
):
initial_spec.version.generate_git_lookup(initial_spec.fullname)
self.has_version = True
elif self.ctx.accept(TokenType.BOOL_VARIANT):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
variant_value = self.ctx.current_token.value[0] == "+"
initial_spec._add_flag(
self.ctx.current_token.value[1:].strip(), variant_value, propagate=False
)
elif self.ctx.accept(TokenType.PROPAGATED_BOOL_VARIANT):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
variant_value = self.ctx.current_token.value[0:2] == "++"
initial_spec._add_flag(
self.ctx.current_token.value[2:].strip(), variant_value, propagate=True
)
elif self.ctx.accept(TokenType.KEY_VALUE_PAIR):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
name, value = self.ctx.current_token.value.split("=", maxsplit=1)
name = name.strip("'\" ")
value = value.strip("'\" ")
initial_spec._add_flag(name, value, propagate=False)
elif self.ctx.accept(TokenType.PROPAGATED_KEY_VALUE_PAIR):
self.hash_not_parsed_or_raise(initial_spec, self.ctx.current_token.value)
name, value = self.ctx.current_token.value.split("==", maxsplit=1)
name = name.strip("'\" ")
value = value.strip("'\" ")
initial_spec._add_flag(name, value, propagate=True)
elif not self.has_hash and self.ctx.accept(TokenType.DAG_HASH):
dag_hash = self.ctx.current_token.value[1:]
matches = []
if spack.environment.active_environment():
matches = spack.environment.active_environment().get_by_hash(dag_hash)
if not matches:
matches = spack.store.db.get_by_hash(dag_hash)
if not matches:
raise spack.spec.NoSuchHashError(dag_hash)
if len(matches) != 1:
raise spack.spec.AmbiguousHashError(
f"Multiple packages specify hash beginning '{dag_hash}'.", *matches
)
spec_by_hash = matches[0]
if not spec_by_hash.satisfies(initial_spec):
raise spack.spec.InvalidHashError(initial_spec, spec_by_hash.dag_hash())
initial_spec._dup(spec_by_hash)
self.has_hash = True
else:
break
return initial_spec
def hash_not_parsed_or_raise(self, spec, addition):
if not self.has_hash:
return
raise spack.spec.RedundantSpecError(spec, addition)
class FileParser:
"""Parse a single spec from a JSON or YAML file"""
__slots__ = ("ctx",)
def __init__(self, ctx):
self.ctx = ctx
def parse(self, initial_spec: spack.spec.Spec) -> spack.spec.Spec:
"""Parse a spec tree from a specfile.
Args:
initial_spec: object where to parse the spec
Return
The initial_spec passed as argument, once constructed
"""
file = pathlib.Path(self.ctx.current_token.value)
if not file.exists():
raise spack.spec.NoSuchSpecFileError(f"No such spec file: '{file}'")
with file.open("r", encoding="utf-8") as stream:
if str(file).endswith(".json"):
spec_from_file = spack.spec.Spec.from_json(stream)
else:
spec_from_file = spack.spec.Spec.from_yaml(stream)
initial_spec._dup(spec_from_file)
return initial_spec
def parse(text: str) -> List[spack.spec.Spec]:
"""Parse text into a list of strings
Args:
text (str): text to be parsed
Return:
List of specs
"""
return SpecParser(text).all_specs()
def parse_one_or_raise(
text: str, initial_spec: Optional[spack.spec.Spec] = None
) -> spack.spec.Spec:
"""Parse exactly one spec from text and return it, or raise
Args:
text (str): text to be parsed
initial_spec: buffer where to parse the spec. If None a new one will be created.
"""
stripped_text = text.strip()
parser = SpecParser(stripped_text)
result = parser.next_spec(initial_spec)
last_token = parser.ctx.current_token
if last_token is not None and last_token.end != len(stripped_text):
message = "a single spec was requested, but parsed more than one:"
message += f"\n{text}"
if last_token is not None:
underline = f"\n{' ' * last_token.end}{'^' * (len(text) - last_token.end)}"
message += color.colorize(f"@*r{{{underline}}}")
raise ValueError(message)
return result
class SpecSyntaxError(Exception):
"""Base class for Spec syntax errors"""
class SpecTokenizationError(SpecSyntaxError):
"""Syntax error in a spec string"""
def __init__(self, matches, text):
message = "unexpected tokens in the spec string\n"
message += f"{text}"
underline = "\n"
for match in matches:
if match.lastgroup == str(ErrorTokenType.UNEXPECTED):
underline += f"{'^' * (match.end() - match.start())}"
continue
underline += f"{' ' * (match.end() - match.start())}"
message += color.colorize(f"@*r{{{underline}}}")
super().__init__(message)
class SpecParsingError(SpecSyntaxError):
"""Error when parsing tokens"""
def __init__(self, message, token, text):
message += f"\n{text}"
underline = f"\n{' '*token.start}{'^'*(token.end - token.start)}"
message += color.colorize(f"@*r{{{underline}}}")
super().__init__(message)

@@ -8,14 +8,14 @@
import llnl.util.lang
import llnl.util.tty
import spack.spec
# jsonschema is imported lazily as it is heavy to import
# and increases the start-up time
def _make_validator():
import jsonschema
import spack.parser
def _validate_spec(validator, is_spec, instance, schema):
"""Check if the attributes on instance are valid specs."""
import jsonschema
@@ -25,11 +25,9 @@ def _validate_spec(validator, is_spec, instance, schema):
for spec_str in instance:
try:
spack.spec.parse(spec_str)
except spack.spec.SpecParseError as e:
yield jsonschema.ValidationError(
'"{0}" is an invalid spec [{1}]'.format(spec_str, str(e))
)
spack.parser.parse(spec_str)
except spack.parser.SpecSyntaxError as e:
yield jsonschema.ValidationError(str(e))
def _deprecated_properties(validator, deprecated, instance, schema):
if not (validator.is_type(instance, "object") or validator.is_type(instance, "array")):

@@ -47,37 +47,6 @@
6. The architecture to build with. This is needed on machines where
cross-compilation is required
Here is the EBNF grammar for a spec::
spec-list = { spec [ dep-list ] }
dep_list = { ^ spec }
spec = id [ options ]
options = { @version-list | ++variant | +variant |
--variant | -variant | ~~variant | ~variant |
variant=value | variant==value | %compiler |
arch=architecture | [ flag ]==value | [ flag ]=value}
flag = { cflags | cxxflags | fcflags | fflags | cppflags |
ldflags | ldlibs }
variant = id
architecture = id
compiler = id [ version-list ]
version-list = version [ { , version } ]
version = id | id: | :id | id:id
id = [A-Za-z0-9_][A-Za-z0-9_.-]*
Identifiers using the <name>=<value> command, such as architectures and
compiler flags, require a space before the name.
There is one context-sensitive part: ids in versions may contain '.', while
other ids may not.
There is one ambiguity: since '-' is allowed in an id, you need to put
whitespace space before -variant for it to be tokenized properly. You can
either use whitespace, or you can just use ~variant since it means the same
thing. Spack uses ~variant in directory names and in the canonical form of
specs to avoid ambiguity. Both are provided because ~ can cause shell
expansion when it is the first character in an id typed on the command line.
"""
import collections
import collections.abc
@@ -101,7 +70,6 @@
import spack.dependency as dp
import spack.error
import spack.hash_types as ht
import spack.parse
import spack.paths
import spack.platforms
import spack.provider_index
@@ -125,8 +93,6 @@
__all__ = [
"CompilerSpec",
"Spec",
"SpecParser",
"parse",
"SpecParseError",
"ArchitecturePropagationError",
"DuplicateDependencyError",
@@ -584,9 +550,9 @@ def __init__(self, *args):
# If there is one argument, it's either another CompilerSpec
# to copy or a string to parse
if isinstance(arg, str):
c = SpecParser().parse_compiler(arg)
self.name = c.name
self.versions = c.versions
spec = spack.parser.parse_one_or_raise(f"%{arg}")
self.name = spec.compiler.name
self.versions = spec.compiler.versions
elif isinstance(arg, CompilerSpec):
self.name = arg.name
@@ -602,7 +568,8 @@ def __init__(self, *args):
name, version = args
self.name = name
self.versions = vn.VersionList()
self.versions.add(vn.ver(version))
versions = vn.ver(version)
self.versions.add(versions)
else:
raise TypeError("__init__ takes 1 or 2 arguments. (%d given)" % nargs)
@@ -1285,6 +1252,7 @@ def __init__(
self.external_path = external_path
self.external_module = external_module
"""
import spack.parser
# Copy if spec_like is a Spec.
if isinstance(spec_like, Spec):
@@ -1335,11 +1303,7 @@ def __init__(
self._build_spec = None
if isinstance(spec_like, str):
spec_list = SpecParser(self).parse(spec_like)
if len(spec_list) > 1:
raise ValueError("More than one spec in string: " + spec_like)
if len(spec_list) < 1:
raise ValueError("String contains no specs: " + spec_like)
spack.parser.parse_one_or_raise(spec_like, self)
elif spec_like is not None:
raise TypeError("Can't make spec out of %s" % type(spec_like))
@@ -4974,421 +4938,6 @@ def __missing__(self, key):
spec_id_re = r"\w[\w.-]*"
class SpecLexer(spack.parse.Lexer):
"""Parses tokens that make up spack specs."""
def __init__(self):
# Spec strings require posix-style paths on Windows
# because the result is later passed to shlex
filename_reg = (
r"[/\w.-]*/[/\w/-]+\.(yaml|json)[^\b]*"
if not is_windows
else r"([A-Za-z]:)*?[/\w.-]*/[/\w/-]+\.(yaml|json)[^\b]*"
)
super(SpecLexer, self).__init__(
[
(
r"\@([\w.\-]*\s*)*(\s*\=\s*\w[\w.\-]*)?",
lambda scanner, val: self.token(VER, val),
),
(r"\:", lambda scanner, val: self.token(COLON, val)),
(r"\,", lambda scanner, val: self.token(COMMA, val)),
(r"\^", lambda scanner, val: self.token(DEP, val)),
(r"\+\+", lambda scanner, val: self.token(D_ON, val)),
(r"\+", lambda scanner, val: self.token(ON, val)),
(r"\-\-", lambda scanner, val: self.token(D_OFF, val)),
(r"\-", lambda scanner, val: self.token(OFF, val)),
(r"\~\~", lambda scanner, val: self.token(D_OFF, val)),
(r"\~", lambda scanner, val: self.token(OFF, val)),
(r"\%", lambda scanner, val: self.token(PCT, val)),
(r"\=\=", lambda scanner, val: self.token(D_EQ, val)),
(r"\=", lambda scanner, val: self.token(EQ, val)),
# Filenames match before identifiers, so no initial filename
# component is parsed as a spec (e.g., in subdir/spec.yaml/json)
(filename_reg, lambda scanner, v: self.token(FILE, v)),
# Hash match after filename. No valid filename can be a hash
# (files end w/.yaml), but a hash can match a filename prefix.
(r"/", lambda scanner, val: self.token(HASH, val)),
# Identifiers match after filenames and hashes.
(spec_id_re, lambda scanner, val: self.token(ID, val)),
(r"\s+", lambda scanner, val: None),
],
[D_EQ, EQ],
[
(r"[\S].*", lambda scanner, val: self.token(VAL, val)),
(r"\s+", lambda scanner, val: None),
],
[VAL],
)
# Lexer is always the same for every parser.
_lexer = SpecLexer()
class SpecParser(spack.parse.Parser):
"""Parses specs."""
__slots__ = "previous", "_initial"
def __init__(self, initial_spec=None):
"""Construct a new SpecParser.
Args:
initial_spec (Spec, optional): provide a Spec that we'll parse
directly into. This is used to avoid construction of a
superfluous Spec object in the Spec constructor.
"""
super(SpecParser, self).__init__(_lexer)
self.previous = None
self._initial = initial_spec
def do_parse(self):
specs = []
try:
while self.next:
# Try a file first, but if it doesn't succeed, keep parsing
# as from_file may backtrack and try an id.
if self.accept(FILE):
spec = self.spec_from_file()
if spec:
specs.append(spec)
continue
if self.accept(ID):
self.previous = self.token
if self.accept(EQ) or self.accept(D_EQ):
# We're parsing an anonymous spec beginning with a
# key-value pair.
if not specs:
self.push_tokens([self.previous, self.token])
self.previous = None
specs.append(self.spec(None))
else:
if specs[-1].concrete:
# Trying to add k-v pair to spec from hash
raise RedundantSpecError(specs[-1], "key-value pair")
# We should never end up here.
# This requires starting a new spec with ID, EQ
# After another spec that is not concrete
# If the previous spec is not concrete, this is
# handled in the spec parsing loop
# If it is concrete, see the if statement above
# If there is no previous spec, we don't land in
# this else case.
self.unexpected_token()
else:
# We're parsing a new spec by name
self.previous = None
specs.append(self.spec(self.token.value))
elif self.accept(HASH):
# We're finding a spec by hash
specs.append(self.spec_by_hash())
elif self.accept(DEP):
if not specs:
# We're parsing an anonymous spec beginning with a
# dependency. Push the token to recover after creating
# anonymous spec
self.push_tokens([self.token])
specs.append(self.spec(None))
else:
dep = None
if self.accept(FILE):
# this may return None, in which case we backtrack
dep = self.spec_from_file()
if not dep and self.accept(HASH):
# We're finding a dependency by hash for an
# anonymous spec
dep = self.spec_by_hash()
dep = dep.copy(deps=("link", "run"))
if not dep:
# We're adding a dependency to the last spec
if self.accept(ID):
self.previous = self.token
if self.accept(EQ):
# This is an anonymous dep with a key=value
# push tokens to be parsed as part of the
# dep spec
self.push_tokens([self.previous, self.token])
dep_name = None
else:
# named dep (standard)
dep_name = self.token.value
self.previous = None
else:
# anonymous dep
dep_name = None
dep = self.spec(dep_name)
# Raise an error if the previous spec is already
# concrete (assigned by hash)
if specs[-1].concrete:
raise RedundantSpecError(specs[-1], "dependency")
# command line deps get empty deptypes now.
# Real deptypes are assigned later per packages.
specs[-1]._add_dependency(dep, ())
else:
# If the next token can be part of a valid anonymous spec,
# create the anonymous spec
if self.next.type in (VER, ON, D_ON, OFF, D_OFF, PCT):
# Raise an error if the previous spec is already
# concrete (assigned by hash)
if specs and specs[-1]._hash:
raise RedundantSpecError(specs[-1], "compiler, version, " "or variant")
specs.append(self.spec(None))
else:
self.unexpected_token()
except spack.parse.ParseError as e:
raise SpecParseError(e) from e
# Generate lookups for git-commit-based versions
for spec in specs:
# Cannot do lookups for versions in anonymous specs
# Only allow Version objects to use git for now
# Note: VersionRange(x, x) is currently concrete, hence isinstance(...).
if spec.name and spec.versions.concrete and isinstance(spec.version, vn.GitVersion):
spec.version.generate_git_lookup(spec.fullname)
return specs
def spec_from_file(self):
"""Read a spec from a filename parsed on the input stream.
There is some care taken here to ensure that filenames are a last
resort, and that any valid package name is parsed as a name
before we consider it as a file. Specs are used in lots of places;
we don't want the parser touching the filesystem unnecessarily.
The parse logic is as follows:
1. We require that filenames end in .yaml, which means that no valid
filename can be interpreted as a hash (hashes can't have '.')
2. We avoid treating paths like /path/to/spec.json as hashes, or paths
like subdir/spec.json as ids by lexing filenames before hashes.
3. For spec names that match file and id regexes, like 'builtin.yaml',
we backtrack from spec_from_file() and treat them as spec names.
"""
path = self.token.value
# Special case where someone omits a space after a filename. Consider:
#
# libdwarf^/some/path/to/libelf.yamllibdwarf ^../../libelf.yaml
#
# The error is clearly an omitted space. To handle this, the FILE
# regex admits text *beyond* .yaml, and we raise a nice error for
# file names that don't end in .yaml.
if not (path.endswith(".yaml") or path.endswith(".json")):
raise SpecFilenameError("Spec filename must end in .yaml or .json: '{0}'".format(path))
if not os.path.exists(path):
raise NoSuchSpecFileError("No such spec file: '{0}'".format(path))
with open(path) as f:
if path.endswith(".json"):
return Spec.from_json(f)
return Spec.from_yaml(f)
def parse_compiler(self, text):
self.setup(text)
return self.compiler()
def spec_by_hash(self):
# TODO: Remove parser dependency on active environment and database.
import spack.environment
self.expect(ID)
dag_hash = self.token.value
matches = []
if spack.environment.active_environment():
matches = spack.environment.active_environment().get_by_hash(dag_hash)
if not matches:
matches = spack.store.db.get_by_hash(dag_hash)
if not matches:
raise NoSuchHashError(dag_hash)
if len(matches) != 1:
raise AmbiguousHashError(
"Multiple packages specify hash beginning '%s'." % dag_hash, *matches
)
return matches[0]
def spec(self, name):
"""Parse a spec out of the input. If a spec is supplied, initialize
and return it instead of creating a new one."""
spec_namespace = None
spec_name = None
if name:
spec_namespace, dot, spec_name = name.rpartition(".")
if not spec_namespace:
spec_namespace = None
self.check_identifier(spec_name)
if self._initial is None:
spec = Spec()
else:
# this is used by Spec.__init__
spec = self._initial
self._initial = None
spec.namespace = spec_namespace
spec.name = spec_name
while self.next:
if self.accept(VER):
vlist = self.version_list()
spec._add_versions(vlist)
elif self.accept(D_ON):
name = self.variant()
spec.variants[name] = vt.BoolValuedVariant(name, True, propagate=True)
elif self.accept(ON):
name = self.variant()
spec.variants[name] = vt.BoolValuedVariant(name, True, propagate=False)
elif self.accept(D_OFF):
name = self.variant()
spec.variants[name] = vt.BoolValuedVariant(name, False, propagate=True)
elif self.accept(OFF):
name = self.variant()
spec.variants[name] = vt.BoolValuedVariant(name, False, propagate=False)
elif self.accept(PCT):
spec._set_compiler(self.compiler())
elif self.accept(ID):
self.previous = self.token
if self.accept(D_EQ):
# We're adding a key-value pair to the spec
self.expect(VAL)
spec._add_flag(self.previous.value, self.token.value, propagate=True)
self.previous = None
elif self.accept(EQ):
# We're adding a key-value pair to the spec
self.expect(VAL)
spec._add_flag(self.previous.value, self.token.value, propagate=False)
self.previous = None
else:
# We've found the start of a new spec. Go back to do_parse
# and read this token again.
self.push_tokens([self.token])
self.previous = None
break
elif self.accept(HASH):
# Get spec by hash and confirm it matches any constraints we
# already read in
hash_spec = self.spec_by_hash()
if hash_spec.satisfies(spec):
spec._dup(hash_spec)
break
else:
raise InvalidHashError(spec, hash_spec.dag_hash())
else:
break
return spec
def variant(self, name=None):
if name:
return name
else:
self.expect(ID)
self.check_identifier()
return self.token.value
def version(self):
start = None
end = None
def str_translate(value):
# return None for empty strings since we can end up with `'@'.strip('@')`
if not (value and value.strip()):
return None
else:
return value
if self.token.type is COMMA:
# need to increment commas, could be ID or COLON
self.accept(ID)
if self.token.type in (VER, ID):
version_spec = self.token.value.lstrip("@")
start = str_translate(version_spec)
if self.accept(COLON):
if self.accept(ID):
if self.next and self.next.type is EQ:
# This is a start: range followed by a key=value pair
self.push_tokens([self.token])
else:
end = self.token.value
elif start:
# No colon, but there was a version
return vn.Version(start)
else:
# No colon and no id: invalid version
self.next_token_error("Invalid version specifier")
if start:
start = vn.Version(start)
if end:
end = vn.Version(end)
return vn.VersionRange(start, end)
def version_list(self):
vlist = []
vlist.append(self.version())
while self.accept(COMMA):
vlist.append(self.version())
return vlist
def compiler(self):
self.expect(ID)
self.check_identifier()
compiler = CompilerSpec.__new__(CompilerSpec)
compiler.name = self.token.value
compiler.versions = vn.VersionList()
if self.accept(VER):
vlist = self.version_list()
compiler._add_versions(vlist)
else:
compiler.versions = vn.VersionList(":")
return compiler
def check_identifier(self, id=None):
"""The only identifiers that can contain '.' are versions, but version
ids are context-sensitive so we have to check on a case-by-case
basis. Call this if we detect a version id where it shouldn't be.
"""
if not id:
id = self.token.value
if "." in id:
self.last_token_error("{0}: Identifier cannot contain '.'".format(id))
def parse(string):
"""Returns a list of specs from an input string.
For creating one spec, see Spec() constructor.
"""
return SpecParser().parse(string)
def save_dependency_specfiles(
root_spec_info, output_directory, dependencies=None, spec_format="json"
):

@@ -26,6 +26,7 @@
import spack.util.executable
from spack.error import SpackError
from spack.main import SpackCommand
from spack.parser import SpecSyntaxError
from spack.spec import CompilerSpec, Spec
install = SpackCommand("install")
@@ -362,7 +363,7 @@ def test_install_conflicts(conflict_spec):
)
def test_install_invalid_spec(invalid_spec):
# Make sure that invalid specs raise a SpackError
with pytest.raises(SpackError, match="Unexpected token"):
with pytest.raises(SpecSyntaxError, match="unexpected tokens"):
install(invalid_spec)

@@ -10,6 +10,7 @@
import spack.environment as ev
import spack.error
import spack.parser
import spack.spec
import spack.store
from spack.main import SpackCommand, SpackCommandError
@@ -181,13 +182,11 @@ def test_spec_returncode():
def test_spec_parse_error():
with pytest.raises(spack.error.SpackError) as e:
with pytest.raises(spack.parser.SpecSyntaxError) as e:
spec("1.15:")
# make sure the error is formatted properly
error_msg = """\
1.15:
^"""
error_msg = "unexpected tokens in the spec string\n1.15:\n ^"
assert error_msg in str(e.value)

@@ -68,22 +68,18 @@ def test_validate_spec(validate_spec_schema):
# Check that invalid data throws
data["^python@3.7@"] = "baz"
with pytest.raises(jsonschema.ValidationError) as exc_err:
with pytest.raises(jsonschema.ValidationError, match="unexpected tokens"):
v.validate(data)
assert "is an invalid spec" in str(exc_err.value)
@pytest.mark.regression("9857")
def test_module_suffixes(module_suffixes_schema):
v = spack.schema.Validator(module_suffixes_schema)
data = {"tcl": {"all": {"suffixes": {"^python@2.7@": "py2.7"}}}}
with pytest.raises(jsonschema.ValidationError) as exc_err:
with pytest.raises(jsonschema.ValidationError, match="unexpected tokens"):
v.validate(data)
assert "is an invalid spec" in str(exc_err.value)
@pytest.mark.regression("10246")
@pytest.mark.parametrize(

@@ -9,6 +9,7 @@
import spack.error
import spack.package_base
import spack.parser
import spack.repo
import spack.util.hash as hashutil
from spack.dependency import Dependency, all_deptypes, canonical_deptype
@@ -961,7 +962,7 @@ def test_canonical_deptype(self):
def test_invalid_literal_spec(self):
# Can't give type 'build' to a top-level spec
with pytest.raises(spack.spec.SpecParseError):
with pytest.raises(spack.parser.SpecSyntaxError):
Spec.from_literal({"foo:build": None})
# Can't use more than one ':' separator

@@ -707,13 +707,9 @@ def test_constrain_dependency_not_changed(self):
)
def test_exceptional_paths_for_constructor(self):
with pytest.raises(TypeError):
Spec((1, 2))
with pytest.raises(ValueError):
Spec("")
with pytest.raises(ValueError):
Spec("libelf foo")

File diff suppressed because it is too large.

@@ -937,7 +937,7 @@ def __init__(self, vlist=None):
self.versions = []
if vlist is not None:
if isinstance(vlist, str):
vlist = _string_to_version(vlist)
vlist = from_string(vlist)
if type(vlist) == VersionList:
self.versions = vlist.versions
else:
@@ -1165,7 +1165,7 @@ def __repr__(self):
return str(self.versions)
def _string_to_version(string):
def from_string(string):
"""Converts a string to a Version, VersionList, or VersionRange.
This is private. Client code should use ver().
"""
@@ -1191,9 +1191,9 @@ def ver(obj):
if isinstance(obj, (list, tuple)):
return VersionList(obj)
elif isinstance(obj, str):
return _string_to_version(obj)
return from_string(obj)
elif isinstance(obj, (int, float)):
return _string_to_version(str(obj))
return from_string(str(obj))
elif type(obj) in (VersionBase, GitVersion, VersionRange, VersionList):
return obj
else: