Although the error value from combinators currently does not have any
information, it can have an information because it is a char value. It
is better to use no-information-type explicitly to make it clear that
it does not have any information. So I added none_t in toml::detai and
use it in combinators and parsers as an error value from combinators.
Generate error message in `parse_something()`, not in `lex_something`.
Since the error message generated by `lex_something` is too difficult to
read for humans, I've disabled the error message generation for the sake
of efficiency (it takes time to generate error message that will never
be read). I think now the error message generation itself safely can be
removed from combinators. At this stage, `lex_something` does not need
to return `result<T, E>` because all the error type would be discarded.
Now it is turned out that returing `optional<T>` from lex_* is enough.
Maybe later I would change the return type itself, but currently I
changed the error type from std::string to char because implementing
optional takes time and effort. It makes the parsing process a bit
faster.
so far, the error value of the lexer is just ignored because they are
not readable (results from all the nested combinator are concatenated,
so they are too redundant). those ones are replaced by a simple literal.
to avoid unncessary error message generation, check the first some
characters before parsing it. It makes parsing process faster and
is also helpful to generate more accurate error messages.
all the parsers generate error messages and error message generation is
not a lightweight task. It concatenates a lot of strings, it formats
many values, etc. To avoid useless error-message generation, first check
which prefix is used and then parse special integers. Additionally, by
checking that, the quality of the error message can be improved (later).
`location::line_num()` function used to be implemented by using
`std::count`, so each time the parser encounters a type mismatch,
`std::count` was called with almost whole file. It decelerates the
parsing process too much, so I decided to add `line_number_` member
variable to `location` and add `advance/retrace/reset` to `location`
in order to modify the position that is pointed.
in 2.0.x and 2.1.0, README says "it shows warning" for invalid unicode
codepoints. So far, this library just show an error message in stderr
for this case. It is not good to change the behavior fatal in the next
minor release, 2.1.1, that includes patches and improved error msgs.
I will make it throw syntax_error after 2.2.0 for invalid unicode
codepoints. For now, I will keep it to be "warning".
Error messages related to dotted keys looks weird. like:
1 | a.b.c = 42
| ~~ in this table
The underlined token is not a table. This should be like the following.
1 | a.b.c = 42
| ~~~ in this table
To implement this, the region information is needed when the keys are
read. This commit add this functionality, though currently the region
information is not used yet.
like, with the following (invalid) toml file
> a.b = "value"
> a.b.c = 42
The error message becomes
> terminate called after throwing an instance of 'toml::syntax_error'
> what(): [error] toml::insert_value: target (a.b) is neither table nor
> an array of tables
> --> example.toml
> 1 | a.b = "value"
> | ~~~~~~~ actual type is string
> ...
> 2 | a.b.c = 42
> | ~~ inserting this
it is convenient to have get_region function that can access region_info
in toml::value. get_region is placed in toml::detail and made friend of
toml::value because I don't want to make toml::value::region_info public
and keep it internal use only.
before this, it recommends the range that can be represented by utf-8
but the range of valid unicode codepoint is narrower than that. for
error message, it is good to recommend valid unicode codepoint.