parsing - How to handle x*, x+, or x? regex-like operators in an LR parser?
I have implemented recursive descent and PEG-like parsers in the past, for things like this:
path -> segment+
segment -> slash name
segment -> /
name -> /\w+/
slash -> /
where segment+ means "match 1 or more segment", and there's a plain old regular expression, \w+, matching 1 or more word characters.
How do you typically accomplish the same sort of thing with LR grammars/parsers? All of the examples of LR parsers I have seen are very basic, such as parsing 1 + 2 * 3 or (())(), where the patterns are simple and don't seem to involve "one or more" functionality (or 0 or more with *, or optional with ?). How do you do that in an LR parser generally?
Or does LR parsing require a lexing phase first (i.e. an LR parser requires terminal and nonterminal "tokens")? I am hoping there is a way to do LR parsing without 2 phases like that. The definition of an LR parser talks about "input characters" in the books/sites I've been reading, but then I see, casually/subtly, a line like:
The grammar's terminal symbols are multi-character symbols or 'tokens' found in the input stream by a lexical scanner.
And it's like, what, where did the scanner come from?
You can write a scannerless grammar for a language, but in most cases it won't be LR(1), because 1 token of lookahead isn't much when a token is a single character.
Generally, LALR(1) parser generators (like bison) are used in conjunction with a scanner generator (like flex).
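To make the two-phase idea concrete, here is a minimal Python sketch, not tied to bison or flex. It assumes the path grammar from the question: `tokenize` plays the scanner's role, so the regular expression \w+ is matched before the parser ever runs, and `desugar` shows the usual way x+, x*, and x? are expressed in plain BNF, by adding helper nonterminals with recursive and empty productions. The token names (NAME, SLASH), the helper suffixes (_plus, _star, _opt), and both function names are made up for illustration.

```python
import re

# Hypothetical token kinds for the path grammar in the question.
TOKEN_RE = re.compile(r"(?P<NAME>\w+)|(?P<SLASH>/)")


def tokenize(text):
    """Scanner phase: turn input characters into (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def desugar(rules):
    """Rewrite x+, x*, x? into plain BNF productions.

    `rules` is a list of (lhs, [symbols]) pairs. The result has the same
    shape; an empty symbol list stands for the empty production.
    """
    suffix = {"+": "_plus", "*": "_star", "?": "_opt"}
    out = []
    helpers = {}  # helper nonterminal -> its productions (created once)
    for lhs, rhs in rules:
        new_rhs = []
        for sym in rhs:
            if len(sym) > 1 and sym[-1] in suffix:
                base, op = sym[:-1], sym[-1]
                name = base + suffix[op]
                if name not in helpers:
                    if op == "+":    # one or more: left-recursive list
                        helpers[name] = [(name, [name, base]), (name, [base])]
                    elif op == "*":  # zero or more: list that may be empty
                        helpers[name] = [(name, [name, base]), (name, [])]
                    else:            # optional: the symbol or nothing
                        helpers[name] = [(name, [base]), (name, [])]
                new_rhs.append(name)
            else:
                new_rhs.append(sym)
        out.append((lhs, new_rhs))
    for productions in helpers.values():
        out.extend(productions)
    return out


if __name__ == "__main__":
    print(tokenize("/foo/bar"))
    for lhs, rhs in desugar([("path", ["segment+"]),
                             ("segment", ["SLASH", "NAME"]),
                             ("segment", ["SLASH"])]):
        print(lhs, "->", " ".join(rhs) or "(empty)")
```

The helper rules here are left-recursive, which is the form generally preferred for LR parsers because each list item can be reduced as soon as it has been seen. The rewritten grammar is what you would hand to an LR parser generator, with the token kinds as its terminals.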