parsing - How to handle x*, x+, or x? regex-like operators in an LR parser?


I have implemented recursive descent and PEG-like parsers in the past, for things like this:

    path    -> segment+
    segment -> slash name
    segment -> /
    name    -> /\w+/
    slash   -> /
  • where segment+ means "match 1 or more segments"
  • and there's a plain old regular expression, \w+, matching 1 or more word characters

How do you typically accomplish the same sort of thing with LR grammars/parsers? All of the examples of LR parsers I have seen are very basic, such as parsing 1 + 2 * 3, or (())(), where the patterns are simple and don't seem to involve "one or more" functionality (or zero or more, *, or optional, ?). How do you do this in an LR parser in general?
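The usual answer is that LR grammars are written in plain BNF, so x+, x* and x? are rewritten into ordinary recursive rules before (or inside) the parser generator. A minimal sketch of that rewriting in Python follows; the `desugar` function and the `_plus`/`_star`/`_opt` helper names are hypothetical, and terminals are simply written as literal strings.

```python
from typing import Dict, List

# Grammar: nonterminal -> list of alternatives; each alternative is a list of
# symbols, and an empty list stands for epsilon (the empty string).
Grammar = Dict[str, List[List[str]]]

# Hypothetical naming scheme for the helper nonterminals we generate.
SUFFIX_NAMES = {"+": "plus", "*": "star", "?": "opt"}


def desugar(grammar: Grammar) -> Grammar:
    """Rewrite X+, X*, and X? into plain BNF rules an LR generator accepts."""
    out: Grammar = {}

    def expand(sym: str) -> str:
        # Plain symbols (including one-character terminals like "/") pass through.
        if len(sym) < 2 or sym[-1] not in SUFFIX_NAMES:
            return sym
        base, op = sym[:-1], sym[-1]
        name = f"{base}_{SUFFIX_NAMES[op]}"
        if name not in out:  # generate each helper rule only once
            if op == "+":    # L -> L X | X        (one or more, left-recursive)
                out[name] = [[name, base], [base]]
            elif op == "*":  # L -> L X | epsilon  (zero or more, left-recursive)
                out[name] = [[name, base], []]
            else:            # O -> X | epsilon    (optional)
                out[name] = [[base], []]
        return name

    for lhs, alts in grammar.items():
        out[lhs] = [[expand(sym) for sym in alt] for alt in alts]
    return out


if __name__ == "__main__":
    # The grammar from the question, with terminals written as literal strings.
    question = {
        "path": [["segment+"]],
        "segment": [["slash", "name"], ["/"]],
        "name": [[r"/\w+/"]],
        "slash": [["/"]],
    }
    for lhs, alts in desugar(question).items():
        rhs = " | ".join(" ".join(alt) if alt else "ε" for alt in alts)
        print(f"{lhs} -> {rhs}")
```

Here `path -> segment+` becomes `path -> segment_plus` with `segment_plus -> segment_plus segment | segment`. Left recursion is the conventional choice for LR parsers: each repetition can be reduced as soon as it is complete, so the parse stack stays shallow, whereas a right-recursive `L -> X L` has to shift every X before anything is reduced.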

Or does LR parsing require a lexing phase first (i.e., does the LR parser require terminal and nonterminal "tokens")? I'm hoping there is a way to do LR parsing without two phases like that. The definition of an LR parser talks about "input characters" in the books/sites I've been reading, but then I see, casually/subtly, a line like:

The grammar's terminal symbols are the multi-character symbols or 'tokens' found in the input stream by a lexical scanner.

And it's like, where did the scanner come from?

You can write a scannerless grammar for a language, but in many cases it won't be LR(1), because one token of lookahead isn't enough when each token is a single character.

Generally, LALR(1) parser generators (like Bison) are used in conjunction with a scanner generator (like flex).
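The two-phase setup can be sketched in Python: a regex-based scanner (standing in for flex) turns characters into SLASH/NAME tokens, and a table-driven LR parser (standing in for a Bison-generated one) consumes those tokens. This is a minimal sketch, assuming the question's grammar with segment+ rewritten as left recursion and the slash/name rules folded into tokens, and assuming a bare "/" segment should yield an empty name. The SLR(1) table was worked out by hand for this tiny grammar, and all names (`scan`, `parse`, `ACTION`, `GOTO`, `semantic_action`) are hypothetical.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text)

# --- Phase 1: a tiny flex-style scanner built from regular expressions ---
TOKEN_SPEC = [("SLASH", r"/"), ("NAME", r"\w+")]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))


def scan(text: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        tokens.append((m.lastgroup, m.group()))  # lastgroup names the matching rule
        pos = m.end()
    tokens.append(("$", ""))  # end-of-input marker
    return tokens


# --- Phase 2: a table-driven LR parser over token kinds ---
# Grammar (segment+ already desugared into left recursion):
#   1: path    -> path segment
#   2: path    -> segment
#   3: segment -> SLASH NAME
#   4: segment -> SLASH
PRODUCTIONS = {1: ("path", 2), 2: ("path", 1), 3: ("segment", 2), 4: ("segment", 1)}

# SLR(1) tables, built by hand here; a generator like Bison computes these for you.
ACTION = {
    (0, "SLASH"): ("shift", 3),
    (1, "SLASH"): ("shift", 3), (1, "$"): ("accept", 0),
    (2, "SLASH"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "NAME"): ("shift", 5), (3, "SLASH"): ("reduce", 4), (3, "$"): ("reduce", 4),
    (4, "SLASH"): ("reduce", 1), (4, "$"): ("reduce", 1),
    (5, "SLASH"): ("reduce", 3), (5, "$"): ("reduce", 3),
}
GOTO = {(0, "path"): 1, (0, "segment"): 2, (1, "segment"): 4}


def semantic_action(rule: int, rhs: list):
    if rule == 1:
        return rhs[0] + [rhs[1]]  # append the new segment to the path
    if rule == 2:
        return [rhs[0]]           # a path with a single segment
    if rule == 3:
        return rhs[1]             # a segment's value is its name
    return ""                     # bare "/" segment: empty name


def parse(tokens: List[Token]) -> List[str]:
    states = [0]        # LR state stack
    values: list = []   # semantic values, parallel to the states above state 0
    i = 0
    while True:
        kind, text = tokens[i]
        action = ACTION.get((states[-1], kind))
        if action is None:
            raise SyntaxError(f"unexpected {kind} {text!r}")
        op, arg = action
        if op == "shift":
            states.append(arg)
            values.append(text)
            i += 1
        elif op == "reduce":
            lhs, n = PRODUCTIONS[arg]
            rhs = values[-n:]
            del states[-n:]
            del values[-n:]
            values.append(semantic_action(arg, rhs))
            states.append(GOTO[(states[-1], lhs)])
        else:  # accept
            return values[-1]


if __name__ == "__main__":
    print(parse(scan("/usr/local/bin")))  # prints ['usr', 'local', 'bin']
```

State 3 is where the single token of lookahead matters: after a SLASH, the parser shifts if the next token is NAME and otherwise reduces the bare `segment -> SLASH` rule. Because the repetition is left-recursive, each completed segment is folded into the path immediately, so the stack never holds more than one pending segment.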