LL parser

An LL parser is a table-based top-down parser for context-free grammars. It parses the input from Left to right, and constructs a Leftmost derivation of the sentence (Hence LL, compare with LR parser). The class of grammars which are parsable in this way is known as the LL grammars, and includes several of the older programming languages. This is because LL parsers are usually simple to produce by hand.

An LL parser is called an LL(k) parser if it uses k tokens of look-ahead when parsing a sentence. If such a parser exists for a certain grammar and it can parse sentences of this grammar without backtracking then it is called an LL(k) grammar. Of these grammars LL(1) grammars, although fairly restrictive, are very popular because the corresponding LL parsers only need to look at the next token to make their parsing decisions.

Architecture of an LL(1) parser

A table-based top-down parser can be schematically presented as in Figure 1.


         +---+---+---+---+---+---+
  Input: | ( | 1 | + | 1 | ) | $ |
         +---+---+---+---+---+---+
                   ^
                   |
  Stack:           |
              +----------+
  +---+       |          |
  | + |<------|  Parser  |----->  Output
  +---+       |          |
  | S |       +----------+
  +---+            ^                  
  | ) |            |             
  +---+            |                   
  | $ |       +----------+
  +---+       | Parsing  | 
              |  table   |
              +----------+

 Figure 1. Architecture of a table-based top-down parser

The parser has an input buffer, a stack on which it keeps symbols from the grammar, a parsing table which tells it what to do given the symbols on top of its stack and its input tape. To explain its workings we will use the following small grammar:

(1) S -> F

(2) S -> ( S + F )

(3) F -> 1

When the parser starts it always starts on its stack with

[ S, $ ]

where $ is a special symbol to indicate the bottom of the stack and the end of the input stream, and S is the start symbol of the grammar. The parser will attempt to rewrite the contents of this stack to what it sees on the input stream. However, it only keeps on the stack what still needs to be rewritten. For example, let's assume that the input is "( 1 + 1 )". When the parser reads the first "(" it knows that it has to rewrite S to "( S + F )" and writes the number of this rule to the output. The stack then becomes:

[ (, S, +, F, ), $ ]

In the next step it removes the '(' from its input stream and from its stack:

[ S, +, F, ), $ ]

Now the parses sees an '1' on its input stream so it knows that it has to apply rule (1) and then rule (3) from the grammar. This results in the following stacks:

[ F, +, F, ), $ ]
[ 1, +, F, ), $ ]

In the next two steps the parser reads the '1' and '+' from the input stream and also removes them from the stack, resulting in:

[ F, ), $ ]

In the next three steps the 'F' will be replaced on the stack with '1', and then the '1' and ')' will be removed from the stack and the input stream. So the parser endes with both '$' on its stack and on its input steam. As can be seen from the example the parser performs three types of steps depending on whether the top of the stack is a nonterminal, a terminal or the special symbol $:

If the top is a nonterminal then it looks up in the parsing table on the basis of this nonterminal and the symbol on the input stream which rule of the grammar it should use to replace it with on the stack. The number of the rule is written to the output stream. If the parsing table indicates that there is no such rule then it reports an error and stops.
If the top is a terminal then it compares it to the symbol on the input stream and if they are equal they are both removed. If they are not equal the the parser reports an error and stops.
If the top is $ and on the input stream there is also a $ then the parser reports that it has succesfully parse the input, otherwise it reports an error. In both cases the parser will stop.

These steps are repeated until the parser stops, and then it will have either completely parsed the input and written a leftmost derivation to the output stream or it will have reported an error.

The parsing table for the grammar of the example looks as follows:

	(	)	1	+
S	S -> ( S + F )	-	S -> F	-
F	-	-	F -> 1	-

Constructing an LL(1) parsing table

... yet to be written' ...

First set of a rule
- What is it?
- How can we compute it?
Follow set of a nonterminal
- what is it?
- How can we compute it?
How can we use the First and Follow sets to construct the parsing table