Constructing a Parse Tree

Part 5 of Parsing and Evaluating Mathematical Expressions

The parser built in the previous three parts of this series evaluates mathematical expressions in one pass which performed the three tasks of lexical analysis, syntactic analysis and calculation at one go. As simple as it is, that parser is useful as shown by the application Caleçon described at the end of the last part.

However, one of the motives for writing a parser is to draw the graph of a function defined by a user. In that context, it would be inefficient to repeat the lexical and syntactic analysis as the value of the independent variable is changed. Accordingly, a two pass parser will now be constructed. On the first pass, lexical and syntactic analysis are performed together to construct a representation of the mathematical expression called a parse tree. The second pass consists in evaluating the parse tree. This calculation can be performed as often as necessary without repeating the first pass. The distinction between the two parsers is similar to the difference between an interpreted language (the original Basic for example) and a compiled language (C or Pascal for example).

Before examining the nature of a parse tree and how to build it, a simpler problem will be addressed: the addition of built in variables.

Internal Variables

A function graphing program needs to calculate values of user defined functions of many types: y as a function of x, x as a function of y, x and y as functions of a parameter t and so on. The point being that it should be easy to add and remove variables before parsing an expression. Adding an internal variable such as x is actually rather simple because it can be handled in the list of internal functions in much the same way the constants pi and e were included. Variables are a new type of target defined as follows.

TVariable = class(TTarget) private FValue: pfloat; public constructor create(value: pfloat); function Calc: float; override; end;

When created, the address of the variable is provided and stored in FValue which is a pointer to a number. The class’s Calc method merely returns the value pointed to by FValue.

function TVariable.Calc: float; begin result := FValue^; end;

The following shows how to create a variable.

var x: float; procedure AddX; begin InternalTargets.Add('x', TVariable.Create(@x)); end;

The actual variable is declared to be of type float (which is defined in ParserTypes.pas). It can be given any name but x seems the most obvious choice as that is the name given to it when it is registered in the list on internal targets. From then on, it is possible to parse any expression that contains the identifier ‘x’ and to set the variable x to any value desired before evaluating the expression. Any number of variables can be added in such a fashion.

Recall that the function list also has a method to remove any element it contains. Thus when changing the type of function to be plotted from an explicit function y = f(x) to a parametric function where y = h(t), x = g(t) it would be wise to remove x as a variable and then add t.

InternalTargets.Delete('x'); InternalTargets.Add('t', TVariable.Create(@t));

Presumably t is another float declared elsewhere but the variable x could be reused

InternalTargets.Add('t', TVariable.Create(@x));

but that would probably be confusing to have t in an expression refer to a variable x in the program which uses the parser.

Parse Tree

A parse tree is a binary representation of a mathematical expression. It is made up of terminal (leaf) nodes that represent numbers and branch nodes that represent operations. Branch nodes will have one or more child nodes which can be branch or terminal nodes. In the first part of this series, such a tree was constructed to help describe what is involved in parsing an expression. Let’s look at two simpler expressions.

Consider first 5 + 4 * 3. It’s value is 17 because the multiplication has precedence over the addition. Hence the expression is evaluated as shown in figure 1: first 4 and 3 are multiplied and then that product is added to 5.


Figure 1: Parsing 5+4*3		Figure 2: Parsing (5+4)*3

Figure 2 shows the tree obtained when parsing the expression (5+4)*5 which is equal to 27. Note how parenthesis are not stored in the tree per se. The structure of the tree takes care of precedence. Both trees contain 2 operators and 3 constants which are represented with 2 branch nodes and 3 terminal nodes.

Each object in the tree is a descendant of an abstract class called TTree which contains only one (abstract) method, Eval, which returns the numerical value of the node. Thus nodes know how to calculate their own value. For example, a number node holds the number’s value. When the node’s Eval method is invoked, it simply returns that value. An add node’s Eval method merely returns the sum of its childrens’ Eval methods.

Evaluating a mathematical expression represented by a parse tree is done quite simply by calling the Eval method of the tree’s root (first node). Accordingly, when the root node’s Eval method is called in the tree illustrated in figure 2, the branch node ‘*’ multiplies the values returned by its children node. The left child is a ‘+’ node that adds the values of its children so it returns 9. It’s that value which is multiplied by 3 by the root node.

All that remains to be solved is how to construct the parse tree. As it turns out, this is not a very difficult task. The tree is built when parsing an expression. The functions associated with syntactic elements, Exponent, Expression, Term, Call and Factor no longer return a numeric value but, instead, they return a node representing the operation to be performed or a number.

Nodes

While the conversion of the parser is relatively simple, it is a bit tedious because many classes have to be defined to represent all the different types of nodes. Nodes represent numbers, the infix binary operators, the unary operator, and calls to internal constants and functions. As mentioned above there are no nodes to represent parenthesis, the structure of the parse tree reflects their presence in the source code. Nodes are grouped in accordance with the number of child nodes they have.

Syntactic element	Node	Number of child nodes
^	TPowerNode	2
*	TMultNode	2
/	TDivNode	2
+	TAddNode	2
– (subtraction)	TSubNode	2
– (unary)	TNegNode	1
number	TNumberNode	0
call to internal function, constant or variable	TCallNode	≥ 0 (parameter count of the function)

To simplify coding, a number of abstract classes were created. The figure below shows the hierarchy of all the classes created to represent parse tree nodes.

nodes

Grey nodes are abstract, the others are all instantiated objects. Grey nodes do not implement the abstract function Eval, the others do. The class TTree is public, defined in the interface of the ParserTypes.pas. As can be seen, it is rather sparse.

TTree = class public {: The function that evaluates a tree. Can raise an exception of type EMathError (EInvalidOp, EZeroDivide, EOverflow and EUnderflow)} function Eval: float; virtual; abstract; {$IFDEF DUMP} function ToText: string; virtual; procedure ToTreeView(tv: TTreeView); {$ENDIF} end;

The class contains two methods that are used for debugging and in the demonstration program accompanying this article. The abstract function ToText will return a string containing information about the node such as its name and information about any child node. The static method ToTreeView, takes the string returned by ToText and loads it into a TTreeView which results in a hierarchical display of the content of the parse tree.

All other nodes are hidden in the implementation section of the unit MathParser.pas which containing the parser and replaces SimpleMathParser.pas. There is only one instantiated class that is a direct descendant of TTree, called TnumberNode it has no child node and is used to store a number in a field called FValue.

All other static classes contain child nodes and hence are descendants of the abstract class TParentNode which provides an array of child nodes in the field FNodes and the number of children in the field FCount. The array is allocated on the heap and the class’s destructor takes care of releasing that memory when the node is freed. Since most nodes have exactly two children, the abstract TBinaryNode class was created as a common ancestor. It will will take care of the common task of creating a two member array of child nodes. It also implements most of the ToText method. The abstract class TUnaryNode is similar except that it is the common ancestor of nodes having only one child.

type TNodeList = array[0..maxint div sizeof(TTree) - 1] of TTree; PNodeList = ^TNodeList; TParentNode = class(TTree) private FCount: integer; FNodes: PNodeList; function Get(index: integer): TTree; procedure Put(index: integer; Node: TTree); public constructor Create(aCount: integer); destructor Destroy; override; property Nodes[index: integer]: TTree read Get write Put; end; constructor TParentNode.Create(aCount: integer); var i: integer; begin inherited create; FCount := aCount; if FCount = 0 then FNodes := nil else begin getmem(FNodes, FCount*sizeof(TTree)); {$IFDEF DEBUG} for I := 0 to aCount - 1 do FNodes^[I] := nil; {$ENDIF} end; end; constructor TUnaryNode.Create(aNode: TTree); begin inherited Create(1); Nodes[0] := aNode; end; constructor TBinaryNode.Create(aLeftNode, aRightNode: TTree); begin inherited Create(2); Nodes[0] := aLeftNode; Nodes[1] := aRightNode; end;

The implementation of the Eval method in instantiated (leaf) nodes is almost trivial as the following examples show.

function TNumberNode.Eval: float; begin result := FValue; end; function TAddNode.Eval: float; begin result := Nodes[0].eval + Nodes[1].eval end; function TDivNode.Eval: float; begin result := Nodes[0].eval / Nodes[1].eval end;

When a unary node is created, its child node must have been created and passed as a parameter. Similarly, when a binary node is created, left and right child nodes must have been created and passed as parameters to the binary node’s constructor. Its up to the parser to do this.

The decision to have a specific node type for each operator is the reason that there as so many classes that need to be created. Another approach defines only two basic node types: TUNode and TBNode. These nodes have a property indicating the operation to be performed and which may as well be simply the parsed operator itself. As a consequence evaluation is slowed down as that property must be examined to determine the function:

constructor TBNode.Create(aLeftNode, aRightNode: TTree; Op: TMdTokenType); begin inherited Create(2); Nodes[0] := aRLeftNode; Nodes[1] := aRightNode; FOperator := op; end; function TBNode.Eval; override; begin case(Operator) of ttAdd: result := Nodes[0].Eval + Nodes[1].Eval; ttMult: result := Nodes[0].Eval * Nodes[1].Eval; ttSub: result := Nodes[0].Eval - Nodes[1].Eval; ttDiv: result := Nodes[0].Eval / Nodes[1].Eval; end; end;

This is a classic trade-off. Less work coding the parser implies a bit more work coding the Eval method and a slightly slower execution as Eval is, in effect, asked to do a bit of parsing.

Producing the Parse Tree

It is quite easy to adapt the parser developed previously so that it produces a binary representation of the formula which can be evaluated in a second step. The tokenizer does not need to be changed. The parser keeps its structure but the functions Expression, Exponent, Term and Factor no longer return a numeric value. Instead they return a node which is added to the parse tree.

It is now possible to see how the parser constructs the tree representation of a formula. The functions Expression, Exponent, Term and Factor all return a TTree instead of a calculated value. And that, basically, is all that needs to be changed in the parser. Here is the code of a simplified version of Expression.

function Expression: TTree; begin result := Term; repeat if Tokenizer.TokenType = ttPlus then begin Tokenizer.NextTrueToken; result := TAddNode.Create(result, Term); end else if Tokenizer.TokenType = ttMinus then begin Tokenizer.NextTrueToken; result := TSubNode.Create(result, Term); end else exit; until false; end;

To be even more clear about how little has changed, lets look at the two versions of the function. The lines starting with i are from SimpleMathParser.pas, below, starting with c are from the function in MathParser.pas. The differences are shown in bold. As can be seen most everything is the same.

i function Expression: float; c function Expression: TTree; result := Term; repeat if Tokenizer.TokenType = ttPlus then begin Tokenizer.NextTrueToken; i result := result + Term; c result := TAddNode.Create(result, Term); end else if Tokenizer.TokenType = ttMinus then begin Tokenizer.NextTrueToken; i result := result - Term; c result := TSubNode.Create(result, Term); end else exit; until false; end;

The function Term is modified similarly, if a ‘*’ operator is found then a TMultNode is created: result := TMultNode.Create(result, Factor) and if a ‘/’ operator is found then a TDivNode is created: result := TDivNode.Create(result, Factor). The changes to Exponent are just as minimal. The function Factor is also changed in the same fashion. The function Call invoked in Factor when an identifier is found must also return a TTree; in fact it will return a TCallNode, discussed in detail in the following section.

The parse tree is constructed and returned by a function called Parse and later the tree can be evaluated by calling its Eval method.

function Parse(const source: string): TTree; begin tokenizer.Source := source; tokenizer.NextTrueToken; // prime the pump result := Expression; if (Tokenizer.TokenType <> ttEOS) then begin result.Free; RaiseParseException(peExpectedEOS); end; end;

Typical use of this function would be to calculate many times the value of a function as in the following example.

uses ParserTypes, MathParser, Targets; const step = 0.05; var x: float; i: integer; tree: TTree; begin InternalTargets.Add('x', TVariable.Create(@x)); tree := Parse('0.01*x^2'); x := -10; writlen('x', ^I, '0.01*x^2'); while x <= 10 do begin writeln(x, ^I, tree.eval); x := x + step; end; end.

Representing an Internal Function

The most complicated node is the TCallNode that is added to a parse tree when a call to any type of target defined in Targets.pas is found in an expression. TCallNode is a descendant of TParentNode because it may have child nodes. Of course it may not have any because constants such as pi will be included in a parse tree as TCallNodes. Here is the definition of the class.

Type TCallNode = class(TParentNode) private FTarget: TTarget; FArgs: pFloatArray; public constructor Create(Target: TTarget; Nodes: PNodeList); destructor Destroy; override; function Eval: float; override; {$IFDEF DUMP} function ToText: string; override; {$ENDIF} end;

The two private fields FArgs and FTarget are required by the node’s Eval method. FTarget is provided by the parser when it creates the TCallNode. At the same time, a pointer to an array of nodes must also be given to the constructor by the parser. The target’s Count property holds the number of nodes in the array.

constructor TCallNode.Create(Target: TTarget; Nodes: PNodeList); begin FTarget := Target; FNodes := Nodes; FCount := Target.Count; if FCount > 0 then begin getmem(FArgs, FCount*sizeof(float)); end; end;

Just as for binary nodes, the parser must supply the child nodes. However, since their number is variable, they are passed as a pointer to an array allocated on the heap by the parser not by the class. This is why TCallNode’s constructor does not call the inherited constructor, TParentNode.constructor, as it would create another array of child nodes. The constructor merely sets the internal field FNode to the passed pointer Nodes and updates the field FCount specifying the number of children. Finally, the constructor allocates the array of numbers that will be passed to the target when evaluating it. TCallNode’s destructor must dispose of the array allocated to FArgs but it’s parent class will dispose of the array of nodes.

destructor TCallNode.Destroy; begin if FCount > 0 then begin freemem(FArgs); end; inherited; end

When it is time to calculate the value returned by an internal function with a fixed positive number of parameters, TCallNode’s Eval function will evaluate each child node (tree) and put its value in the array of arguments pointed to by FArgs. Then the target’s ArgArray is set to point to that array of values and the target’s Calc method is invoked and its result is returned.

The same approach is used for an internal function with a variable number of parameters. The difference is in the treatment of the number of variables. In the case of a fixed number of arguments, the target’s Count property is read only, while in the case of a variable number of arguments the property is set. Of course if the node represents a constant or a variable, FCount is equal to 0 and the first part of the calculations is skipped, FCalc is called with FArgs = nil.

As argued in Part 3 – Adding Constants and Functions the array of real values holding the called function’s arguments must be allocated by the call node if the function is to be recursive. By the same token, setting the target’s ArgArray to the allocated array of real values must be done each time the function is evaluated and not when the call is parsed. In the case of targets with variable numbers of arguments, the target’s Count property must also be set each time the function is evaluated. Note that these steps must be done after the arguments are evaluated, again to take care of recursion.

As often the case, the actual routine is simpler than the explanation.

function TCallNode.Eval: float; var I: Integer; begin if FCount > 0 then begin for I := 0 to FCount - 1 do FArgs^[I] := Nodes[I].Eval; TFunction(FTarget).ArgArray := FArgs; if FTarget is TVariableFunction then TVariableFunction(FTarget).Count := FCount; end; result := FTarget.Calc; end;

As before, the call node is created by the parser’s Call function. In turn, it calls on Params to parse the parameters in the case of a function. The latter is examined first. In the one pass parser, it’s function was to return an array of real numbers that were passed on to the target function. In the case of the two pass parser, it returns the number of arguments found and an array of TTree’s or nodes, one for each argument. Here is a simplified version.

function Params(var FoundNodes: pNodeList): integer; var list: TList; i: integer; aNode: TTree; begin if Tokenizer.NextTrueToken <> ttLeftParenthesis then RaiseParseException(peExpectedLeftPar); Tokenizer.NextTrueToken; FoundNodes := nil; result := 0; list := TList.Create; repeat aNode := Expression; list.Add(aNode); if Tokenizer.TokenType = ttRightParenthesis then begin result := list.Count; if result > 0 then begin getmem(FoundNodes, result*sizeof(TTree)); for i := 0 to result - 1 do FoundNodes^[i] := TTree(List[i]); end; list.free; exit; end; if Tokenizer.TokenType <> ttSeparator then RaiseParseException(peExpectedSeparator); Tokenizer.NextTrueToken; // skip ',' until false; end;

Since the number of parameters found is not known, a TList is used to accumulate the nodes until a ‘)’ is found. Once the end of the list of arguments is found, the array of nodes is allocated on the heap and the list is freed, but not it’s objects since these are now in the allocated array. If using only recent versions of Delphi, FoundNodes could be a dynamic array, and SetLength would be used to increase it’s size as needed.

The actual function is complicated with a try–except block to recover resources should an error occur when parsing the parameters.

Even if the parsing of arguments is done by Params, the Call function is still rather long. It’s first step is to find a target with the name found by the parser’s Factor function. If no target with the specified name is found, an exception is raised. Otherwise, a check of the number of expected arguments is made. If the target’s Count property is equal to 0, the target is a constant or a variable and there is no parameter list that follows. Otherwise, Params is invoked to parse the list of arguments. There follows some error checking. First if the target is a TVariableFunction a check is made that the minimum required number of parameters has been found and if it has, the target’s Count is set to the number of found variables.

function Call: TTree; var Nodes: pNodeList; Target: TTarget; FoundCount: integer; n: integer; begin if not InternalTargets.Find(Tokenizer.Token, Target) then RaiseParseException(peUnknownIdentifier); if Target.Count = 0 then begin Nodes := nil; Tokenizer.NextTrueToken; end else begin FoundCount := Params(Nodes); if Target is TVariableFunction then begin if FoundCount < TVariableFunction(Target).MinCount then begin FreeNodes; RaiseParseException(peTooFewParams); end; Target.Count := FoundCount // to set TCallNode's FCount end else begin if FoundCount <> Target.Count then begin FreeNodes; if FoundCount < Target.Count then RaiseParseException(peTooFewParams) else {if FoundCount > count then} RaiseParseException(peTooManyParams) end; end; Tokenizer.NextTrueToken; // skip ')' last so above errors better located end; result := TCallNode.Create(Target, Nodes); end;

For the remaining types of targets, the method check that the number of found parameters is equal to the expected number and raises an exception if not. Finally a TCallNode is created and returned by the function.

User defined functions and constants

One of the happy consequences of building a parse tree to represent an expression is that it is relatively easy to add user defined functions to the parser. Such functions will be defined by the user at run time with an assignment statement such as f(x,y) := 2*y+x. This statement will be parsed to extract a target’s name from the left hand side and a parse tree representing the expression on the right hand side. That tree will be evaluated when the named target is invoked in a subsequent expression. A new type of target is needed in which the tree is saved along with the number of parameters expected.

TUserFunction = class(TFunction) private FTree: TTree; public constructor create(aTree: TTree; ParamCount: integer); destructor Destroy; override; function Calc: float; override; property Tree: TTree read FTree write FTree; end;

Note that TUserFunction inherits from TFunction because it has a fixed number of arguments. Strictly speaking that is not necessary and TUserFunction could descend from TVariableFunction, but it is hard to see how the user would define such a function without some sort of a for or while loop which is not defined in the simple grammar defined here.

User defined constants as in k := 9.2 are very straightforward. Parsing the right hand side, Parse('9.2') will yield a TNumberNode with the value 9.2. The target’s Calc will call that node’s Eval method that will return 9.2 which Calc will then return. But parsing the right hand side of a user function such as Parse('2*x+x') will fail miserably because the identifiers x and y mean nothing to the parser, or if they do, it is because variables of that name have been added to the list of targets and they are not what is actually referred to.

The formal parameters of the function being defined must be recognized. This is done by first adding a new type of node:

TParameterNode = class(TTree) private FUserFunction: TUserFunction; FIndex: integer; public constructor Create(aUserFunction: TUserFunction; Index: integer); function Eval: float; override; end;

The y when parsing 2*y+x for the function f(x,y)represents the second parameter (index = 1) while the x represents the first parameter (index 0). This is what is stored in the field FIndex. But to recover the value of the parameter, the node must know where to look, where the array of values is on the heap. This why the user function is also stored in the node.

function TParameterNode.Eval: float; begin result := FUserFunction.Args[FIndex] end;

How does the parser know the index of x and y? A list of parameter names, in the order in which they appear in the definition of the function is created before parsing the right hand side. The function Call now checks this list when an identifier is found and if so creates and returns TParameterNode. Notice that ParamList has a field named UserFunction which holds the pointer to the current user function being parsed. Hence Call now begins with

begin n := ParamsList.IndexOf(Tokenizer.Token); if n >= 0 then begin result := TParameterNode.Create(ParamsList.UserFunction, n); Tokenizer.NextTrueToken; exit; end; if not ExternalTargets.Find(Tokenizer.Token, Target) then if not InternalTargets.Find(Tokenizer.Token, Target) then RaiseParseException(peUnknownIdentifier);

As can be seen, the search for an identifier is the following order :
  1^st in the list of formal parameters of a function (ParamsList),
  2^nd in the list of user defined functions and constants (ExternalTargets),
  3^rd in the list of built-in functions and constants (InternalTargets).

Hence a formal parameter x when a function f(x, ...) is parsed will be found before any variable x that may have been added to ExternalTargets or InternalTargets. This order also means that users could redefine built-in functions. All that would happen is that the user defined function ‘hides’ the internal target’s name. (Actually this behaviour is controlled by the compiler directive CAN_REDEFINE_BUILTIN_FUNCTION found in Parsers.inc.)

Here is the formal definition of an assignment statement:

ParamList = "(", Identifier { ListSym, Identifier } ")". Assignment = Identifier [ParamList] ":=" Expression.

The correct way of handling assignments would be to expand Parse but this is not the approach used here. Because in the applications in which the parser will be used, user defined functions are handled separately, assignments will be parsed in another procedure: ParseAssign. It breaks up the work in two parts as hinted above. It handles parsing the left hand side of the assignment which consists of getting the name of the target and the list of formal parameters if they are present. Then it passes of parsing the right hand side to Parse and completes the procedure by creating a user function which it adds to the list of external targets. Here is the first part of ParseAssign. It’s straightforward, the code is mostly checking for possible errors.

procedure ParseAssign(const source: string); var Name: string; uf: TUserFunction; begin tokenizer.Source := source; // get the name if tokenizer.NextTrueToken <> ttIdentifier then RaiseParseException(peExpectedIdentifier); Name := tokenizer.Token; if InternalTargets.HasTarget(Name) then RaiseParseException(peIdentIsBuiltInName); if ExternalTargets.HasTarget(Name) then RaiseParseException(peDuplicateIdentifier); // get param list if there's one if tokenizer.NextTrueToken = ttLeftParenthesis then begin ParseParamsList; if tokenizer.TokenType <> ttRightParenthesis then RaiseParseException(peExpectedRightPar); tokenizer.NextTrueToken; end; ...

Of course there remains the question of building the list of formal parameters. The relevant procedure merely accumulates the names of the formal parameters in an unsorted string list named ParamsList. A formal parameter must be an identifier, which cannot be the name of the function itself (held in the variable Name) and which cannot be repeated.

procedure ParseParamsList; begin while (Tokenizer.TokenType <> ttRightParenthesis) do begin if Tokenizer.NextTrueToken <> ttIdentifier then // cannot have f(2) := 4*x RaiseParseException(peExpectedFormalParameter) else if Tokenizer.Token = Name then // cannot have f(x, f) := 2 RaiseParseException(peInvalidFormalParameterName) else if ParamsList.IndexOf(Tokenizer.Token) >= 0 then // cannot have f(x, x) := 2*x RaiseParseException(peDuplicateFormalParameterName); ParamsList.Add(Tokenizer.Token); if not (Tokenizer.NextTrueToken in [ttSeparator, ttRightParenthesis]) then RaiseParseException(peExpectedSeparatorOrRightPar); end; end;

Once that is done here is the rest of the routine or at least a simplified version without resource protection if an error occurs when parsing the right hand side expression.

procedure ParseAssign(const source: string); begin ... // find := if tokenizer.TokenType <> ttAssign then RaiseParseException(peExpectedAssign); tokenizer.NextTrueToken; // get expression uf := TUserFunction.Create(nil, ParamsList.Count); ParamsList.UserFunction := uf; uf.Tree := Expression; if Tokenizer.TokenType <> ttEOS then RaiseParseException(peExpectedEOS); ExternalTargets.Add(Name, uf); end;

Note that user function is created with an empty tree before the right hand side is parsed. This is necessary because Parse needs to create parameter nodes that know the address of the user function, as stored in ParamsList.

Some Coding Details

Unfortunately, the parser shown above will often leak memory when parsing a badly formed formula or when the programmer forgets to free a parse tree when it is no longer needed. There is no automatic safeguard against the latter type of mistake in Object Pascal but some others leaks can be avoided. Because exceptions are used to signal parsing errors, it is not very difficult to prevent memory leaks. Here is the final version of Expression.

function Expression: TTree; begin result := Term; try repeat if Tokenizer.TokenType = ttPlus then begin Tokenizer.NextTrueToken; result := TAddNode.Create(result, Term); end else if Tokenizer.TokenType = ttMinus then begin Tokenizer.NextTrueToken; result := TSubNode.Create(result, Term); end else exit; until false; except { End up here if an error occurred parsing a second or later term. Need to free the tree already allocated to result.} result.Free; raise; end; end;

The first line of the function, result := Term may never be completed if an exception occurs when parsing that first term. In that case a node is not created and assigned to the variable result and execution of the rest of the code is aborted. If the first line is executed without error, then a node is created by Term and assigned to result. If the next token is a ‘+’ or a ‘-’, Term will be called again. Should an exception occur when parsing this new term, then the function will be exited. The tree already allocated to result would then be lost. To avoid that, the repeat loop is encased in a try–except statement. If an exception occurs when parsing a second or later term, the already allocated tree is freed in the except block. The exception is then raised again to continue unwinding the parse tree.

The functions Exponent, Term, Factor and Params are also modified in the same fashion to prevent memory leaks.

The demonstration program has a new form, Table.pas. It shows how the parser can be used to construct a table of values for a function defined by a user. A method,

{display the numbers x and fx in the string grid's row number Row} procedure TTableForm.AddToTable(Row: integer; x, fx: extended);

takes care of writing x and fx, the value of the function at x, in the specified row of a string grid. To calculate the values to be displayed in the table, the following is done before hand. A first value of x, the last value of x and the amount by which x will be incremented from the first value to the last value are all specified. In all cases, these parameters are can all defined by a mathematical expression that is parsed and evaluated with the function EvaluateExpression defined in TreeMathParser. No variables are included in the list of internal functions at that point. The value of the variable x is incremented from the first to the last value in a loop and at each iteration the tree containing the representation of the mathematical expression is evaluated with the new value of x.

Then ‘x’ is added as a variable in the list of functions and Parse is called with the user defined function of x. The returned tree is saved in the variable tree of type TTree. The value of the variable x is set to the firstValue. Filing the table consists in performing a loop that calls the tree’s Eval function to get the value of the function at x, stores the values in the table and then increments the value of the variable x.

MAXLOOP := round ( (lastvalue - firstvalue) / step ) + 3; InternalTargets.Add('x', TVariable.Create(@x)); try tree := Parse(Edit4.Text); try sg.Cells[0,0] := 'x'; sg.Cells[1,0] := Edit4.Text; x := firstValue; i := 2; while (i < MAXLOOP) do begin AddToTable(i, x, Tree.eval); inc(i); x := x + step; end; finally tree.Free; end; finally InternalTargets.Delete('x'); end;

When all is completed, x is removed from the list of internal functions.

Homework

1. The parse tree presented here is resolutely object oriented. Parse tree existed long before objects became the fashion. Try doing this with pointers and records.

2. Modify Call and Params so that C-types can enter a constant as a function. In other words let pi() and e() be alternate ways of invoking pi and e.

3. Instead of a separate ParseAssign method, modify the Parse function so that it handles assignments just like any other expression. It would probably be nice to inform the user that the constant or function has been added to the ExternalTargets list.

Source code

The source code for the tokenizer, parser and translation units is available along with a simple demonstration programme and test code. It is in the archive ParsingMathExpr-5.zip.

The demonstration program compiles with Delphi version 2 and up. It should also compile with Lazarus/FreePascal with minimal changes. As before, unit tests require DUnit.pas which works with newer versions of Delphi only. Unit testing was done with Delphi 2010.