md
Parsing and Evaluating Mathematical Expressions
June 5, 2013
-- Part 1 – Parsing a Simplified Grammar->

Introduction

Modern general purpose operating systems include an application which imitates a hand held calculator and which can always be invoked by the user when there is a need to calculate a value. However, in some circumstances this is not sufficient when the expression to be evaluated is complicated and entry is error prone. Furthermore, a programmer may need to include an expression evaluator in a program that allows users to specify a mathematical expression. The Delphi code presented here parses and evaluates such expressions given as strings such as ‘3+4*(27-9/2)’ and ‘e^sin(3*pi/4)’and returns their numerical value (93 and 2.02811498164747 respectively).

Translating Text

Breaking up text into elements or tokens and distinguishing numbers (such as 27 and 2), identifiers (such as pi and sin) and operators (such as + and *) is called lexical analysis. In the present context, this is a simple task because the type of token can be inferred from its first or, occasionally, first two characters.

The more difficult problem is the syntactical analysis or parsing of the lexical elements. In this step, numbers, identifiers and operators are arranged according to the rules defining mathematical operations so as to be easily evaluated.

The last step in translating a string into a numerical value is, of course, to do the actual calculations.

While the process can be logically divided into three steps, in practice it need not be. Evaluation of an expression can be done all at once. Here, lexical analysis will be performed separately from the rest. Parsing and evaluating can easily be combined and that is the first approach that will be considered in the following papers. When calculations must be done repeatedly as when drawing the curve defined by a function, say x*sin(x), it is best to perform the lexical and syntactic analysis of the expression only once and evaluate many times as the value of x is modified.

There is more than one way to implement this two step strategy. The avenue examined here consists in constructing a tree representing all the elements of the formula (numeric constants, calls to predefined functions or constants such as sin and pi, etc.) and the operations that are to be performed in their correct order. One of the advantages of this approach is that it makes it relatively easy for the programmer to add more internal functions, constants and variables. It is also relatively easy to include user defined functions, that is external functions that are defined by a user of a program at run time.

The material will be presented in many parts:

  1. Part 1 - Parsing a Simplified Grammar
    In the first paper, a simple grammar that contain numbers, parenthesis, and the four arithmetic operators + − * and / with the usual precedence rules is presented. This grammar contains only two type of lexical elements: numbers and operators. A simple tokenizer capable of recognizing these types of elements will be constructed. Then a one pass parser and evaluator will be added.
  2. Part 2 - Adding Comment and Exponentiation
    The grammar is expanded to include exponentiation with the infix operator ^ or ** and C-style end of line comments that start with //. This will require a small change to the tokenizer and of couse changes to the parser. The expanded grammar will be defined with a more rigorous notation.
  3. Part 3 - Adding Constants and Functions
    Built in constants, pi = 3,141... and e = 2,718..., and functions such as sin(), abs() are added. The mechanism chosen is straightforward and programmers can easily add more.
  4. Part 4 - National Language Support
    National language support is added to the parser. The result is the parser used in the application Caleçon. This article can be skipped, it’s content is not directly concerned with the subject of parsing mathematical expressions.
  5. Part 5 - Constructing a Parse Tree
    A new parser is introduced. It constructs a parse tree in memory which is a representation of syntactic elements as nodes. The tree can easily and rapidly be evaluated. To make this useful, programmer defined variables are added. There is no change to the grammar and the same tokenizer is used. And it turns out that it is simple to add user defined constants and functions.
  6. Part 6 - Adding Boolean Operations
    Adding the if() function, Boolean comparisons (=, >, >= and so on) and Boolean operations (or, and, xor). Boolean comparisons are a new syntactical element that needs to be added to the grammar. The boolean operations require minimal changes to previously defined syntactic elements of the gramme. The tokenizer must also be changed.
  7. Part 7 - Optimizing by Constant Folding
    This paper discusses two extensions. The simpler is cloning a parse tree, which is a “deep” copy of the original tree so that it and it’s clone do not share any nodes. More complex, is the optimization of a parse tree so that an expression such as sin(3*x*pi/2) will be simplified to replace the multiplication by 3 and division by 2 with the multiplication by 1.5.

Source Code and License

All the source code for the various versions of the parsers and for the accompanying demonstration programs developed by the author is released under the terms of BSD style license:

Copyright (c) 2013, 2014, Michel Deslierres

All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

For programmers that might want to use the code, this license

... is effectively a statement that you can do anything with the program or its source, but you do not have any warranty and none of the authors has any liability (basically, you cannot sue anybody). This new BSD license is intended to encourage product commercialization. Any BSD code can be sold or included in proprietary products without any restrictions on the availability of your code or your future behavior.

Do not confuse the new BSD license with “public domain”. While an item in the public domain is also free for all to use, it has no owner.

Bruce Montague

Accordingly, use this code if you find it useful, just don’t pass it off as your work. Furthermore, suggestions or constructive critiques are most welcome. All the source code for the various versions of the parsers is available free of charge. But according to the license you could sell it to someone. Why someone would pay for something that is available for free is hard to understand and it is rather unethical that someone would want to profit in such a way.

Bibliography

Montague, Bruce Why you should use a BSD style license for your Open Source Project http://www.freebsd.org/doc/en/articles/bsdl-gpl/article.html (consulted 2013/02/19)

-- Part 1 – Parsing a Simplified Grammar->