
<!Doctype html>
<html lang="en">
    <head>
        <title>Roadmap</title>
        <meta charset="UTF-8">
        <!--<link rel="stylesheet" href="css/bootstrap.min.css">-->
        <link rel="stylesheet" href="css/style_new.css">
        <script src="js/jquery-1.12.1.min.js" charset="utf-8"></script>
        <link rel="stylesheet" href="js/embed-2cd369fa1c0830bd3aa06c21d4f14a13e060d2d31bbaae740f4af4.css"><div id="gist28627206" class="gist"></div>
        <link rel="stylesheet" href="js/embed-cbe5b40fa72b0964f90d4919c2da8f8f94d7c9f6c2aa49c07f6fa3.css"><div id="gist28627206" class="gist"></div>
    </head>

    <div class="container">
        <header id="navtop">
            <a href="index.html" class="logo fleft"><img src="img/logo.png" alt=""></a>
            <nav class="fright">
                <ul>
                    <li><a href="index.html">Home</a></li>
                    <li><a href="about.html">About</a></li>

                        <li><a href="roadmap.html" class="navactive">Roadmap</a></li>
                        <li><a href="documentation.html">Documentation</a></li>
                </ul>
            </nav>
        </header>
        <div class="Services-page main grid-wrap">
            <header class="grid col-full">
                <hr/>
                <p class="fleft">ROADMAP</p>
                <div style="background-color: #dff0d8;padding:4px 8px;border-radius: 4px;" class="grid col-full">
                    <h4 style="font-size: 16px">
                      mkDocs version of the roadmap is available at <a href="https://silcnitc.github.io/expl-docs/roadmap" target="_blank">silcnitc.github.io/expl-docs/roadmap</a>
                    </h4>
                </div>
                <!--<a class="button" href="">Download as PDF</a>-->
            </header>
            <aside class="grid col-one-quarter mq2-col-full">
                <menu>
                    <ul>
                        <li class="sec"><a href="#nav-stage0">0. Installation</a> </li>
                        <li class="sec"><a href="#nav-stage1">1. CodeGeneration for Arithmetic Expressions</a></li>
                        <li class="sec"><a href="#nav-stage2">2. Introduction to static storage allocation</a></li>
                        <li class="sec"><a href="#nav-stage3">3. Adding Flow Control Statements</a></li>
                        <li class="sec"><a href="#nav-stage4">4. User Defined Variables and arrays</a></li>
                        <li class="sec"><a href="#nav-stage5">5. Adding Functions</a></li>
                        <li class="sec"><a href="#nav-stage6">6. User defined types and Dynamic Memory Allocation</a></li>
                    <!--    <li class="sec"><a href="#nav-stage7">7. Register Allocation</a></li> -->
                        <li class="sec"><a href="#nav-stage7">7. Adding Objects – Data encapsulation</a></li>
                        <li class="sec"><a href="#nav-stage8">8. Inheritance and Sub-type Polymorphism</a></li>


                    </ul>
                </menu>
            </aside>
            <body>
            <section class="grid col-three-quarters mq2-col-full">
                <div class="grid-wrap">
                    <article class="grid col-full">
                        <h2>Using the Roadmap</h2>
                        <p>
                            This roadmap is divided into several stages, to be done in sequential order. Incrementally you will build a compiler for the ExpL language according to its specification. Links are provided for background reading material wherever appropriate. It will be assumed that you have background in C programming, Data Structures and Principles of Computer Organization.
                        </p>
                    </article>
                    <article class="grid col-full" id="nav-stage0">
                        <h2>Stage 0 : Installation and Preparation</h2>
                         <b>Time Estimate :</b> 2 weeks, 5-10 hours/week
                        <p>
                            <b>Pre-requisites</b>: NIL<br>
                            In this stage, you will download and familiarize yourself with the simulation package and learn the compiler design software tools LEX and YACC.  Follow the instructions below.
                        </p>
                        <p>
                            <ul>
                                <li id="otis">1.  Install LEX, YACC and the XSM simulator package by following the instructions <a href="install.html" target="_blank">here</a>.</li>
                            </ul>
                        </p>
                        <p>
                            You need to learn two software tools, YACC and LEX, which you will use in the project. These tools are somewhat
                            sophisticated. Fortunately, learning as much of them as the compiler project requires is not very difficult. The following tutorials will help you through this process.
                        </p>
                        <p>
                            If you are not already familiar with the tools LEX and YACC do the following:<br>
                            <ul>
                                <li id="otis">2.  Complete the <a href="lex.html" target="_blank">LEX tutorial</a>.</li>
                                <li id="otis">3.  Complete the <a href="yacc.html" target="_blank">YACC tutorial</a>.</li>
                                <li id="otis">4.  Complete the <a href="ywl.html" target="_blank">Using YACC with LEX tutorial</a>.</li>
                            </ul>
                        </p>
                        <p>
                            If you are not already familiar with using the GNU debugger, do the following:
                            <ul>
                                <li id="otis">5.  Complete the <a href="gdb.html" target="_blank">GDB tutorial</a>.</li>
                            </ul>
                        </p>

                        <p>
                            The next step is to understand the target machine environment. You must carefully go through the following tutorial
                            before proceeding to the next stage of this roadmap.
                        </p>
                        <p>
                            <ul>
                                <li id="otis">6.  Complete the <a href="xsm-environment-tut.html">XSM execution environment tutorial</a>.</li>
                            </ul>
                        </p>
                        <p>
                            With this, you are ready with all the required pre-requisites to proceed further in this roadmap.
                        </p>
                            <div class="up column3 mright"> <a href="#navtop" class="ir">Go up</a> </div>
                    </article>
                    <article class="grid col-full" id="nav-stage1">
                        <h2>Stage 1 : Code generation for Arithmetic Expressions</h2>
                           <b>Time Estimate :</b> 0.5 week, 5-10 hours/week
                        <p>
                            <b>Prerequisites</b>:<br/>
                            <ul>
                                <li id="otis">1.  You must be comfortable with LEX and YACC.  (If you are not, you must first complete the <a href="lex.html" target="_blank">LEX</a> tutorial, the <a href="yacc.html" target="_blank">YACC</a> tutorial and the <a href="ywl.html" target="_blank">Using YACC with LEX</a> tutorial.)</li>
                                <li id="otis">2.  You must have completed the <a href="xsm-environment-tut.html" target="_blank">XSM environment tutorial</a> <b>including all the exercises</b> before starting this stage.</li>
                            </ul>
                        </p>
                        <p>
                            <b>Learning Objectives</b>:<br/>
                            <p>In this stage, you will:</p>
                            <ul>
                                <li id="otis">1.  Parse an input  arithmetic expression and create an expression tree using YACC and LEX.</li>
                                <li id="otis">2.  Recursively traverse the tree and generate assembly language code.  The allocation of registers for storing results of intermediate computations will be handled en route.</li>
                            </ul>
                            <hr>
                        </p>
                        <p>
                            A compiler is a program that takes a high level program as input and produces a machine
                            recognizable target program as output.  The high level program typically allows variables,
                            expressions, conditionals, iterative constructs, user defined types, functions etc.
                            The low level target program on the other hand will be a sequence of assembly level
                            instructions that can be run on a target machine (which is the XSM simulator in this
                            project).
                          </p>
                        <p>
                            The strategy of the roadmap is to help you build the compiler in stages.
                            We start here by building a very simple compiler whose input (high level) program
                            contains only simple arithmetic expressions.
                            In subsequent stages, we will add more and more constructs to the
                            input language one by one, learning the relevant theoretical concepts along the way.
                        </p>

                      <!--  <p>
                            In this stage, you will implement a very simple compiler that can take an arithmetic expression as input (from some input file)  and generate a target executable file containing XSM instructions to evaluate the expression and output the result.
                        </p> -->
                        <p>
                            We assume that you have implemented the library routine for handling console output,
                            which you were asked to do in the <a href="xsm-environment-tut.html" target="_blank">XSM execution environment tutorial</a>.
                        </p>

                        <p>
                            Consider arithmetic expressions with the following syntax.
                            <div class="syntax">
                                E :  E + E | (E) | NUM
                            </div>
                            Here, the <a href="lex.html#navyytext" target="_blank">lexeme</a> <b>NUM</b> corresponds to integers.  Assume left <a href="yacc.html#associativity" target="_blank">associativity</a>  for '+'. Thus, the relevant <a href="lex.html#token" target="_blank">tokens</a> are NUM and +. The attribute value associated with a number is the number read.  Assume that the input file is passed as an argument to the main() function in YACC.
                        </p>
                        <p>
                            The lexer must pack the attribute into a tree node of the following structure:
                           <div class="syntax">
                            typedef struct tnode{<br>
                            &emsp;int val;            //value of the number for a leaf node<br>
                            &emsp;char *op;           //indicates the name of the operator for a non leaf node<br>
                            &emsp;struct tnode *left,*right;      //left and right branches<br>
                            }tnode;<br>
                            <br>
                            #define YYSTYPE tnode*
                            </div>
                            Since the semantic actions in the parser must build the tree, the following functions must be written:
                            <div class="syntax">
                            /*Make a leaf tnode and set the value of val field*/<br/>
                            struct tnode* makeLeafNode(int n);<br/>
                            <br/>
                            /*Make a tnode with operator, left and right branches set*/<br/>
                            struct tnode* makeOperatorNode(char op,struct tnode *l,struct tnode *r);
                            </div>
                        </p>
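                        <p>
                            The two declarations above could be implemented along the following lines. This is only a minimal sketch: it assumes that a leaf is marked by a NULL <i>op</i> field, that the single character operator is stored as a one-character string (because the <i>op</i> field is a char pointer while the function receives a char), and that aborting on allocation failure is acceptable. None of these choices is prescribed by the roadmap.
                        </p>

```c
#include <assert.h>
#include <stdlib.h>

typedef struct tnode {
    int val;                    /* value of a NUM leaf */
    char *op;                   /* operator name; NULL for a leaf (assumption) */
    struct tnode *left, *right; /* children; NULL for a leaf */
} tnode;

/* Make a leaf tnode holding the number n. */
struct tnode *makeLeafNode(int n)
{
    struct tnode *node = malloc(sizeof *node);
    assert(node != NULL);       /* sketch only: abort if out of memory */
    node->val = n;
    node->op = NULL;
    node->left = NULL;
    node->right = NULL;
    return node;
}

/* Make an interior tnode for operator op with children l and r. */
struct tnode *makeOperatorNode(char op, struct tnode *l, struct tnode *r)
{
    struct tnode *node = malloc(sizeof *node);
    assert(node != NULL);
    node->val = 0;              /* unused for operator nodes */
    node->op = malloc(2);       /* one character plus the terminator */
    assert(node->op != NULL);
    node->op[0] = op;
    node->op[1] = '\0';
    node->left = l;
    node->right = r;
    return node;
}
```

                        <p>
                            In a YACC rule for E + E, the semantic action would then typically be <i>$$ = makeOperatorNode('+', $1, $3);</i>.
                        </p>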
                        <p>
                            <b>Task 1</b>:  Build the expression tree for the given input.<br>
                            <b>Exercise 1</b>:  Output the prefix and postfix forms of the expression from the tree.<br>
                        </p>

                        <p>
                            (Note:  You would have already completed this task if you have done the <a href="ywl.html" target="_blank"> Using Yacc With Lex tutorial </a>).
                        </p>
                        <p>
                            Now comes the next task: generating an assembly language program equivalent to the expression and writing it out into an executable file in the XEXE format.  Once this is done, you can use the simulator to load the XEXE file into the memory of the XSM machine and execute it as outlined in the <a href="xsm-environment-tut.html" target="_blank"> XSM run time environment tutorial</a>.
                        </p>

                        <p>
                            To do this, one needs to know the following:<br>
                            <ul>
                                <li id="otis">1. The <a href="abi.html#nav-XSM-instruction-set" target="_blank">machine model and the instruction set</a> of the target machine.</li>

                                <li id="otis">2. Format of the <a href="abi.html#nav-XEXE-executable-file-format" target="_blank"> executable file</a>.</li>

                                <li id="otis">3. You need to know the address in the memory (in the target machine) where each instruction you generate will be loaded (by the OS loader). This is because program control instructions like JMP, CALL etc. require specification of the jump address.
                            </p>
                            <p>
                                As already outlined in the <a href="xsm-environment-tut.html" target="_blank"> XSM run time environment tutorial</a>, the header will be loaded into addresses 2048-2055. The first instruction generated by you will be loaded to the address 2056. Each XSM instruction occupies 2 memory words. Hence, the next instruction will be loaded at address 2058 and so on. The entry point field of the header must contain the address of the first instruction to be fetched and executed.</li>

                                <li id="otis">4. You need to fix the memory addresses where variables and other data are stored. For example, for each variable in the program, the compiler will have to allocate storage space in memory. The ABI stipulates that the region for this is the <a href="abi.html#nav-virtual-address-space-model" target="_blank"> stack region</a>. Thus each variable must be stored in some address between 4096 and 5119.</li>

                                <li id="otis">5. Since the XSM machine requires the operands of arithmetic and logic instructions to be in machine registers, we need to load the contents of variables/constants in the program into the machine registers before processing them. This brings in the problem of register allocation. The XSM machine makes available 20 registers (R0-R19) for the compiler.</li>
                            </ul>
                        </p>
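                        <p>
                            The address arithmetic described in item 3 can be written down as a small helper. This is an illustrative sketch; the names <i>instructionAddress</i>, <i>CODE_START</i> and <i>WORDS_PER_INSTR</i> are hypothetical, and only the numbers 2056 and 2 come from the text above.
                        </p>

```c
#include <assert.h>

#define CODE_START 2056   /* address of the first generated instruction */
#define WORDS_PER_INSTR 2 /* each XSM instruction occupies 2 memory words */

/* Address at which the instruction at 0-based position `index` is loaded. */
int instructionAddress(int index)
{
    assert(index >= 0);
    return CODE_START + WORDS_PER_INSTR * index;
}
```

                        <p>
                            A helper of this kind is not needed in this stage, but it becomes useful once jump targets have to be computed.
                        </p>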
                        <p>
                            Of the above, the XSM execution environment tutorial has already explained (1) and (2). Evaluation of expressions does not involve either storage allocation or program control transfer (JMP). Hence, we will not take up (3) and (4) at this stage. However, we need to solve (5) now.
                        </p>
                        <p><h3>What must be the evaluation strategy?</h3></p>
                        <p>
                            Let us take an example:<br>
                            If you are given a two node expression tree as shown below corresponding to the expression (3+2):<br>
                        </p>
                        <p>
                            <img src="img/tree1.png">
                        </p>
                        <p>
                            The evaluation strategy will be:
                            <ul>
                                <li id="otis">1.  Store 3 in a register – say R0.</li>
                                <li id="otis">2.  Store 2 in a register – say R1.</li>
                                <li id="otis">3.  ADD R0, R1.</li>
                            </ul>
                        </p>
                        <p>
                            The result will be stored in R0 and is sufficient for us. To generate code for the above tasks and write it into a target_file, you must write code as:

                        <div class="syntax">
                            fprintf(target_file, "MOV R0, 3\n");<br>
                            fprintf(target_file, "MOV R1, 2\n");<br>
                            fprintf(target_file, "ADD R0, R1\n");<br>
                        </div>

                            However, life becomes complicated if we have an expression like (3+2)+(5+6) resulting in the following tree.
                        </p>
                        <p>
                            <img src="img/tree2.png">
                        </p>
                        <p>
                            Of course, we can “hand craft” this case also.  But the strategy will not generalize. The basic issue is that your compiler does not know the input expression beforehand.  Technically speaking, the expression is not available at the <b>compile time</b> of your compiler, but becomes known only at its <b>run time</b>. Your code generation module must be more <i>"intelligent"</i> to handle <i>arbitrary expressions</i>.
                        </p>
                        <p>
                            The root of the problem with the above code is that R0 and R1 were picked by you and not by your compiler. Thus, we must have a <b>register assignment policy</b> (basically a function) that returns a free register whenever we require one. That is, you must design the following function:
                        </p>
                        <div class="syntax">
                            int getReg()   //  Allocate a free register
                        </div>

                        <p>
                            This function returns the register number of an unallocated register, so that your code for adding 3 and 2 would look like:
                        </p>

                        <div class="syntax">
                            int p = getReg();<br>
                            int q = getReg();<br>
                            fprintf(target_file, "MOV R%d, 3\n", p);<br>
                            fprintf(target_file, "MOV R%d, 2\n", q);<br>
                            fprintf(target_file, "ADD R%d, R%d\n", p, q);<br>
                        </div>

                        <p>
                            In addition to allocating registers, you must also have a mechanism to <b>release a register</b> back into the register pool. In the above example, after the ADD instruction is generated, R1 can be released and sent back to the register pool.
                        </p>

                        <p>
                            For this purpose, you will write a function
                        </p>

                        <div class="syntax">
                            freeReg() //  Releases a register.
                        </div>

                        <p>
                            To make the allocation strategy simple, we suggest that you generate target code in such a way that <i>the result of a CPU instruction involving two registers will always be stored in the register with the lower index</i>. In the code above, the result of the computation is kept in R0 and not R1 so that the register with the higher index value can be released. As a consequence, the freeReg() function does not require any arguments. Instead, freeReg() and getReg() can be designed to keep track of the highest numbered register allocated so far and hence can determine the correct register to be allocated or freed.
                        </p>
                        <p>
                            The following summarizes the register allocation strategy:
                        </p>
                        <div class="syntax">
                            1. Whenever a register is needed, allocate the lowest numbered register that is free. (Thus, give R0 if possible, otherwise R1 etc.)<br>

                            2.  Whenever we free a register, always release the highest used register that was allocated previously. (Thus, if R0, R1 and R2 were allocated, freeReg() must release R2).
                        </div>
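                        <p>
                            Under this policy the allocated registers always form a contiguous block R0, R1, ..., so a single counter is enough to implement both functions. The following is one possible sketch. The counter name <i>nextFree</i>, the constant <i>NUM_REGS</i> and the choice of returning -1 when the pool is exhausted are assumptions made for illustration.
                        </p>

```c
#include <assert.h>

#define NUM_REGS 20       /* XSM provides R0-R19 */

static int nextFree = 0;  /* R0..R(nextFree-1) are currently allocated */

/* Allocate the lowest numbered free register; return -1 if none is left. */
int getReg(void)
{
    if (nextFree >= NUM_REGS)
        return -1;        /* caller should report "Out of registers" */
    return nextFree++;
}

/* Release the highest numbered register currently allocated. */
void freeReg(void)
{
    assert(nextFree > 0); /* freeing with nothing allocated is a bug */
    nextFree--;
}
```

                        <p>
                            Because registers are always released in the reverse order of allocation, the lowest numbered free register is always the one just above the highest allocated register, which is why no search over the register pool is needed.
                        </p>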

                        <p>
                            Finally, we must design a code generation module.  The strategy here is to start with an expression tree and do the following:
                        </p>

                        <div class="syntax">
                            1.  At the leaf nodes of the tree (corresponding to a NUM), allocate a new register and store the number in the register.<br>
                            2.  At the intermediate nodes :<br>
                            &emsp;a.  Generate code for the left subtree (recursively). Find out the register holding the result.<br>
                            &emsp;b.  Evaluate the right subtree (recursively). Find out the register holding the result.<br>
                            &emsp;c.  ADD the contents of the two registers and store the result in the lower numbered register.<br>
                            &emsp;d.  Release the higher numbered register and return.<br>
                        </div>

                        <p>
                            Steps 2.a and 2.b in the above box require finding the index of the register that stores the result of expression evaluation. The simplest strategy is to <b>design a codeGen() function that takes an expression tree as input and generates code for the expression, returning the index of the register storing the result</b>:
                        </p>
                        <div class="syntax">
                            #define reg_index int<br>
                            reg_index codeGen( struct tnode *t) {<br>
                            &emsp;..<br>
                            &emsp;..<br>
                            &emsp;return register number storing result<br>
                            }
                        </div>
                        <p>
                            The codeGen() function takes as input a pointer to the root of an expression tree and generates code for the subtree rooted at that node. After generating code, the function must return the index of the register storing the result. See this <a href="codegen.html" target="_blank">link</a> for further details.
                        </p>
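                        <p>
                            For the grammar with '+' only, the recursive strategy might be sketched as follows. This is not the reference implementation: it assumes that a leaf is marked by a NULL <i>op</i> field, it passes the output file as a second parameter (the declaration above takes only the tree), and it repeats a compact getReg()/freeReg() pair so that the fragment is self-contained.
                        </p>

```c
#include <assert.h>
#include <stdio.h>

typedef struct tnode {
    int val;                    /* value of a NUM leaf */
    char *op;                   /* NULL for a leaf (assumption) */
    struct tnode *left, *right;
} tnode;

typedef int reg_index;

/* Compact register pool: R0..R(nextFree-1) are in use. */
static int nextFree = 0;
static reg_index getReg(void) { assert(nextFree < 20); return nextFree++; }
static void freeReg(void) { assert(nextFree > 0); nextFree--; }

/* Generate code for the tree rooted at t; return the result register. */
reg_index codeGen(struct tnode *t, FILE *target_file)
{
    assert(t != NULL);
    if (t->op == NULL) {        /* leaf: load the number into a new register */
        reg_index r = getReg();
        fprintf(target_file, "MOV R%d, %d\n", r, t->val);
        return r;
    }
    /* interior node: left subtree first, then right subtree */
    reg_index l = codeGen(t->left, target_file);
    reg_index r = codeGen(t->right, target_file);
    fprintf(target_file, "ADD R%d, R%d\n", l, r); /* result in lower register l */
    freeReg();                  /* releases r, the highest allocated register */
    return l;
}
```

                        <p>
                            Since the left subtree is evaluated first, its result always lands in the lower numbered register, so the register released by freeReg() is exactly the one that held the right operand.
                        </p>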
                        <p>
                            <b>Task 2</b>: Complete the simple compiler for expression evaluation and generate the executable file. The result of expression evaluation may be stored in the first location of the stack region – memory address 4096. This value may be printed out using the write system call. Note that the XEXE executable format must be adhered to so that the XSM simulator can load and execute the file.
                        </p>
                        <p>
                            <b>Note</b>:  To run the simulator, you must prepare the library.lib together with the XEXE executable file.  Please follow instructions in the <a href="xsm-environment-tut.html" target="_blank">XSM environment tutorial</a>.
                        </p>
                        <p>
                            <b>Exercise 2</b>:  Modify the grammar to
                            <div class="syntax">E :  E + E | E*E | E-E| E/E | (E) | NUM</div>  Assume standard rules of precedence and associativity.
                        </p>
                        <p>
                            <b>Exercise 3</b>:  Redo Exercise 2 assuming that the input expression is given in prefix form.
                        </p>
                        <p>
                            <b>Note:</b> Here we assumed that machine registers never get exhausted. XSM provides 20 general purpose registers and these registers are sufficient for all practical purposes.  However, if all registers are exhausted, then space will have to be allocated in memory.  We will not address this contingency in this roadmap. If the register pool is exhausted, your compiler may stop compilation and flag an "Out of registers" error.
                        </p>
                        <div class="up column3 mright">
                            <a href="#navtop">  &#x2191; </a>
                        </div>
                    </article>

                    <article class="grid col-full" id="nav-stage2">
                        <h2>Stage 2 : Introduction to static storage allocation</h2>
                           <b>Time Estimate :</b> 0.5 week, 5-10 hours/week
                        <p>
                            <b>Prerequisites</b> :
                            <ul>
                                <li id="otis">1.  You must be comfortable with LEX and YACC.  (If you are not, you must first complete the <a href="lex.html" target="_blank"> LEX tutorial</a>, the <a href="yacc.html" target="_blank">YACC tutorial</a> and the <a href="ywl.html" target="_blank"> YACC+LEX tutorial</a>.)</li>
                                <li id="otis">2.  You must have completed the <a href="xsm-environment-tut.html" target="_blank">XSM environment tutorial</a> before starting this stage.</li>
                            </ul>
                        </p>
                        <p>
                            <b>Learning Objectives</b> :
                            <ul>
                                <li id="otis">In this stage, you will extend the expression evaluator of the previous stage to support a set of pre-defined variables with Input/Output and assignment statements. You will be introduced to the notion of <b>static storage allocation</b> en route. You will also learn to differentiate between statements and expressions and to construct an <i>abstract syntax tree representation</i> for a program.</li>
                            </ul>
                            <hr>
                         </p>
                         <p>
                            Consider a simple programming language with the following syntax:
                            <div class="syntax">
                                Program ::= BEGIN Slist END | BEGIN END<br>
                                <br>
                                Slist ::=  Slist Stmt | Stmt<br>
                                <br>
                                Stmt ::= InputStmt |  OutputStmt | AsgStmt<br>
                                <br>
                                InputStmt ::= READ(ID);<br>
                                <br>
                                OutputStmt ::= WRITE(E);<br>
                                <br>
                                AsgStmt ::=  ID = E;<br>
                            </div>
                            Apart from the <a href="lex.html" target="_blank">literal tokens</a>, BEGIN, END, READ, and WRITE are tokens corresponding to keywords "begin", "end", "read" and "write".  ID is a token for variables.  We will permit only variables [a-z] in this stage.
                         </p>
                         <p>
                            To allow variables to appear in expressions, you must add the rule <b>E := ID</b> to the expression syntax used in Stage 1. The above syntax defines a small programming language that permits just straight line programs (programs without conditionals, loops, jumps or such control transfer constructs). Only 26 pre-defined variables are supported – a, b, c, .., z. A typical program would look like:
                            <div class="syntax">
                                begin<br>
                                &emsp;read (a);<br>
                                &emsp;read (b);<br>
                                &emsp;d = a + 2 * b;<br>
                                &emsp;write (a+d);<br>
                                end;
                            </div>
                         </p>
                         <p>
                            We will assume that variables can store only integers. Handling variables of multiple types will be taken up in subsequent stages.
                        </p>
                        <p>
                            A conceptual point to note here is that apart from the addition of variables, the extended language now has two kinds of constructs – expressions and <b>statements</b>. While an expression evaluates to a value (in this case, we limit ourselves to integer expressions), a statement commands the execution of some action by the machine. For example, the statement <i>read(a);</i> instructs the action of reading a value from the console into the variable a. <i>write(a+d);</i> instructs evaluation of the expression (a+d) and printing the result to the console output.
                        </p>
                        <p>
                            Another important conceptual point to note is that the introduction of variables also demands binding them to storage (memory) locations. The storage location associated with a variable must hold the value of the variable at each point of program execution. A statement (like the assignment statement or a read statement) that alters the value of a variable must result in a change in the value stored in the corresponding storage location.
                        </p>
                        <p>
                            In the present case, the compiler can fix the address for each variable in memory right at the time of program compilation. Since the ABI stipulates that storage allocation must be done in the stack region, we can pre-allocate the first 26 memory locations in the stack region of <a href="abi.html#nav-virtual-address-space-model" target="_blank">memory</a> for the variables a-z. Thus, variable <i>a</i> will refer to the contents of address 4096, <i>b</i> to the contents of address 4097 and so on. Any time the compiler encounters a variable – say <i>a</i>, the address to be looked at is fixed – in this case 4096. Such an allocation policy is called <b>static allocation</b>. In later stages you will encounter situations where it will not be possible for the compiler to fix the memory address of a variable at compile time. This leads to <b>run time</b> and <b>dynamic memory allocation</b> policies. For now, we will be content with static allocation.
                        </p>
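                        <p>
                            The static binding described above amounts to a fixed arithmetic mapping from a variable name to an address. A minimal sketch, with the hypothetical names <i>varAddress</i> and <i>STACK_BASE</i>, could be:
                        </p>

```c
#include <assert.h>

#define STACK_BASE 4096 /* first address of the stack region */

/* Statically bound address of the variable named by a letter 'a'..'z'. */
int varAddress(char name)
{
    assert(name >= 'a' && name <= 'z');
    return STACK_BASE + (name - 'a');
}
```

                        <p>
                            The compiler evaluates this mapping while generating code; no address computation is left to be done when the target program runs.
                        </p>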
                        <p>
                            To implement the above, your compiler must:
                            <ul>
                                <li id="otis">1. Fix the storage location for each variable. As noted above, the first 26 locations of the stack region starting at address 4096 may be assigned for a to z. Note that the XSM machine can store an integer in a single memory location. Hence, for each variable we need to allocate only 1 memory word. Note that "allocation" here means that while generating code, the compiler assumes that the variable a is stored in location 4096, b in location 4097 and so forth.</li>

                                <li id="otis"><b>Note</b> : Some programming languages stipulate that variables must be initialized to zero.  In that case, the compiler must generate code to MOV 0 to each of these locations before generating code for statements in the program. Some machines provide machine instructions that support initializing memory to zero.  Certain operating systems would have initialized all memory regions (except those to which code is loaded into) to zero at load time.  We will not pursue these issues here.</li>

                                <li id="otis">2. To translate an assignment statement, the compiler must generate code to evaluate the expression and then MOV the contents of the register storing the result to the memory location allocated for the variable.</li>

                                <li id="otis">3. To translate a Read statement, the compiler must generate code to invoke the <a href="abi.html#nav-eXpOS-system-library-interface" target="_blank">library</a> function for read, passing the address of the variable as argument. Write is implemented similarly.</li>
                            </ul>
                        </p>
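                        <p>
                            Step 2 of the list above might be sketched as follows. The function <i>format_assign</i> is a hypothetical helper, not part of the roadmap's code. It assumes that code for the right-hand side has already been generated and that its result sits in register <i>reg</i>. It formats the instruction into a buffer, which the caller can then write to the target file.
                        </p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STACK_BASE 4096   /* address of variable a, as fixed above */

/* Hypothetical helper: formats the XSM instruction that stores register
   `reg` (holding the evaluated right-hand side) into the memory location
   allocated for variable `var`. Returns the number of characters that the
   full instruction needs, as reported by snprintf. */
int format_assign(char *buf, size_t size, char var, int reg)
{
    assert(var >= 'a' && var <= 'z');
    int address = STACK_BASE + (var - 'a');
    return snprintf(buf, size, "MOV [%d], R%d\n", address, reg);
}
```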
                        <p>
                            But before getting into code generation, we must create an <b>abstract syntax tree (AST)</b> for the program.  An abstract syntax tree is a tree representation of the program, just like an expression tree for expressions. An abstract syntax tree for the above program would look like the following:
                        </p>
                        <p>
                            <img src="img/ast_stage2.png">
                        </p>
                        <p>
                            Observe that each node now needs to store distinguishing information like:
                            <ul>
                                <li id="otis">1. Whether it corresponds to a variable, constant, operator, assignment statement, write statement or read statement. </li>
                                <li id="otis">2. In case of operators, the information on the operator must be present. In the case of constants, the value must be stored in the node. In the case of variables, the node must contain the variable name.</li>
                                <li id="otis">3. There are also connector nodes which simply join two subtrees of statements together.</li>
                            </ul>
                        </p>
                        <p>
                            This leads to the definition of the following node structure:
                            <script src="js/dd1979bba5d35250a0c9419520a6b5b8.js"></script>
                        </p>
                        <p>
                            <b>Task 1</b> : Use Yacc and Lex to generate the abstract syntax tree representation of a program. A file containing the source program will be the input to your program.
                        </p>
                        <p>
                            Thus, after parsing, we use the syntax directed translation scheme of YACC to construct an intermediate representation, namely the abstract syntax tree. This phase of compilation is sometimes called the <b>front end</b> of the compiler. The next step is to recursively traverse the abstract syntax tree to generate executable code.  This is typically called the <b>back end</b>. The output of the front end is generally a <b>machine independent intermediate representation</b> like the AST. The back end of course will be dependent on the target platform.
                        </p>
                        <p>
                            <b>Task 2</b> : Modify CodeGen() function of Stage 1 to generate code for the abstract syntax tree generated as Task 1 above.
                        </p>
                        <p>
                            In the next stage, we will see how program control instructions like if-then-else can be incorporated into the language.
                        </p>
                        <p>
                            <b>Note</b> : An abstract syntax tree is an <a href="https://en.wikipedia.org/wiki/Intermediate_representation" target="_blank">intermediate representation</a> of the source program in a tree form suitable for code generation. There are several other forms of intermediate representations like the three address code form, the static single assignment form etc. This roadmap will be based on the abstract syntax tree representation.
                        </p>
                        <p>
                            In commercial strength compilers, the source is first translated to intermediate forms like the three address form, which is a <b>lower level representation</b> than the AST (that is, the intermediate form is closer to machine code). Typically, machine independent <a href="https://en.wikipedia.org/wiki/Optimizing_compiler" target="_blank">code optimizations</a> are performed on the intermediate code and only then is the back-end code generation run. This step is followed by another set of machine dependent code optimizations before the target file is finally generated. As these issues are beyond the scope of our project, we will not delve into these matters further in this roadmap.
                        </p>
                        <p>
                            <b>Exercise 1</b> : Build an <b>evaluator</b> for the program. (Hint: Your front end does not change. But, instead of generating code from the AST, you can recursively "evaluate" it. For storage allocation of variables, you can simply declare an array that can store 26 integers and allocate one entry for each variable).
                        </p>
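                        <p>
                            The hint above could be sketched roughly as follows. The node layout and the <i>NODE_*</i> constants are hypothetical and simplified for illustration; they are not the roadmap's node structure, and read and write statements are left out. Only the idea of a 26-entry integer array for storage comes from the hint.
                        </p>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified node layout used only for this sketch. */
enum { NODE_NUM, NODE_VAR, NODE_PLUS, NODE_MUL, NODE_ASSIGN, NODE_CONNECTOR };

struct node {
    int kind;                  /* one of the NODE_* values */
    int val;                   /* constant value (NODE_NUM only) */
    char varname;              /* 'a'..'z' (NODE_VAR only) */
    struct node *left, *right;
};

static int storage[26];        /* one entry per variable a..z */

/* Recursively evaluates the tree. Expressions return their value;
   statements return 0. */
int evaluate(struct node *t)
{
    assert(t != NULL);
    switch (t->kind) {
    case NODE_NUM:  return t->val;
    case NODE_VAR:  return storage[t->varname - 'a'];
    case NODE_PLUS: return evaluate(t->left) + evaluate(t->right);
    case NODE_MUL:  return evaluate(t->left) * evaluate(t->right);
    case NODE_ASSIGN:
        /* left child is the variable, right child the expression */
        storage[t->left->varname - 'a'] = evaluate(t->right);
        return 0;
    case NODE_CONNECTOR:
        /* execute the two statement subtrees in order */
        evaluate(t->left);
        evaluate(t->right);
        return 0;
    }
    return 0;
}
```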
                        <p>
                            <b>Note</b> : The compiler generates target code which must be executed by the target machine. In our case, the compiler you wrote as Task 2 is actually a <b>cross compiler</b>. This means that your compiler generates target code not for your host system, but for some other target platform, which in our case is the simulated XSM machine. The evaluator done in Exercise 1 does not generate "code" for any machine. Instead, it executes the program "then and there". Such a program could be classified as an <b>interpreter</b>. (Unfortunately, the standard terminology in literature associated with the term "interpreter" seems to be contradictory to this classification).
                        </p>
                        <div class="up column3 mright">
                            <a href="#navtop">  &#x2191; </a>
                        </div>
                    </article>
                    <article class="grid col-full" id="nav-stage3">
                        <h2>Stage 3:  Adding Flow Control Statements</h2>
                           <b>Time Estimate :</b> 1 week, 5-10 hours/week
                        <p>
                            <b>Prerequisite</b>:  You must read the  <a href="label-translation.html" target="_blank">label translation tutorial</a> before proceeding with this stage.
                        </p>
                        <p>
                            <b>Learning Objectives</b>:
                            <ul>
                                 <li id="otis">In this stage, you will extend the straight-line-program compiler of Stage 2 to support control flow constructs like if-then-else, while-do, break and continue. You will encounter integer and boolean expressions and the notion of <i>type</i> en route. You will also learn the use of <i>labels</i> for handling control flow constructs.</li>
                            </ul>
                            <hr>
                        </p>
                        <p>
                            The if-then-else and the while-do constructs can be added to the source language of Stage 2 by adding the following grammar rules:
                            <div class="syntax">
                                Ifstmt ::=  IF (E) then Slist Else Slist ENDIF<br>
                                &emsp;&emsp;&emsp;&emsp;&emsp;| IF (E) then Slist ENDIF;
                                <br>
                                Whilestmt ::= WHILE (E) DO Slist ENDWHILE;<br>
                            </div>
                            To permit logical expressions, we need to add to the grammar the following productions:
                            <div class="syntax">
                                E ::= E &lt; E  |  E &gt; E  |  E &lt;= E  |  E &gt;= E  |  E != E  |  E == E;<br>
                            </div>
                            A simple program in this language to find the largest among three numbers would look like the following:
                            <div class="syntax">
                                &emsp;read(a);<br>
                                &emsp;read(b);<br>
                                &emsp;read(c);<br>
                                &emsp;if (a &lt; b) then<br>
                                &emsp;&emsp;if (b &lt; c) then Write(c);  else Write(b); endif; <br>
                                &emsp;else<br>
                                &emsp;&emsp;if (a &lt; c) then Write(c); else Write(a); endif; <br>
                                &emsp;endif; <br>
                            </div>
                            Note that we continue to assume that variables hold only integer values. The first task in translation is to complete the front end.
                        </p>
                        <p>
                            There is one important conceptual point to understand here before proceeding to the front end implementation.  With the introduction of logical expressions, there are two <b>types</b> of expressions in the language – <b>arithmetic expressions</b> and <b>logical expressions</b>. An arithmetic expression evaluates to an integer value whereas a logical expression evaluates to a <b>boolean value</b> – that is true/false.
                        </p>
                        <p>
                            The guard of an if-else statement or a while-do statement must be a boolean expression.  On the other hand, the expression on the right side of an assignment statement must be of integer type as variables are assumed to hold integer values only. In other words, the statements given below are <b>invalid</b>.
                            <div class="syntax">
                                if (a+b) then Write(c);<br>
                                &emsp;&emsp;OR<br>
                                a = b &lt; c;<br>
                            </div>
                        </p>
                        <p>
                            Your compiler must flag a "<i>type mismatch</i>" error if such constructs are encountered during the AST construction process. A program with type errors must not pass the compiler's type check scrutiny and the compiler must report error without generating code. Type analysis is a part of the responsibilities of a compiler (normally classified under <a href="https://en.wikipedia.org/wiki/Semantic_analysis_(compilers)" target="_blank">semantic analysis</a>).
                        </p>
                        <p>
                            A simple way to handle this issue is to <i>annotate each node in the AST with a <b>type attribute</b></i> that indicates what is the type of the expression (or subexpression) with this node as the root.
                        </p>
                        <p>
                            For example, consider the AST for the following erroneous statement.
                            <div class="syntax">
                                d = ( a + b ) + ( c &lt; 3 )
                            </div>
                            <p>
                               <img src="img/ast3.png">
                            </p>
                        </p>
                        <p>
                            Here, the root of the AST is an assignment node, which is <b>typeless</b> (statements have no type; only expressions have a type associated with them). The left subtree of the root is a variable, and hence has type integer. The right subtree is a  <b>+</b>  node of type integer. Hence, at the root, there is no type mismatch. However, the right child of the right subtree has type boolean and does not match the operand type for the + operator. Hence the compiler must terminate compilation, flagging the error "type mismatch". Note that the compiler can stop processing when the first error is encountered, without proceeding further with the tree construction.
                        </p>
                        <p>
                            To implement type checking, add a type field in the AST node structure.
                            <script src="js/09129e56138b0207d595eaefbf2873cd.js"></script>
                            At the leaf nodes of the tree, since you have either constants or variables, the type must be set to integer. Next, while constructing the tree for intermediate nodes, check whether the types of the children are compatible with the operator at the root. For instance, for the addition operation, the check could be as follows:
                            <div class="syntax">
                            E ::= E + E {<br>
                            &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;// $2 is the + token, so the right operand is $3<br>
                            &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;if (($1->type != inttype) || ($3->type != inttype)) {<br>
                            &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;error("type mismatch"); <br>
                            &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;exit(1);<br>
                            &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;} else {<br>
                            &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;$$->type = inttype; <br>
                            &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;}<br>
                            &emsp;&emsp;&emsp;&emsp;}<br>
                            </div>
                        </p>
                        <p>
                            If there is no mismatch, you must annotate the root node ($$->type) with the proper type (integer in the above case).
                        </p>
                        <p>
                            <li id="otis">
                                <b>Note</b>:  The above check is better done inside the TreeCreate() function so that the YACC file is not cluttered with C statements.
                            </li>
                        </p>
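                        <p>
                            One way to keep the check out of the YACC file is a small helper that TreeCreate() can call for every binary operator node. The sketch below is an assumption about how that could look: the names <i>result_type</i>, <i>booltype</i>, <i>notype</i> and the <i>OP_*</i> constants are hypothetical, and only <i>inttype</i> appears in the text above. It also assumes that the arithmetic operators are +, -, * and /.
                        </p>

```c
#include <assert.h>

enum { inttype, booltype, notype };   /* notype signals a type mismatch here */
enum { OP_PLUS, OP_MINUS, OP_MUL, OP_DIV,       /* arithmetic operators */
       OP_LT, OP_GT, OP_LE, OP_GE, OP_NE, OP_EQ /* relational operators */ };

/* Hypothetical helper: synthesizes the type of a binary operator node from
   the types of its two children. Returns notype when the operand types do
   not match the operator. */
int result_type(int op, int left_type, int right_type)
{
    assert(op >= OP_PLUS && op <= OP_EQ);
    /* every operator of this stage takes two integer operands */
    if (left_type != inttype || right_type != inttype)
        return notype;
    /* arithmetic operators yield integers, relational operators booleans */
    return (op <= OP_DIV) ? inttype : booltype;
}
```

                        <p>
                            With such a helper, the caller reports "type mismatch" and stops whenever <i>notype</i> is returned, and otherwise stores the returned value in the type field of the new node.
                        </p>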
                        <p>
                            The essential idea is that the type of each node can be synthesized from the types of its subtrees. At any stage, the compiler may terminate, flagging an error, if a type error is found.
                        </p>
                        <p>
                            <b>Task 1</b>:  Complete the front-end module (AST construction) for the programming language. You need to
                            <ul>
                                <li id="otis">(1) add additional lexical tokens for the new constructs</li>
                                <li id="otis">(2) make appropriate modifications in the tree node structure including provision for storing type attribute</li>
                                <li id="otis">(3) modify the TreeCreate() function to have three subtrees passed (for if-then-else) etc.</li>
                            </ul>
                        </p>
                        <p>
                            <b>Exercise 1</b>:  To test the implementation of Task 1, implement an <b>evaluator</b> for the expression tree.  Test with simple programs like those for finding the largest of 3 numbers, sum of n numbers (n read from input) etc.
                        </p>
                        <p>
                            The next task is to complete the back-end code generation phase.  For better clarity, we will split the task into two steps.
                        </p>
                        <p>
                            <ul>
                                <li id="otis"><b>Step 1</b>:  Generate code with <b>labels</b>.  At this stage labels will be placed at various control flow points of the target assembly code so that a JMP instruction will only indicate the label corresponding to the instruction to which transfer of program flow must happen.</li>
                                <li id="otis"><b>Step 2</b>:  Replace the labels with addresses.</li>
                            </ul>
                        </p>
                        <p>
                            <b>Important note</b>: You must have read the <a href="label-translation.html" target="_blank">label translation tutorial</a> before proceeding any further.
                        </p>
                        <p>
                            We will now look at Step 1. Consider the following statement:
                            <div class="syntax">
                                while (a &lt; b) do<br>
                                &emsp;&emsp;a = a+1;<br>
                                endwhile;<br>
                            </div>
                        </p>
                        <p>
                            The abstract syntax tree for the above statement would look like:
                        </p>
                        <p>
                            <img src="img/ast31.png">
                        </p>
                        <p>
                            Suppose variable a is bound to address 4096, b to address 4097, then our plan is to generate code that would look like the following:
                            <div class="syntax">
                            L1:  <br>
                            MOV R0, [4096]&nbsp;&nbsp;// transfer a to R0<br>
                            MOV R1, [4097]&nbsp;&nbsp;//  transfer b to R1<br>
                            LT R0, R1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//  a&lt;b<br>
                            JZ R0,L2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//  if (a&lt;b) is false goto L2  <br>
                            MOV R0, [4096]&nbsp;&nbsp;//  transfer a to R0<br>
                            ADD R0, 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;//  add 1  <br>
                            MOV [4096], R0&nbsp;&nbsp;// transfer sum back to a<br>
                            JMP L1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;// goto next iteration. <br>
                            L2: <br>
                            ... Next Instruction ..  <br>
                            </div>
                        </p>
                        <p>
                            Note the use of labels L1 and L2 indicating control flow points in the above code. A while statement involves two jumps and two labels. The labels are just symbols that are placed at the start of the instructions to which jump instructions must branch. Placing labels in the code relieves us from bothering about the exact memory address to which each jump must be made. Of course, this is only a temporary measure. The final target code must not contain labels.
                        </p>
                        <p>
                            Our strategy here is to first generate code with labels and then replace the labels with addresses. To implement the plan, we may name labels in the program L0, L1,.... We must design an <i>int GetLabel()</i> function that returns the index of the next unused label.  Thus the first call to <i>GetLabel()</i> returns 0, next call returns 1 and so forth.
                        </p>
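                        <p>
                            A minimal sketch of such a function, assuming a single counter kept in a static variable (the variable name <i>next_label</i> is illustrative):
                        </p>

```c
/* Returns the index of the next unused label: 0, 1, 2, ... */
int GetLabel(void)
{
    static int next_label = 0;   /* persists across calls */
    return next_label++;
}
```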
                        <p>
                            The code generation strategy for the while-do statement is illustrated by the following pseudo-code.
                            <div class="syntax">
                                int label_1 = GetLabel();<br>
                                int label_2 = GetLabel();<br>
                                fprintf(target_file, "L%d:\n", label_1);   //  Place the first label here.  <br>
                                Generate code for the guard expression.    <br>
                                Generate code to compare the result to zero and if so jump to label_2 // loop exit   <br>
                                Generate code for the body of the while loop.<br>
                                fprintf(target_file, "JMP L%d\n", label_1);   // return to the beginning of the loop.  <br>
                                fprintf(target_file, "L%d:\n", label_2);    // Place the second label here<br>
                            </div>
                        </p>
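                        <p>
                            The skeleton around the guard and the body can be sketched as a self-contained C function. This is only an illustration under stated assumptions: <i>format_while</i> is a hypothetical helper, and it takes the already generated guard and body code as strings so that it can stand alone. A real code generator would instead recurse into the subtrees and write to the target file directly.
                        </p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the labelled code of a while loop into buf.
   guard_code must leave the value of the guard in register guard_reg.
   Returns the number of characters the full output needs (snprintf). */
int format_while(char *buf, size_t size, int label_1, int label_2,
                 const char *guard_code, int guard_reg, const char *body_code)
{
    assert(label_1 != label_2);
    return snprintf(buf, size,
                    "L%d:\n"          /* first label: loop entry */
                    "%s"              /* code for the guard expression */
                    "JZ R%d, L%d\n"   /* guard is false: leave the loop */
                    "%s"              /* code for the loop body */
                    "JMP L%d\n"       /* back to the guard */
                    "L%d:\n",         /* second label: loop exit */
                    label_1, guard_code, guard_reg, label_2,
                    body_code, label_1, label_2);
}
```

                        <p>
                            Called with labels 1 and 2 and the guard and body instructions of the example above, this reproduces the target code listed earlier, without its comments.
                        </p>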
                        <p>
                            <b>Task 2</b>:  Complete the code generation with labels for while-do, if-then and if-then-else constructs.
                        </p>
                        <p>
                            Now, we must complete Step 2 of replacing the labels with the correct addresses. This is explained in the <a href="label-translation.html" target="_blank">label translation documentation</a>.
                        </p>
                        <p>
                            <b>Task 3</b>: Read the tutorial linked above and complete the label translation for the if-then, if-then-else and while-do statements.
                        </p>
                        <p>
                            <b>Exercise 2</b>: Test your Task 3 code with the following programs:
                            <ul>
                                <li id="otis">(a) program to find the largest of a, b and c (values read from input)</li>
                                <li id="otis">(b) program to read numbers till 0 is input and output the sum. </li>
                            </ul>
                        </p>
                        <p>
                            <b>Task 4</b>: Add <b>break</b> and <b>continue</b> statements.  Code for these statements need be generated only if they appear inside some while loop.  Otherwise, the compiler may simply ignore these statements, generating no code.  (The primary task is to keep track of which label to jump to when one of these statements is encountered).
                        </p>
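                        <p>
                            One possible way to keep track of the labels is a stack with one entry per enclosing while loop. The sketch below is an assumption, not the roadmap's prescribed design. All names in it are hypothetical, and the nesting limit is arbitrary. The code generator would call <i>enter_loop</i> before generating a loop body and <i>leave_loop</i> after it.
                        </p>

```c
#include <assert.h>

#define MAX_LOOP_DEPTH 64   /* assumed nesting limit for this sketch */

static int loop_start[MAX_LOOP_DEPTH];  /* label that continue jumps to */
static int loop_exit[MAX_LOOP_DEPTH];   /* label that break jumps to */
static int loop_depth = 0;              /* number of enclosing loops */

/* Called when code generation enters the body of a while loop. */
void enter_loop(int start_label, int exit_label)
{
    assert(loop_depth < MAX_LOOP_DEPTH);
    loop_start[loop_depth] = start_label;
    loop_exit[loop_depth] = exit_label;
    loop_depth++;
}

/* Called when code generation for the loop body is finished. */
void leave_loop(void)
{
    assert(loop_depth > 0);
    loop_depth--;
}

/* Label index for the innermost loop, or -1 outside any loop
   (in which case no code is generated for the statement). */
int break_target(void)
{
    return loop_depth > 0 ? loop_exit[loop_depth - 1] : -1;
}

int continue_target(void)
{
    return loop_depth > 0 ? loop_start[loop_depth - 1] : -1;
}
```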
                        <p>
                            <b>Exercise</b> : Add <i>repeat-until</i> and <i>do-while</i> statements to the language with standard semantics.
                        </p>
                        <p>
                            <b>Reading Exercise</b> : Please read the <a href="gdb.html" target="_blank">GNU Debugger (GDB) tutorial</a> to learn how to use the GDB debugger before proceeding to the next stage.
                        </p>
                        <p>
                            <b>Note</b>:  Often in practice, programming languages allow a program to be split into different functions, written in different source files. In such cases, each file is separately compiled and the compiler generates target code with labels without translating them into addresses.  Even variable references will be symbolic and the actual addresses may not be determined.  Such target files are called <a href="https://en.wikipedia.org/wiki/Object_file" target="_blank">object files</a>.  The compiler will include symbol table information in the object file for translation later.  A separate program called the <a href="https://en.wikipedia.org/wiki/Linker_(computing)" target="_blank">linker</a> will collect the information in all the symbol tables and combine the object files into a single executable file, replacing labels and symbolic variable references with actual addresses.
                        </p>
                        <div class="up column3 mright">
                            <a href="#navtop">  &#x2191; </a>
                        </div>
                    </article>
                    <article class="grid col-full" id="nav-stage4">
                        <h2>Stage 4:  User Defined Variables and Arrays</h2>
                           <b>Time Estimate :</b> 1 week, 5-10 hours/week
                           <br>
                        <p>
                            <b>Prerequisites</b>:<br/>
                            <ul>
                                <li id="otis">1.  You must have completed the <a href="gdb.html" target="_blank">GNU Debugger (GDB) tutorial</a> before starting this stage.</li>
                            </ul>
                        </p>
                        <p>
                            <b>Learning Objectives</b>:<br>
                            You will extend the language of Stage 3 to permit users to declare and use variables of <i>integer</i> and <i>string</i> types. You will learn <b>symbol table</b> management en route.
                            <hr>
                        </p>
                        <p>
                            In this stage, we allow the program to contain variable declarations of the following syntax:
                            <div class="syntax">
                                Declarations ::=  DECL  DeclList ENDDECL | DECL ENDDECL<br>
                                <br>
                                DeclList ::=  DeclList Decl | Decl<br>
                                <br>
                                Decl  ::= Type VarList ;<br>
                                <br>
                                Type ::=  INT | STR<br>
                                <br>
                                VarList ::= VarList , ID | ID<br>
                            </div>
                        </p>
                        <p>
                            We will assume hereafter that all variables used in a program must be <b>declared</b> in the declaration section of the program (between the <i>decl</i> and <i>enddecl</i> keywords). Since string type variables are allowed, we will allow string constants as well.  (See <a href="expl.html#nav-constant">ExpL specification</a> for details).
                        </p>
                        <p>
                            A simple program in this language to find the sum of numbers entered from the console (until a zero is entered) would look like the following:
                            <div class="syntax">
                            decl<br>
                            &emsp;&emsp;int num, sum;<br>
                            &emsp;&emsp;str mesg;<br>
                            enddecl<br><br>
                            read(num);<br>
                            sum = 0;<br>
                            while (num != 0) do<br>
                            &emsp;&emsp;sum = sum + num;<br>
                            &emsp;&emsp;read(num);<br>
                            endwhile;<br>
                            write("sum is:");<br>
                            write(sum);<br>
                            mesg = "good bye";<br>
                            write(mesg);<br>
                            </div>
                        </p>
                        <p>
                            It is the responsibility of the compiler to check for <b>semantic errors</b> such as the following:
                            <ul>
                                <li id="otis">1.  Flag an error if a variable is used without being declared.</li>
                                <li id="otis">2.  Flag an error if a type mismatch involving any variable is found.</li>
                            </ul>
                        </p>
                        <p>
                            To this end, while parsing declarations, the compiler enters the information about variables into a compile-time data structure called the <b>symbol table</b>. The symbol table stores the following information about each variable:
                            <ul>
                                <li id="otis">1.  Name of the variable (known at the time of declaration).</li>
                                <li id="otis">2.  Type (For the present stage, only integer/string).</li>
                                <li id="otis">3.  Size (For the time being, we will assume that all variables have size one).</li>
                                <li id="otis">4.  The memory <b>binding</b> of each variable – that is, static memory address determined by the compiler for the variable.</li>
                            </ul>
                        </p>
                        <p>
                            The first three entries are determined by the declaration of the variable. For the fourth, a simple strategy would be to allocate the first address (4096) for the variable declared first, 4097 for the next variable and so on. Note that here too we are fixing the address of each variable at compile time (<b>static allocation</b>).
                        </p>
                        <p>
                            The following structure may be used for a symbol table entry:
                            <div class="syntax">
                                struct Gsymbol {<br>
                                &emsp;&emsp;char* name;&emsp;&emsp;// name of the variable<br>
                                &emsp;&emsp;int type;&emsp;&emsp;&emsp;// type of the variable<br>
                                &emsp;&emsp;int size;&emsp;&emsp;&emsp;// size of the type of the variable<br>
                                &emsp;&emsp;int binding;&emsp;// stores the static memory address allocated to the variable<br>
                                &emsp;&emsp;struct Gsymbol *next;<br>
                                  <!-- ...  any other field for data structure maintainance .. -->
                                };
                            </div>
                        </p>
                        <p>
                            The symbol table entries for the program above would look as below:
                            <img src="img/gsymboltable1.png">
                        </p>
                        <p>
                            To implement the symbol table, you must write two functions.  For a simple implementation, a linear linked list suffices.  In modern compilers, hash tables are maintained to make search efficient.
                        </p>
                        <p>
                            <div class="syntax">
                                struct Gsymbol *<i>Lookup</i>(char * name);  //  Returns a pointer to the symbol table entry for the variable, or NULL if there is no such entry.<br>
                                <br>
                                void <i>Install</i>(char *name, int type, int size);  // Creates a symbol table entry.
                            </div>
                        </p>
                        <p>
                            <b>Note</b>:  You must check before installing a variable whether the variable is already present.  If a variable is declared multiple times, the compiler must stop the compilation and flag error.
                        </p>
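                        <p>
                            A minimal linked-list sketch of the two functions is given below. It is one possible implementation under stated assumptions, not the reference one. The variables <i>symbol_table</i> and <i>next_binding</i> are hypothetical names, and bindings are assumed to be handed out sequentially from address 4096, as described above.
                        </p>

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct Gsymbol {
    char *name;            /* name of the variable */
    int type;              /* type of the variable */
    int size;              /* size of the variable */
    int binding;           /* static memory address of the variable */
    struct Gsymbol *next;
};

static struct Gsymbol *symbol_table = NULL;  /* head of the linked list */
static int next_binding = 4096;              /* next free static address */

/* Returns the entry for name, or NULL if the variable is not installed. */
struct Gsymbol *Lookup(char *name)
{
    struct Gsymbol *entry;
    for (entry = symbol_table; entry != NULL; entry = entry->next)
        if (strcmp(entry->name, name) == 0)
            return entry;
    return NULL;
}

/* Creates a symbol table entry; stops compilation on a repeated declaration. */
void Install(char *name, int type, int size)
{
    struct Gsymbol *entry;
    if (Lookup(name) != NULL) {
        fprintf(stderr, "error: variable %s declared more than once\n", name);
        exit(1);
    }
    entry = malloc(sizeof *entry);
    if (entry == NULL) { perror("malloc"); exit(1); }
    entry->name = malloc(strlen(name) + 1);  /* keep a private copy of the name */
    if (entry->name == NULL) { perror("malloc"); exit(1); }
    strcpy(entry->name, name);
    entry->type = type;
    entry->size = size;
    entry->binding = next_binding;           /* bind to the next free address */
    next_binding += size;
    entry->next = symbol_table;              /* insert at the head of the list */
    symbol_table = entry;
}
```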
                        <p>
                            <b>Task 1</b>: Complete the program to parse declarations and set up the symbol table entries and print out the contents of the symbol table.
                        </p>
                        <p>
                            The next task is to make the necessary modifications to the AST construction and code generation. These are straightforward. Add an additional field, <i>Gentry</i>, to the tree node structure:
                            <div class="syntax">
                                typedef struct tnode{<br>
                                &emsp;&emsp;int val;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;//value of the constant<br>
                                &emsp;&emsp;char* varname;&emsp;&emsp;&emsp;//name of the variable<br>
                                &emsp;&emsp;int type;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;//type of the variable<br>
                                &emsp;&emsp;int nodetype;&emsp;&emsp;&emsp;&emsp;//node type information<br>
                                &emsp;&emsp;struct Gsymbol *Gentry;&emsp;&emsp;//pointer to GST entry for global variables and functions<br>
                                &emsp;&emsp;struct tnode *left,*right;&emsp;&emsp;//left and right branches<br>
                                }tnode;
                            </div>
                        </p>
                        <p>
                            While constructing the tree, if a variable is encountered, keep a pointer to the corresponding symbol table entry. Set the type field as well to the correct type. The rest of the type checking steps are exactly as in the previous stage. The AST of the while loop in the above program is shown after the relevant part of the code, which is repeated here for easy reference:
                            <div class="syntax">
                            .<br>
                            .<br>
                            .<br>
                            while (num != 0) do<br>
                            &emsp;&emsp;sum = sum + num;<br>
                            &emsp;&emsp;read(num);<br>
                            endwhile;<br>
                            .<br>
                            .<br>
                            .<br>
                            </div>
                        </p>
                        <p>
                            <img src="img/ast_stage4.png">
                        </p>
                        <p>
                            There is no serious change to the code generation process, except that for variables, the binding address is obtained from the symbol table.<br><br>
                            <b>Important Note</b>:  The XSM architecture is unrealistic in that it allows a memory word to hold a string.  In a real system, a string would normally be stored as characters in consecutive memory locations, as in an array.   You will learn array allocation shortly.
                        </p>
                        <p>
                            <b>Task 2</b>:  Complete the AST construction and code generation steps.
                        </p>
                        <p>
                            <h4>Adding arrays</h4>
                            The next step is to allow declaration of arrays like:
                            <div class="syntax">
                                decl<br>
                                &emsp;---<br>
                                &emsp;int a[100];<br>
                                &emsp;str names[20];<br>
                                &emsp;---<br>
                                enddecl
                            </div>

                            The declaration syntax must permit:
                            <div class="syntax">
                            Varlist ::= Varlist , ID[NUM] | ID[NUM]
                            </div>
                            To implement this, for each variable, you must reserve as much static space as specified by the declaration and set the size field in the symbol table to indicate the number of words allocated.  The next variable must be allocated space immediately after the space reserved for this one.
                        </p>
                        <p>
                            For instance, for the declaration,
                            <div class="syntax">
                                decl<br>
                                &emsp;&emsp;int a[10], b;<br>
                                enddecl
                            </div>
                            The binding field in the symbol table for the variable a may be set to address 4096
                            and the size entry to 10.  This means that we are allocating locations 4096-4105 for the array.
                            The next variable, b, can then be bound to the address 4106.
                            <img src="img/gsymboltable2.png">
                        </p>
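                        <p>
                            The allocation in the example above can be sketched as follows. The names STATIC_BASE, allocate_static and element_address are hypothetical; the sketch only assumes, as in the example, that static allocation starts at address 4096 and proceeds upwards one word at a time.
                        </p>

```c
#include <assert.h>

#define STATIC_BASE 4096   /* first static address, as in the example */

/* Next free static address (a hypothetical compiler-wide counter). */
static int next_free = STATIC_BASE;

/* Reserves size consecutive words and returns the address of the
   first one; this is the value to store in the binding field. */
int allocate_static(int size) {
    assert(size >= 1);
    int binding = next_free;
    next_free += size;
    return binding;
}

/* Address of element index of an array with the given binding and
   size, or -1 if a constant index is out of range. */
int element_address(int binding, int size, int index) {
    if (index < 0 || index >= size)
        return -1;
    return binding + index;
}
```

                        <p>
                            With these definitions, allocate_static(10) for a yields 4096 and the following allocate_static(1) for b yields 4106, matching the symbol table shown above.
                        </p>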
                        <p>
                            <b>Task 3</b>:  Complete the implementation of single dimensional arrays.
                            <br><br>
                            <b>Exercise 1</b>:  Permit two dimensional arrays like:
                            <div class="syntax"> int a[10][10]; </div>
                            Test your implementation with a program for multiplying two nxn matrices.
                            <br><br>
                            <b id="stage4_ex2">Exercise 2</b>:  Permit <i>pointer type</i> variables, as in the C programming language, with declarations like the following:
                            <div class="syntax">
                                decl<br>
                                &emsp;&emsp;int x, *p;<br>
                                &emsp;&emsp;str p, *q;<br>
                                enddecl
                            </div>
                            If you permit assignments like p=&amp;x; and q=&amp;p; , pointer variables may also be permitted in expressions like *p=*q+1; for referring to the data pointed to, as permitted in the C programming language. Semantic rules as in the C programming language may be assumed.
                        </p>
                        <p>
                            <b>Note</b>:  Right now, you are not equipped to do dynamic memory
                             allocation for pointer variables (as done by the malloc() function of C).
                             Hence, a pointer type variable can be used as a pointer to another statically
                             declared variable of the corresponding type.  Dynamic memory allocation will be discussed in later stages.
                        </p>
                        <h4> <b>Test Programs </b></h4>
                        <p>
                          Check your implementation with the following test cases: <br>

                        <h4> 1. Bubblesort (iterative) </h4>
                        <p>
                        <p>
                          This test program reads elements into an array and sorts them using the iterative version of the classic bubblesort algorithm.
                        </p>
                        <p>
                          <b> Input </b> : 1. The number of elements to be sorted, read from standard input.
                                           2. The elements to be sorted.
                        </p>
                        <p>
                          <b> Output </b> : A sorted array of elements.
                        </p>
                        To get the code for this test program <a href="testprograms/stage4/bubblesort.html" target="_blank">click here</a>.
                      </p>
                        </p>
                        <h4> 2. Nth Fibonacci Number (iterative) </h4>
                        <p>
                        <p>
                          This test program prints the nth Fibonacci number.
                        </p>
                        <p>
                          <b> Input </b> : An integer n
                        </p>
                        <p>
                          <b> Output </b> : The nth Fibonacci number
                        </p>
                        To get the code for this test program <a href="testprograms/stage4/fibaofn.html" target="_blank">click here</a>.
                      </p>
                        </p>
                        <h4> 3. Is Prime or Not </h4>
                        <p>
                        <p>
                          This program tests if a given integer is prime or not.
                        </p>
                        <p>
                          <b> Input </b> : An integer n
                        </p>
                        <p>
                          <b> Output </b> : Prime if n is prime, else not a prime.
                        </p>
                        To get the code for this test program <a href="testprograms/stage4/prime.html" target="_blank">click here</a>.
                      </p>
                        </p>
                        <h4> 4. Sum of n factorials (iterative) </h4>
                        <p>
                        <p>
                          This program prints the sum of the factorials of all integers from 1 to n, for a given n.
                        </p>
                        <p>
                          <b> Input </b> : An integer n
                        </p>
                        <p>
                          <b> Output </b> : The sum of the factorials of all integers from 1 to n.
                        </p>
                        To get the code for this test program <a href="testprograms/stage4/sum-to-n-fact.html" target="_blank">click here</a>.
                      </p>
                        </p>

                        <br>

                    </article>
                    <article class="grid col-full" id="nav-stage5">
                        <h2>Stage 5:  Adding Functions</h2>
                           <b>Time Estimate :</b> 2 weeks, 5-10 hours/week
                        <p>
                            <b>Prerequisite Reading</b> :  You must read the following documents before proceeding with this stage:<br>
                            <ul>
                                 <li id="otis">1.  The main page of the document <a href="run-data-structures.html" target="_blank">Run time allocation</a>.</li>
                                 <li id="otis">2.  <a href="run_data_structures/run-time-stack.html" target="_blank">Run time stack Allocation</a>.</li>
                            </ul>
                        </p>
                         <p>
                            <b>Learning Objectives</b>:
                            <ul>
                                 <li id="otis">You will extend the  language of Stage 4 by adding functions with support for recursion. Addition of functions to the language requires handling <b>scope</b> of variables.  Support for recursion demands <b>run-time storage allocation</b>. Only <i>integer</i> and <i>string</i> type variables will be supported.</li>
                            </ul>
                            <hr>
                        </p>
                        <p>
                            This is the first major stage in the ExpL project. A skeletal outline of the syntax rules that extend the language of Stage 4 to support subroutines is given below. You are required to fill in the rules needed to complete the grammar. Note that variables may be only of type integer/string.
                            <div class="syntax">
                                Program ::= GdeclBlock FdefBlock <a href="grammar-outline.html" target="_blank">MainBlock</a><br>
                                &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;| GdeclBlock MainBlock<br>
                                &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;| MainBlock<br>
                                <br>
                                GdeclBlock  ::=  DECL GdeclList ENDDECL | DECL ENDDECL<br>
                                <br>
                                GdeclList ::= GdeclList  GDecl | GDecl <br>
                                <br>
                                GDecl ::=  Type GidList ; <br>
                                <br>
                                GidList ::=  GidList , Gid | Gid <br>
                                <br>
                                Gid ::= ID <br>
                                &emsp;&emsp;&emsp;&emsp;&emsp;| ID[NUM] <br>
                                &emsp;&emsp;&emsp;&emsp;&emsp;| ID(ParamList) <br>
                                --------------------------------------------------------------------------------------
                                <br>
                                FdefBlock ::=  FdefBlock Fdef |  Fdef <br>
                                <br>
                                Fdef ::=Type ID ( ParamList ) { LdeclBlock Body } <br>
                                <br>
                                ParamList ::= ParamList , Param | Param<br>
                                &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;| &emsp;&emsp;/*param can be empty*/<br>
                                <br>
                                Param ::= Type ID <br>
                                <br>
                                Type ::= INT | STR  <br>
                                -----------------------------------------------------------------------------------------
                                <br>
                                LdeclBlock ::= DECL LDecList ENDDECL | DECL ENDDECL <br>
                                <br>
                                LDecList ::= LDecList LDecl | LDecl<br>
                                <br>
                                LDecl ::=  Type IdList ;  <br>
                                <br>
                                IdList ::=  IdList, ID | ID <br>
                                <br>
                                Type ::= INT | STR <br>
                            </div>
                        </p>
                        <p>
                            Since a function call is treated as an expression (whose value is the return value of the function),  the following rules must be added:
                            <div class="syntax">
                                E ::= ID ()  |  ID(ArgList) <br>
                                <br>
                                ArgList ::=  ArgList, E  |  E <br>
                            </div>
                        </p>
                        <p>
                            Here is an <a href="run_data_structures/run-time-stack.html#nav-illustration" target="_blank">example</a> for a program with a function. We will take up semantic analysis and AST representation before proceeding to code generation.
                        </p>
                        <p>
                            Each function requires a <b>declaration</b>. The declaration of functions must be made along with the global declarations. The declaration of a function must specify the types and names of the <b>formal parameters</b> and the <b>return type</b> of the function. The compiler must store the declaration information in the global symbol table. For example, the declaration
                            <div class="syntax">
                                decl<br>
                                ..<br>
                                ..<br>
                                &emsp;int factorial(int n);<br>
                                ..<br>
                                ..<br>
                                enddecl<br>
                            </div>

                            specifies that <i>factorial</i> is a function that takes as input one integer argument and returns an integer.  This is sometimes called the <b>signature</b> of the function. Conceptually, to invoke the factorial function, the <b>caller</b> must know:

                            <ul>
                                <li id="otis">1. The memory address to which the function call must be directed (<b>binding</b>).</li>
                                <li id="otis">2. The <b>types and names of the formal parameters</b> to the function and the order in which the actual <b>arguments</b> must be given as input to the function.</li>
                                <li id="otis">3. The <b>return type</b> of the function.</li>
                            </ul>

                            This is precisely the information that the symbol table stores.
                        </p>
                        <p>
                            A function <b>definition</b> contains:
                            <ul>
                                <li id="otis">a)  The function's signature.</li>
                                <li id="otis">b)  The declaration of <b>local variables</b> of the function.</li>
                                <li id="otis">c)  The code of the function.</li>
                            </ul>
                            For example, the definition of the factorial function could be as follows:
                            <div class="syntax">
                                int factorial(int n){<br>
                                &emsp;decl             <br>
                                &emsp;&emsp;int f;       <br>
                                &emsp;enddecl          <br>
                                &emsp;begin<br>
                                &emsp;&emsp;if( n==1 || n==0 ) then<br>
                                &emsp;&emsp;&emsp;f = 1;<br>
                                &emsp;&emsp;else<br>
                                &emsp;&emsp;&emsp;f = n * factorial(n-1); <br>
                                &emsp;&emsp;endif;<br>
                                &emsp;&emsp;return f;<br>
                                &emsp;end<br>
                                }<br>
                            </div>
                            Local variables declared in a function are <i>visible only</i> within the function.  We say that the <b>scope</b> of a local declaration is limited to the function. Moreover, if a global variable is redeclared inside a function, <i>the local declaration overrides the global declaration</i>.
                        </p>
                        <p>
                            Thus, we have two kinds of variables: global variables that are visible "everywhere" (having a <b>global scope</b>) and local variables that are visible only within the function where they are declared (having a <b>local scope</b>).
                        </p>
                        <p>
                            The compiler needs to know the binding addresses and types of the local variables for translation of statements of the function to assembly code.  However, this information is irrelevant outside the function.
                        </p>
                        <p>
                            To keep track of the local variable and scope information, our strategy is to keep global and local variables in different symbol tables.  We will have
                            <ul>
                                <li id="otis">
                                    1.  A single <b>global symbol table</b> storing the <i>(name, type, size, binding)</i> information of global variables as well as <i>(name, type, parameters, binding)</i> information for functions.  The following structure is suggested for storing a global symbol table entry.

                                    <div class="syntax">
                                        <b>struct Gsymbol{</b><br>
                                        &emsp;&emsp;<b>char *name;</b>   //name of the variable or function<br>
                                        &emsp;&emsp;<b>int type;</b>     //type of the variable:(Integer / String)<br>
                                        &emsp;&emsp;<b>int size;</b>     //size of an array<br>
                                        &emsp;&emsp;<b>int binding;</b>  //static binding of global variables <br>
                                        &emsp;&emsp;<b>struct Paramstruct *paramlist;</b>//pointer to the head of the formal parameter list<br>
                                          &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp; //in the case of functions<br>
                                        &emsp;&emsp;<b>int flabel;</b>              //a label for identifying the starting address of a function's code<br>
                                        &emsp;&emsp;<b>struct Gsymbol *next;</b>     //points to the next Global Symbol Table entry<br>
                                        <b>};</b>
                                    </div>
                                </li>

                                <li id="otis">
                                    2.  Several <b>local symbol tables</b>, one for each function, containing the <i>(name, type, binding)</i> information for each local variable of the function.  (Note that since the language does not permit arrays to be defined within a function, the size value of local variables is always 1; hence the field is not required.)   A local symbol table entry can be stored in the following structure:

                                    <div class="syntax">
                                        <b>struct Lsymbol{</b><br>
                                        &emsp;&emsp;<b>char *name;</b>&emsp;&emsp;&emsp;&emsp; //name of the variable<br>
                                        &emsp;&emsp;<b>int  type;</b> &emsp;&emsp; //type of the variable:(Integer / String)    <br>
                                        &emsp;&emsp;<b>int binding;</b>&emsp;&emsp;&emsp;&emsp; //local binding of the variable<br>
                                        &emsp;&emsp;<b>struct Lsymbol *next;</b>&emsp;&emsp; //points to the next Local Symbol Table entry<br>
                                        <b>};</b>
                                    </div>

                                    <b>Note:</b> As noted in the <a href="run-data-structures.html" target="_blank">run time storage allocation</a> documentation, local variables cannot be assigned static memory addresses.  Hence, the binding of a local variable is a relative address within the function's activation record.  We will discuss this matter in detail later.
                                </li>
                            </ul>
                            </p>
                        <p>
                            What is <b>stored in flabel</b>?  When the compiler generates code for the function, a label is placed at the start of the function.  A call to the function is translated to a low-level CALL to the corresponding label.   Later, the label has to be replaced by the address of the instruction during the <a href="label-translation.html" target="_blank">label translation phase</a>.
                        </p>
                        <p>
                            The simple scheme we suggest here is to put the label F0 before the code of the first function declared in the program, F1 before the code of the second and so on.  Hence, in the flabel field of the first function, store 0.  Similarly, store 1 in the flabel field of the second function and so on. A call to the first function must be translated to CALL F0 and so on.
                        </p>
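                        <p>
                            A minimal sketch of this numbering scheme is given below. The names new_flabel and format_call are hypothetical helpers; the only things taken from the text are that function labels are numbered 0, 1, 2, ... in order of declaration and that a call to a function with flabel n is translated to CALL Fn.
                        </p>

```c
#include <assert.h>
#include <stdio.h>

/* Counter for function labels: the first function declared gets 0,
   the second gets 1, and so on. */
static int next_flabel = 0;

/* Returns the value to store in the flabel field of a newly
   declared function. */
int new_flabel(void) {
    return next_flabel++;
}

/* Writes the call instruction for a function with the given flabel
   into buf, for example "CALL F1". Returns the number of characters
   that snprintf reports for the formatted instruction. */
int format_call(char *buf, size_t len, int flabel) {
    assert(buf != NULL && flabel >= 0);
    return snprintf(buf, len, "CALL F%d", flabel);
}
```

                        <p>
                            The compiler would call new_flabel once while installing each function in the global symbol table, and format_call (or an equivalent fprintf to the target file) whenever a call to that function is translated.
                        </p>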
                        <p>
                            With this, the global symbol table for the <a href="run_data_structures/run-time-stack.html#nav-illustration" target="_blank">program</a> would be as below.
                        </p>
                        <p>
                            <img src="img/gsymboltable3.png">
                        </p>
                        <p>
                            Continuing with the above example, we need two local symbol tables – one for the main function and one for the factorial function.  The local symbol table holds the (name, type, binding) triple for each <b>formal parameter</b> as well as each <b>local variable</b> of the function.  We will discuss the binding of formal parameters and local variables later.  The local symbol tables of main and factorial would look like the following:
                        </p>
                        <p>
                            <img src="img/localsymboltable2.png">
                            <img src="img/localsymboltable1.png">
                        </p>
                        <p>
                            See <a href="data_structures/local-symbol-table.html" target="_blank">LINK</a> for more details. For now, ignore the type table pointer in the structure given in the link.  This will be discussed in the next stage.
                        </p>
                        <p>

                            <div class="syntax">
                                <b>Task 1</b>:  Complete the program to build the global symbol table.  Test your program by printing out all global declarations in the program by displaying the contents of the symbol table.  You must not permit two variables/functions (or a function and a variable) to have the same name.
                            </div>

                            Our next aim is to perform semantic analysis, build the AST and then generate code. The strategy will be the following:
                            <ul>
                                <li id="otis">1.  First, parse global declarations and create the global symbol table entries for functions and global variables (already completed as Task 1).</li>
                                <li id="otis">2.  For each function for which code is not yet generated:
                                    <ul>
                                        <li id="otis">a)  Check for <b>name equivalence</b> of the formal parameters of the function definition with the declaration.  Name equivalence requires that the type and name of each formal parameter of the function in the declaration and the definition must agree.</li>
                                        <li id="otis">b)  Create local symbol table containing local variables and parameters.</li>
                                        <li id="otis">c)  Build AST for the function. (Do type-checking when the tree is being built, as was done in the previous stages.)</li>
                                        <li id="otis">d)  Recursively traverse the tree and generate code for the function in the target file.</li>
                                    </ul>
                                </li>
                            </ul>
                            <b>After generating code for a function, the local symbol table and the abstract syntax tree for the function can be deallocated.</b> Proceed to step 2 for the next function.
                        </p>
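                        <p>
                            The name equivalence check of step 2(a) can be sketched as below. The layout of struct Paramstruct is an assumption (only its name appears in the global symbol table structure above), and params_match is a hypothetical helper.
                        </p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout of a formal parameter list node. */
struct Paramstruct {
    const char *name;           /* name of the formal parameter */
    int type;                   /* type code of the parameter */
    struct Paramstruct *next;   /* next parameter, or NULL */
};

/* Returns 1 if the parameter list of the declaration and that of the
   definition agree in number, order, type and name; 0 otherwise. */
int params_match(const struct Paramstruct *decl,
                 const struct Paramstruct *def) {
    while (decl != NULL && def != NULL) {
        assert(decl->name != NULL && def->name != NULL);
        if (decl->type != def->type || strcmp(decl->name, def->name) != 0)
            return 0;
        decl = decl->next;
        def = def->next;
    }
    /* Both lists must end together; otherwise the counts differ. */
    return decl == NULL && def == NULL;
}
```

                        <p>
                            When params_match returns 0 for a function definition, the compiler should report an error instead of proceeding to build the local symbol table.
                        </p>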
                        <p>
                            Note that the code and the local variables of a function are not visible outside the function. Hence, the local symbol table and the AST for a function are not required once the code is generated. <b>To generate code for calling one function from another function, only the global symbol table information of the callee is needed. The global symbol table is maintained throughout the compilation process.</b> Note that the main function is just like any other function except that it has no arguments, its name must be <i>main</i> and return type must be <i>int</i> as per the specification.
                        </p>
                        <p>
                            The only non-trivial part about semantic analysis and AST construction pending discussion is how to build AST for a function call.
                        </p>
                        <p>
                            We start with an example. Suppose we have the following declarations:
                            <div class="syntax">
                                int Compute(int p, int q);   <br>
                                int find(int x);  <br>
                            </div>

                            A call to these functions could occur as in:
                            <div class="syntax">
                                t = Compute(a+b,find(a-b));
                            </div>
                            The statement is semantically valid provided t, a, and b are integer type variables.  Note that the call might occur in the code of <i>any function (including the same function – as in the case of a recursive call)</i>.
                        </p>
                        <p>
                            The expression tree for this could look as below:  <br>
                            <img src="img/expression tree.png">
                        </p>
                        <p>
                            A tree node for a function call contains a pointer to a list of expressions, one expression for each argument. <b>The compiler must type check each argument and match it with the type of the corresponding formal parameter of the called function.</b>
                        </p>
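                        <p>
                            The argument check can be sketched as follows, under several assumptions: the type codes TYPE_INT and TYPE_STR, the layout of struct Paramstruct, the reduced struct Gsymbol and the argument list node struct ArgNode are all hypothetical, and each argument expression is assumed to have been type checked already so that its type is known.
                        </p>

```c
#include <assert.h>
#include <stddef.h>

#define TYPE_INT 1   /* hypothetical type codes */
#define TYPE_STR 2

/* Hypothetical layout of a formal parameter list node. */
struct Paramstruct {
    const char *name;
    int type;
    struct Paramstruct *next;
};

/* Only the fields of the global symbol table entry needed here. */
struct Gsymbol {
    const char *name;
    int type;                        /* return type for a function */
    struct Paramstruct *paramlist;   /* formal parameters */
};

/* Hypothetical node in the list of argument expressions of a call;
   type is the type computed for the argument expression. */
struct ArgNode {
    int type;
    struct ArgNode *next;
};

/* Returns 1 if the arguments agree with the formal parameters of
   callee in number and type, 0 otherwise. */
int check_call(const struct Gsymbol *callee, const struct ArgNode *args) {
    assert(callee != NULL);
    const struct Paramstruct *formal = callee->paramlist;
    while (formal != NULL && args != NULL) {
        if (formal->type != args->type)
            return 0;                /* type mismatch */
        formal = formal->next;
        args = args->next;
    }
    return formal == NULL && args == NULL;   /* counts must agree */
}
```

                        <p>
                            For the call Compute(a+b,find(a-b)), the inner call to find would be checked first, and its return type (integer) would then be used as the type of the second argument of Compute.
                        </p>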
                        <p>
                            To get an overall picture of what is going on, you may read the documentation on <a href="data-structures.html" target="_blank"> compile time data structures</a> at this juncture. <b>Ignore type table entries for now</b> as we permit only integer/string variables in the present stage.
                        </p>
                        <p>
                          The expression tree structure given <a href="data_structures/abstract-syntax-tree.html" target="_blank">HERE</a> can be used (again, ignore the type table pointer). The details of implementation are left to you.
                        </p>
                        <p>
                            With this information, the task of completing type checking and building the expression tree for a function is straightforward.
                            <div class="syntax">
                                <b>Task 2</b>:  Construct the AST for each function after checking type and scope rules (semantic analysis).  Right now, we are concerned just with type and scope analysis and not code generation. Note that a variable appearing in a function must first be searched for in the local symbol table of the function and, only if it is not found there, in the global symbol table. In this way, a local declaration overrides a global declaration. The compiler must report an error if the types, names and number of parameters in a function declaration do not match those in the function definition.
                            </div>
                        </p>
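                        <p>
                            The lookup order described in the task can be sketched as below. The structures are reduced to the fields needed for the illustration, and the names resolve, struct Resolved and enum Scope are hypothetical.
                        </p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reduced symbol table entries; only the fields used here. */
struct Gsymbol {
    const char *name;
    int type;
    int binding;            /* static address */
    struct Gsymbol *next;
};

struct Lsymbol {
    const char *name;
    int type;
    int binding;            /* relative address in the activation record */
    struct Lsymbol *next;
};

enum Scope { SCOPE_NONE, SCOPE_LOCAL, SCOPE_GLOBAL };

/* Result of resolving a name inside a function body. */
struct Resolved {
    enum Scope scope;       /* SCOPE_NONE if the name is undeclared */
    int type;
    int binding;
};

/* Searches the local table first and the global table only if the
   name is not found locally, so a local declaration hides a global
   declaration of the same name. */
struct Resolved resolve(const struct Lsymbol *locals,
                        const struct Gsymbol *globals,
                        const char *name) {
    assert(name != NULL);
    struct Resolved r = { SCOPE_NONE, 0, 0 };
    for (const struct Lsymbol *l = locals; l != NULL; l = l->next) {
        if (strcmp(l->name, name) == 0) {
            r.scope = SCOPE_LOCAL;
            r.type = l->type;
            r.binding = l->binding;
            return r;
        }
    }
    for (const struct Gsymbol *g = globals; g != NULL; g = g->next) {
        if (strcmp(g->name, name) == 0) {
            r.scope = SCOPE_GLOBAL;
            r.type = g->type;
            r.binding = g->binding;
            return r;
        }
    }
    return r;   /* undeclared: the compiler should report an error */
}
```

                        <p>
                            Recording whether the variable was found locally or globally is useful later, since the two kinds of binding are interpreted differently during code generation.
                        </p>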
                        <p>
                            Now we turn to the code generation step.
                        </p>
                        <p>
                            The dynamics of a function call can be understood easily by dividing the process into the following three steps.
                            <ul>
                                <li id="otis">
                                    Step 1.  <b>Code executed in the calling function (caller)  before the transfer of control</b> to the called function (<b>callee</b>):
                                    <ul>
                                        <li id="otis">a)  The caller must <b>save the registers</b> in use so that even if the callee changes the values later,  the original values can be recovered after return from the callee.</li>
                                        <li id="otis">b)  The caller must <b>evaluate the arguments to the callee and store them</b> somewhere in a way that the <b>callee can access those values</b>.</li>
                                        <li id="otis"> c)  The caller must create some <b>space for the callee to place the return value</b> at the end of the call.</li>
                                        <li id="otis"> d)  The caller must <b>transfer control to the binding address of the callee</b>.</li>
                                    </ul>
                                </li>
                                <li id="otis">
                                    Step 2.  <b>Actions executed by the called function (callee)</b>:
                                    <ul>
                                        <li id="otis">a)  The callee must <b>allocate space for its local variables</b>.</li>
                                        <li id="otis">b)  Execute the callee code.</li>
                                        <li id="otis">c)   When encountering a return instruction, the <b>return value must be computed and stored</b> in the appropriate place specified by the caller (step 1.c above).</li>
                                        <li id="otis">d) <b> Return </b>control to the caller.</li>
                                    </ul>
                                </li>
                                <li id="otis">
                                    Step 3.  <b>Actions executed by the caller</b> after the callee has returned control back to the caller:
                                     <ul>
                                        <li id="otis">a)  <b>Recover the return value</b> stored by the callee.</li>
                                        <li id="otis">b)  <b>Deallocate space allocated for arguments</b> for the call in step 1(b).</li>
                                        <li id="otis">c)  <b>Recover the machine registers</b> stored before the call in step 1(a).</li>
                                    </ul>
                                </li>
                            </ul>

                            The machine code for actions in Step 1 and Step 3 must be generated when the compiler encounters a function call in the caller's code.   The compiler generates code for Step 2 while generating code for the callee function.
                        </p>
                        <p>
                            To implement the above plan, we need to create storage space whenever a function call is encountered.   We will be focusing on generating code containing labels.  Translation of labels to actual addresses can be easily done at the end as was done in Stage 4 following the <a href="label-translation.html" target="_blank">Label Translation Documentation</a>.
                        </p>
                        <p>
                            <b>Implementation Strategy</b>. <br>
                            <br>
                            The fundamental strategy for space allocation is to create an <b>activation record</b> for the callee in the <b>stack</b> when a function call is encountered.  <b>The compiler must generate code for creating the activation record at run-time when the call is encountered</b>. Since the storage requirements for arguments, local variables and the return value of a function are known at compile time, the compiler knows exactly how much space must be allocated for each function. We propose the following general organization for activation records:  <br>

                            <ul>
                                <li id="otis">
                                    1.  Each <b>activation record must have a base (memory location)</b> which is determined at run time.  The machine register <b>BP (base pointer)</b> is generally used to point to the base of the activation record of the function executing currently.
                                </li>
                                <li id="otis">
                                    2.  <b>Relative to the base, the address of each argument, each local variable, the address where the return value is stored, etc., are fixed by the compiler statically</b> (at compile time).
                                </li>
                                <li id="otis">
                                    3.  Initially, the activation record for the main function is created in the stack. BP is initialized to the base of this activation record and the main function starts execution.
                                </li>
                                <li id="otis">
                                    4.  <b>If function A calls function B, a new activation record is created in the stack for function B above the activation record of function A</b>. The BP is made to point to the base of activation record of B.  Upon return from B, the activation record of B is popped off the stack and BP is set back to the activation record of A.
                                </li>
                                <li id="otis">
                                    5.  If function A calls function B, the address of the instruction in A at which execution must resume upon return from B (the <b>return address</b>: the value of the current IP + 2 on the XSM machine; why?) must be saved.  Similarly, the <b>base pointer of the caller</b> (the BP value of A) <b>must be saved in the stack</b> before BP is changed to point to the base of B.  Both the return address and the saved BP value will be stored in pre-defined locations of the activation record of B.
                                </li>
                                <li id="otis">
                                    6.  In addition to the above, one additional <b>space</b> must be reserved in the activation record of B <b>to store the return value</b>.
                                </li>
                            </ul>

                        </p>
                        <p>
                            A thorough reading of this <a href="run_data_structures/run-time-stack.html" target="_blank">page</a> is <b>absolutely essential</b> to proceed any further. Suppose a function has n arguments (arg_1, arg_2,...,arg_n) and m local variables (loc_1, loc_2, ..., loc_m). Its activation record in the stack may look like the following. (The stack is assumed to grow downwards.)
                        </p>
                        <p>
                            <img src="img/stack.png">
                        </p>
                        <p>
                            In this scheme, the following code must be generated by the caller when a call to the above function is encountered:
                             <ul>
                                <li id="otis">1.  Generate code to <b>push registers</b> in use into the stack.  After this, the callee's activation record begins.</li>
                                <li id="otis">2.  Evaluate arg_n and push the value to the stack.  Now evaluate and push arg_n-1 and so on till arg_1.  The <b>arguments are pushed in reverse order</b>.  (You can do it in any order as long as the same convention is followed everywhere).</li>
                                <li id="otis">3. <b>Push one empty space</b> in the stack for the callee to store the <b>return value</b>.</li>
                                <li id="otis">4. Generate <b>Call instruction</b> to the binding (label) of the function. (The call instruction will push IP+2 into the stack and jump to label.)</li>
                            </ul>
                        </p>
                        <p>
                            <div class="col-full">
                                <div class="col-one-half fleft">
                                    Stack before the call instruction :<br>
                                    <img src="img/before call.png"><br><br>
                                </div>
                                <div class="col-one-half fright">
                                    Stack after the call instruction :<br>
                                    <img src="img/after call.png">
                                </div>
                            </div>
                            <br>
                            <br>
                            <br>
                        </p>
                        <p>
                            <br>
                                Figure: Actions in the stack done by the caller before the call <br><br><br>
                            <b>Once the call is made, the next instruction in the caller will be executed only after the callee executes a return statement</b>.  The caller must proceed from here assuming that <b>the callee would have placed the return value</b> in the location in the stack designated for it. The caller must generate code to extract the return value and clean up the stack.
                             <ul>
                                <li id="otis">5. Allocate a new register and <b>store the returned value</b> into the register.</li>
                                <li id="otis">6. <b>Pop out arguments</b> from the stack. (The arguments may be discarded now.)</li>
                                <li id="otis">7. <b>Restore registers</b> saved in the stack in step 1.</li>
                            </ul>

                            The above actions are sufficient to generate code for handling a function invocation.  Note that to generate code for a call, only the callee's declaration information (argument information and call label) needs to be known. All that the caller needs to know is the callee's <b>interface</b>, and the <b>calling convention</b> (the convention regarding the order in which arguments must be pushed, where the return value should be stored, etc.  See also <a href="https://en.wikipedia.org/wiki/Calling_convention" target="_blank">link</a>).
                        </p>
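                        <p>
                            The caller-side steps 1 to 7 above can be sketched as a small C routine that emits the corresponding instructions.  This is only an illustrative sketch and not a prescribed implementation: the names <i>emit</i> and <i>gen_call</i>, the output buffer, and the simplification that every argument is an integer constant loaded through R0 are assumptions made here.  The mnemonics are meant to follow the XSM instruction set used in earlier stages, and the call label is left symbolic for later label translation.
                        </p>

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer standing in for the target file. */
static char out[1024];

/* Append one formatted instruction line to the buffer. */
static void emit(const char *fmt, ...) {
    size_t used = strlen(out);
    va_list ap;
    int n;
    va_start(ap, fmt);
    n = vsnprintf(out + used, sizeof out - used, fmt, ap);
    va_end(ap);
    assert(n >= 0 && (size_t)n < sizeof out - used);
}

/*
 * Emit the caller's code for one call (assumed helper, not from the roadmap).
 *   label : call label of the callee
 *   args  : args[i] is the (constant) value of arg_(i+1)
 *   live  : registers R0..R(live-1) are in use before the call
 * Returns the number of the register that receives the return value.
 */
static int gen_call(const char *label, const int *args, int nargs, int live) {
    int i;
    int ret_reg = live;      /* first register that step 7 does not overwrite */
    int scratch = live + 1;  /* used only to discard the popped arguments */

    for (i = 0; i < live; i++)          /* 1. save registers in use */
        emit("PUSH R%d\n", i);
    for (i = nargs - 1; i >= 0; i--) {  /* 2. push arg_n down to arg_1 */
        emit("MOV R0, %d\n", args[i]);  /*    R0 is free again after step 1 */
        emit("PUSH R0\n");
    }
    emit("PUSH R0\n");                  /* 3. empty slot for the return value */
    emit("CALL %s\n", label);           /* 4. pushes IP+2 and jumps */
    emit("POP R%d\n", ret_reg);         /* 5. fetch the return value */
    for (i = 0; i < nargs; i++)         /* 6. discard the arguments */
        emit("POP R%d\n", scratch);
    for (i = live - 1; i >= 0; i--)     /* 7. restore registers, reverse order */
        emit("POP R%d\n", i);
    return ret_reg;
}
```

                        <p>
                            For a call with two arguments made while R0 and R1 are in use, <i>gen_call("F1", args, 2, 2)</i> emits two register saves, two argument pushes, the return value slot, the CALL, and then the matching pops, leaving the result in R2.  A real code generator would evaluate arbitrary argument expressions instead of constants.
                        </p>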
                        <p>
                            Now, we turn to the actions to be done by the callee. The compiler must generate the following code at the beginning of the callee, ahead of the code for its remaining instructions:
                            <ul>
                                <li id="otis">1. <b>Save the BP of the caller</b> by pushing the BP register into the stack.</li>
                                <li id="otis">2. <b>Set BP</b> to the present value of SP register.</li>
                                <li id="otis">3. Push enough space in the stack for storing the local variables.</li>
                            </ul>

                            Relative to the BP value set in step 2 above, [BP-2] is the address to which the return value must be stored. [BP-3] stores arg_1, [BP-4] stores arg_2 and so on.  [BP+1] is for loc_1, [BP+2] for loc_2 and so on. Thus, after seeing the local variable declarations, the compiler can set the binding values for local variables relative to the base of activation record (BP) value as:
                        </p>
                        <p>
                            <img src="img/var-bind table.png">
                        </p>
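                        <p>
                            The binding rule described above can be written down directly.  The following C sketch is illustrative only; the function names <i>arg_binding</i>, <i>local_binding</i> and <i>address_of</i> are hypothetical helpers introduced here, and the offsets are the ones stated in the text ([BP-2] for the return value, [BP-3] onwards for arguments, [BP+1] onwards for local variables).
                        </p>

```c
#include <assert.h>

/* Offsets relative to BP under the convention described in the text:
 *   [BP]   saved BP of the caller      [BP-1] return address
 *   [BP-2] return value slot           [BP-3] arg_1, [BP-4] arg_2, ...
 *   [BP+1] loc_1, [BP+2] loc_2, ...
 */
#define RETVAL_BINDING (-2)

/* Binding of the i-th argument (i counted from 1). */
static int arg_binding(int i) {
    assert(i >= 1);
    return -(i + 2);
}

/* Binding of the j-th local variable (j counted from 1). */
static int local_binding(int j) {
    assert(j >= 1);
    return j;
}

/* Run-time address of a variable: contents of BP plus its binding. */
static int address_of(int bp, int binding) {
    return bp + binding;
}
```

                        <p>
                            The bindings are fixed at compile time, whereas <i>address_of</i> models the addition that the generated code performs at run time using the current BP.
                        </p>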
                        <p>
                            With this convention, the code for each instruction inside a function can be generated following the rule:  <b>local variables/arguments are to be dereferenced by adding the binding value to the contents of the BP register</b>.  Almost every modern architecture supports function calls by providing an explicit base pointer register.  <br>
                            <br>
                            Finally, the code for a return statement must:<br>
                            <ul>
                                <li id="otis">1.  <b>Pop out the local variables</b> from the stack.</li>
                                <li id="otis">2.  Calculate the return expression and store the value in [BP-2].</li>
                                <li id="otis">3.  <b>Set BP to the old value</b> of BP saved in the stack.</li>
                                <li id="otis">4.  <b>Execute the RET instruction</b> to pass control back to the caller.</li>
                            </ul>
                            <br>
                            In the calling convention which we described above, the arguments were pushed in reverse order, space for the return value was allocated by the caller, BP of the caller was saved to the stack by the callee, and so on.   Space created in the stack by either the callee or the caller must be eventually reclaimed by the same party.
                        </p>
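                        <p>
                            The callee-side code (the three entry steps and the four steps of a return statement) can be sketched in the same style.  Again this is a hedged illustration: <i>emit</i>, <i>gen_prologue</i> and <i>gen_return</i> are hypothetical names, the return expression is simplified to an integer constant, and R0 and R1 are assumed to be free inside the callee at this point.
                        </p>

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical buffer standing in for the target file. */
static char code[1024];

/* Append one formatted instruction line to the buffer. */
static void emit(const char *fmt, ...) {
    size_t used = strlen(code);
    va_list ap;
    int n;
    va_start(ap, fmt);
    n = vsnprintf(code + used, sizeof code - used, fmt, ap);
    va_end(ap);
    assert(n >= 0 && (size_t)n < sizeof code - used);
}

/* Code placed at the start of a function with nlocals local variables. */
static void gen_prologue(int nlocals) {
    int j;
    emit("PUSH BP\n");        /* 1. save the caller's BP */
    emit("MOV BP, SP\n");     /* 2. BP now marks the base of this record */
    for (j = 0; j < nlocals; j++)
        emit("PUSH R0\n");    /* 3. one word per local; the value is irrelevant */
}

/* Code for "return value;" in a function with nlocals local variables. */
static void gen_return(int nlocals, int value) {
    int j;
    for (j = 0; j < nlocals; j++)
        emit("POP R0\n");         /* 1. pop out the local variables */
    emit("MOV R0, %d\n", value);  /* 2. evaluate the return expression ... */
    emit("MOV R1, BP\n");
    emit("SUB R1, 2\n");
    emit("MOV [R1], R0\n");       /*    ... and store it in [BP-2] */
    emit("POP BP\n");             /* 3. restore the caller's BP */
    emit("RET\n");                /* 4. return to the caller */
}
```

                        <p>
                            Note how each party reclaims what it pushed: the callee removes its local variables and the saved BP, leaving the return address on top of the stack for RET, while the return value slot and the arguments remain for the caller to pop.
                        </p>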
                        <p>
                            <b>IMPORTANT NOTE</b>:  In our calling convention, the caller was required to save the registers in use before the call.  What if, instead, we design a calling convention where the callee had to push the registers in use?  The problem here is that the callee does not know (and does not have to know) the caller and hence does not know, at each call, which registers were in use.  Hence, the callee will have to save all machine registers, wasting time and space.  Hence, the convention of the caller storing registers in use is superior.  However, there are situations where this is not possible.  For instance, in hardware interrupt routines, control is transferred to the callee without the caller executing a call.  In such cases, the callee will have to save (all) the machine registers for a successful return.
                        </p>
                        <p>
                            You have enough background now to complete the final task of this stage.<br>

                            <div class="syntax">
                                <b>Task 3</b>:  Complete code generation for function calls.  <br>
                            </div>

                            <b>Exercise 1</b>:  Modify the function semantics to permit pointer type variables of <a href="#stage4_ex2" target="_blank">Stage 4 (Exercise 2)</a> to be passed as arguments to functions.  This will allow a function to pass the address of a local variable to another function as an argument so that the callee can modify the contents.   Modify the syntax and semantics rules appropriately.  This feature must allow you to write functions like:<br>
                            <br>
                            <i>int</i> swap<i>(int</i> *p, <i>int</i> *q<i>)</i> <br>
                            <br>
                            Note that functions need to be permitted to return pointer type variables.  (Returning a pointer to local variable from a function is not advisable – Why?).
                        </p>
                        <p>
                            <b>Exercise 2</b> (Hard work, but insightful):  Suppose you want to extend the language with the facility of <b>tuples</b>.  By a tuple, we mean an object declared as below:<br>

                            <div class="syntax">
                                decl<br>
                                ..<br>
                                ..<br>
                                &emsp;tuple tname(type fname_1, type fname_2, .... ,type fname_n) var_1, var_2 ..  var_k;  <br>
                                ..<br>
                                ..<br>
                                enddecl<br>
                            </div>

                            Note that tuple is introduced as a new keyword.  For example, we could have:<br>
                            <div class="syntax">
                                decl<br>
                                <br>
                                 &emsp;tuple student (str name, int roll_no, str branchname, int year_of_admission) a, b, c, *sptr; <br>
                                 &emsp;tuple faculty (str name, int employee_id, str dept) x, y, z, *fptr;  <br>
                                 <br>
                                enddecl<br>
                            </div>

                            To access a tuple you must introduce the "." operator.  Here is an example:<br>

                            <div class="syntax">
                                read(a.name);<br>
                                read(x.name); <br>
                                if (a.name == x.name) then <br>
                                &emsp;write("They have same names"); <br>
                                endif; <br>
                            </div>

                            You must also permit assignment of a tuple type variable to another, provided the variables are of the same tuple type.  <br>
                            <br>
                            Design the syntax and semantics rules, make necessary modifications to the lexer, parser, symbol table and AST structures to incorporate the addition of tuples and change the code generation module accordingly.    For now, assume that tuples cannot be passed as arguments to functions or be returned by functions.   However, you must permit local tuple declarations.  Note that you have considerable freedom in deciding on the grammar rules and data structures, and even the features permitted. <br>
                            <br>
                            <b>Exercise 3</b>:  (Hard work, optional, but insightful.  Can be done only after Exercise 1 and Exercise 2 are completed).<br>

                            Allow tuples and pointers to tuples to be passed as arguments to functions.  (Allowing the whole tuple to be passed creates more work in parameter passing, though not difficult in principle).   Functions may be permitted to return tuples as return values.   Permit a function that takes or returns a tuple (or a pointer to a tuple) to be declared only after the declaration of the tuple concerned (this is to avoid the <a href="https://en.wikipedia.org/wiki/Forward_declaration" target="_blank">forward reference problem</a>).  Design syntax, semantics and code generation strategies appropriately.
                        </p>

                        <h4> <b>Test Programs </b></h4>
                        <p>
                          Check your implementation with the following test cases : <br>
                          <a href="testprograms.html#test3" target="_blank">Test Program 1 : Bubblesort (recursive)</a> <br>
                          <a href="testprograms.html#test5" target="_blank">Test Program 2 : Factorial (recursive)</a> <br>
                          <a href="testprograms.html#test6" target="_blank">Test Program 3 : Quicksort (recursive)</a> <br>
                          <a href="testprograms.html#test7" target="_blank">Test Program 4 : Constant Program (recursive)</a> <br>
                          <a href="testprograms.html#test10" target="_blank">Test Program 5 : Fibonacci (recursive)</a> <br>
                          <a href="testprograms.html#test2" target="_blank">Test Program 6 : Extended Euclid (with a Function)</a> <br>
                          <a href="testprograms.html#test1" target="_blank">Test Program 7 : BubbleSort (iterative)</a> <br>
                          <a href="testprograms.html#test12" target="_blank">Test Program 8 : Extended Euclid (iterative)</a> <br>
                        </p>

                            <div class="up column3 mright">
                                <a href="#navtop">  &#x2191; </a>
                            </div>
                    </article>
                    <article class="grid col-full" id="nav-stage6">
                        <h2>Stage 6: User defined types and Dynamic Memory Allocation</h2>
                           <b>Time Estimate :</b> 2 weeks, 5-10 hours/week
                        <p>
                            <b>Prerequisite Reading</b>:
                            <ul>
                                <li id="otis">1.  Read the <a href="expl.html" target="_blank">ExpL specification</a>.</li>
                                <li id="otis">2.  Read about <a href="run_data_structures/heap.html" target="_blank">Dynamic memory allocation</a>.</li>
                            </ul>
                        </p>

                            <!--done -->
                        <p>
                            <b>Learning Objectives</b>:
                            <ul>
                                 <li id="otis">You will extend the language of Stage 5 by adding support for <b>user-defined types</b> and <b>dynamic memory allocation</b>.  Issues of <b>Heap management</b> will be encountered en route.</li>
                            </ul>
                            <hr>
                        </p>

                        <!--done-->

                        <p>
                            This is the second major stage of the ExpL compiler project and will be implemented in two parts.  In the first part, we will see how user defined types can be added to the language syntax and how semantic analysis can be performed.  The ExpL specification requires that storage for variables of user-defined types be allocated dynamically.   We will discuss how dynamic memory allocation can be achieved in the second part.
                        </p>

                        <!--done-->
                        <p>
                            See the <a href="expl.html" target="_blank">ExpL language specification</a> for an informal description of the language.   It is suggested that you design your own grammar using the outline provided <a href="grammar-outline.html" target="_blank">here</a> as a reference. The following <a href="testprograms.html#test8" target="_blank">link</a> provides examples of ExpL programs containing user defined types.
                        </p>
                        <p>
                            We will now take up the front end - semantic analysis and AST representation - before proceeding to code generation and dynamic memory allocation.    <br>
                             <br>
                            <b>Part I</b>:  <b>Front End</b> <br>
                            <br>
                            Every user defined type requires a <b>type definition</b>.  Type definitions are placed at the beginning of a program, ahead of global declarations.  A user-defined type in ExpL essentially defines an <b>aggregate type</b>.    The <b>member fields</b> of a user defined type may have arbitrary types (subject to certain constraints – to be discussed soon). <br>
                            <br>

                            Consider the type definition:  <br>
                            <div class="syntax">
                                type<br>
                                &emsp;bst{<br>
                                &emsp;&emsp;int a;<br>
                                &emsp;&emsp;bst left;<br>
                                &emsp;&emsp;bst right;<br>
                                }<br>
                                endtype<br>
                                decl<br>
                                &emsp;int in,opt;<br>
                                &emsp;bst insert(bst h, int key);<br>
                                &emsp;int inOrder(bst h); <br>
                                &emsp;int preOrder(bst h);<br>
                                &emsp;int postOrder(bst h); <br>
                                enddecl<br>
                            </div>
                            This type definition specifies the node structure for a binary search tree.  The member field <i>a</i> has type integer, whereas <i>left</i> and <i>right</i> have type bst.   Note the <b>recursive</b> nature of the type definition.   The declaration section shows functions which take a <i>bst</i> as input or return a <i>bst</i>.    Here is another type definition: <br>
                            <div class="syntax" id="nav-marklist">
                                type<br>
                                &emsp;linkedList {<br>
                                &emsp;&emsp;int data; <br>
                                &emsp;&emsp;linkedList next;  <br>
                                  }<br>
                                  <br>
                                &emsp;markList{<br>
                                &emsp;&emsp;str name;  <br>
                                &emsp;&emsp;linkedList marks; <br>
                                &emsp;&emsp;markList next; <br>
                                  }<br>
                                endtype<br>
                                decl<br>
                                &emsp; markList mList,temp; <br>
                                enddecl<br>
                            </div>

                            Note here that in the type <i>markList</i>, the member field <i>marks</i> is of the type <i>linkedList</i>.  ExpL stipulates that the member fields of a user defined type, if not of type integer or string, can only be of the <b>same</b> type or of a <b>previously defined</b> type.
                        </p>
                        <p>
                            The compiler must keep track of the type definitions in some data structure.  For this purpose, we will maintain a <b>type table</b> storing the type definition information.  <b>Each user defined type will have a type table entry</b>.  In addition to user-defined types, the type table will also store "default" entries for <i>int</i>, <i>str</i>, <i>bool</i> and <i>void</i> type.   (Since logical expressions evaluate to a boolean value, they may be assigned boolean type.  The ExpL constant NULL can be assigned to a variable of any user-defined type.  Hence, having a NULL type is useful from a purely implementation perspective.  Note here that boolean is an <b>implicit type</b> in the language.  The language does not allow the programmer to declare a variable of type boolean).  <br>
                            <br>
                            <b>The type table entry for a user-defined type must provide information about the names and types of its member fields</b>.  For each member field, a pointer to its type table entry must be maintained.  This <a href="data_structures/type-table.html" target="_blank">link</a> gives you a simple type table implementation scheme.  (You have to fill in missing details).  <br>
                            <br>
                            The symbol table must also be modified to handle information about user-defined types.   The <b>type field of the symbol table entry of a variable/function</b> shall <b>refer to the type table entry of the corresponding type</b> (recall that the type of a function is its return type).  The following <a href="data_structures/global-symbol-table.html" target="_blank">link</a> illustrates the organization of the global symbol table.  The type entry of each formal parameter of a function must also refer to the corresponding type table entry.   <a href="data_structures/local-symbol-table.html" target="_blank">Local symbol tables</a> of functions will require similar modification.  <br>
                            <br>
                            The next question is how to assign memory for variables of user-defined types.  We will defer this issue for now and take up the assignment of bindings to such variables in Part II.<br>
                            <br>
                            We must now discuss how to use the symbol table and type table for completing semantic analysis of the input program.   Let us look at an example.  <br>
                            <br>
                            Consider the declaration of the type <i>markList</i> in the example <a href="#nav-marklist">above</a>.  The language now permits statements like:<br>

                            <div class="syntax">
                            temp = mList.next; <br>
                            if (mList.marks.data > mList.next.marks.data) then <br>
                            &emsp;write("first student performed better in the first subject"); <br>
                            endif;<br>
                            </div>

                            <b>Note that the operands in expressions can now be member fields of user-defined-type variables</b>.  Similarly, the left side of an assignment statement or the variable for a read statement can now be a member field.  The grammar rules for various statements in the language are outlined <a href="grammar-outline.html" target="_blank">here</a>.   You must try to design your own grammar, keeping the grammar above as a guideline.  Many details are (deliberately) left out in the outline given to you.<br>
                            <br>
                            In the following, we use the term <b>field</b> generically to refer to <i>any</i> member field of <i>any</i> variable (of any user-defined type).  <br>
                            <br>
                            What must be the type of a field?  The type of <i>mList.next</i> in the above example must be the type of the member field <i>next</i> of the user defined type <i>markList</i>.  Once this information is extracted from the symbol table and the type table, the type of any statement, expression or variable can be determined correctly.   Thus, <b>an assignment statement is valid provided the type of the right side expression matches that of the left side variable</b>.  <i>The only exception to this rule is that the constant NULL can be assigned to any variable of any user-defined type.</i> <br>
                            <br>
                            Stated formally,
                            <div class="syntax">
                            Field ::= Field '.' ID  {  $$.type = $3.type; }   <br>
                            &emsp;&emsp;&emsp;&emsp;&emsp;&emsp;&emsp;             | ID  '.' ID { $$.type = $3.type; } <br>
                            </div>

                            While constructing the AST,  the type entry for any field may be recursively computed using the formal rule noted above.   This information can be used for effective semantic analysis.  Arguments to functions must also be type checked against formal parameters of their declaration.
                        </p>
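                        <p>
                            To make the field typing rule concrete, here is a minimal C sketch of a type table with member field lists and a routine that computes the type of a field by following the member names one at a time.  It is not the implementation from the linked type table page: the structure layouts and the functions <i>TInstall</i>, <i>TLookup</i>, <i>FInstall</i>, <i>FLookup</i> and <i>field_type</i> are assumptions chosen for illustration, and error handling is reduced to returning NULL.
                        </p>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Typetable;

/* One member field of a user-defined type. */
struct Fieldlist {
    const char *name;
    int index;               /* position of the field inside the type */
    struct Typetable *type;  /* pointer to the field's type table entry */
    struct Fieldlist *next;
};

/* One type table entry; fields stays NULL for int, str and other defaults. */
struct Typetable {
    const char *name;
    struct Fieldlist *fields;
    struct Typetable *next;
};

static struct Typetable *type_table = NULL;

static struct Typetable *TInstall(const char *name) {
    struct Typetable *t = malloc(sizeof *t);
    assert(t != NULL);
    t->name = name;
    t->fields = NULL;
    t->next = type_table;
    type_table = t;
    return t;
}

static struct Typetable *TLookup(const char *name) {
    struct Typetable *t;
    for (t = type_table; t != NULL; t = t->next)
        if (strcmp(t->name, name) == 0)
            return t;
    return NULL;
}

/* Append a field to a type, keeping declaration order. */
static void FInstall(struct Typetable *owner, const char *fname,
                     struct Typetable *ftype) {
    struct Fieldlist *f = malloc(sizeof *f);
    struct Fieldlist **tail = &owner->fields;
    int index = 0;
    assert(f != NULL && ftype != NULL);
    while (*tail != NULL) {
        tail = &(*tail)->next;
        index++;
    }
    f->name = fname;
    f->index = index;
    f->type = ftype;
    f->next = NULL;
    *tail = f;
}

static struct Fieldlist *FLookup(const struct Typetable *type,
                                 const char *fname) {
    struct Fieldlist *f;
    for (f = type->fields; f != NULL; f = f->next)
        if (strcmp(f->name, fname) == 0)
            return f;
    return NULL;
}

/* Type of var.path[0].path[1]...; NULL if a name is not a member field. */
static struct Typetable *field_type(struct Typetable *var_type,
                                    const char *const *path, int n) {
    struct Typetable *t = var_type;
    int i;
    for (i = 0; i < n && t != NULL; i++) {
        struct Fieldlist *f = FLookup(t, path[i]);
        t = (f != NULL) ? f->type : NULL;
    }
    return t;
}
```

                        <p>
                            Because every type has exactly one type table entry, comparing two types for equality reduces to comparing the two entry pointers, which is a natural fit for type checking by name.  The <i>index</i> stored with each field is the kind of information the back end could later use to locate the field inside the allocated memory.
                        </p>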
                        <p>
                            A plausible tree node structure and associated functions for AST construction are described <a href="data_structures/abstract-syntax-tree.html" target="_blank"> here</a>.
                        </p>
                        <p>
                            <b>Note</b>:  The ExpL specification demands a rigid type analysis by <a href="https://en.wikipedia.org/wiki/Nominal_type_system" target="_blank">name equivalence</a>. This differs from the more liberal <a href="https://en.wikipedia.org/wiki/Structural_type_system" target="_blank">structural equivalence</a> in the C programming language.
                        </p>
                        <p>
                            With this background, the front end of the ExpL compiler can be completed.<br>
                            <br>
                                <b>Task 1</b>:  Complete the syntax and semantic analysis and construct AST for ExpL language. (<a href="expl.html" target="_blank">Specification</a>, <a href="grammar-outline.html" target="_blank">Grammar Outline</a>)
                            <br>
                            <br>
                        </p>
                        <p>
                            <b>Part II</b>:  <b>Back End</b>
                        </p>
                        <p>
                            <br>

                            We will first discuss the underlying concepts before getting into the back-end implementation details.<br>
                            <br>
                            The ExpL specification stipulates that the storage for a variable of a user-defined-type is allocated through the <a href="expl.html#nav-user-defined-types" target="_blank"><i>Alloc()</i></a> function. Each user-defined-type variable in ExpL must store the <b>reference</b> to its actual memory store.  <b>The actual memory may be allocated</b> by <i>Alloc()</i> when a call to the function is encountered <b>at run time</b>.
                        </p>
                        <p>
                            From an implementation point of view, <b>a variable of a user-defined-type must be designed to hold the address of its actual memory store</b>. Whenever the <i>Alloc()</i> function is invoked for the variable,  a memory region sufficient for holding all the member fields of the variable must be allocated "somewhere" in memory and Alloc() must return the starting address of that memory region.  <b>The compiler must generate code to invoke <i>Alloc()</i> and store the return value into the variable</b>.    Thus, <b>the variable will essentially store a pointer</b> to the memory region allocated by <i>Alloc()</i>.  This is easy to do, provided we have the <i>Alloc()</i> function at our disposal.
                        </p>
                        <p>
                            <b>A variable of a user-defined-type must be allocated at compile time one memory word to store an address (basically an integer) returned by <i>Alloc()</i> at run time</b>.  This word could itself be allocated statically (for a global variable) or in the run-time stack (for a local variable or an argument).    The ExpL specification does not permit arrays of user-defined type.
                        </p>
                        <p>
                            Once the memory is allocated for a user-defined type variable, member field references can be translated easily.  The details are left to you.
                        </p>
                        <p>
                            The next problem is to design and implement the <i>Alloc()</i> function.   We will also take up the issue of designing the <i>Free()</i> function (to de-allocate some previously allocated memory).  This problem is known as the <b>dynamic memory allocation problem</b>.  (The <i>malloc()</i> and <i>free()</i> functions of the C library are <a href="https://en.wikipedia.org/wiki/C_dynamic_memory_allocation" target="_blank">dynamic memory allocation</a> routines of the C programming language.)
                        </p>
                        <p>
                            The strategy of <i>Alloc()</i> is to maintain a large run-time <b>memory pool</b> called the <b>heap memory</b>.
                            To keep matters simple, we will assume that <i>Alloc()</i> divides the whole heap into fixed size blocks, each of size eight words.  In this case, the strategy is very simple:<br>

                            <div class="syntax">
                            1. Before the start of the program, reserve a large area of the address space for heap. The ExpOS <a href="abi.html#nav-virtual-address-space-model" target="_blank">memory model</a> suggests that the address region 1024-2047 may be used for this purpose.<br>
                            2. Organize the heap into a linear linked list of blocks of size 8. We will design a heap initializer function <i>Initialize()</i> specifically for this.<br>
                            3. When an <i>Alloc()</i> request comes, return the start address of the first free block in the list (and remove the block from the free list).<br>
                            4. When a <i>Free(address)</i> request comes (assuming <i>address</i> refers to the start address of a block already allocated), return the block pointed to by <i>address</i> back to the memory pool.<br>
                            </div>

                            The <i>pragmatic</i> restriction imposed by such a simple implementation is that a user-defined type cannot have more than eight member fields.  Of course, we could increase the block size to, say, sixteen words, in which case a type could have up to 16 member fields.  For now, we will be content with the simple fixed block size scheme.
                        </p>
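                        <p>
                            The four steps above can be simulated in C before they are written in assembly. The following is only a minimal sketch: the array <i>memory</i>, the variable <i>free_head</i> and the marker <i>NIL</i> are assumptions made for illustration (the actual library may keep the head of the free list inside the heap itself), and the first word of every free block is assumed to hold the address of the next free block.
                        </p>

```c
#include <assert.h>

/* Simulation of the fixed-block heap; all names here are illustrative. */
#define HEAP_START 1024   /* first heap address (from the text)        */
#define HEAP_END   2048   /* one past the last heap address, 2047      */
#define BLOCK_SIZE 8      /* words per block (from the text)           */
#define NIL        (-1)   /* assumed marker for "no block"             */

static int memory[HEAP_END];  /* memory[a] models the word at address a */
static int free_head = NIL;   /* assumed: head of the free list          */

/* Step 2: chain every block into a linear free list. The first word of
   a free block stores the start address of the next free block. */
void Initialize(void)
{
    for (int addr = HEAP_START; addr < HEAP_END; addr += BLOCK_SIZE) {
        int next = addr + BLOCK_SIZE;
        memory[addr] = (next < HEAP_END) ? next : NIL;
    }
    free_head = HEAP_START;
}

/* Step 3: detach the first free block and return its start address,
   or NIL when no free block is left. */
int Alloc(void)
{
    if (free_head == NIL)
        return NIL;
    int block = free_head;
    free_head = memory[block];
    return block;
}

/* Step 4: put a previously allocated block back at the front of the
   free list. The address must be the start address of a block. */
void Free(int addr)
{
    assert(addr >= HEAP_START && addr < HEAP_END);
    assert((addr - HEAP_START) % BLOCK_SIZE == 0);
    memory[addr] = free_head;
    free_head = addr;
}
```

                        <p>
                            In this sketch the region 1024-2047 holds 128 blocks, so the 129th consecutive call to <i>Alloc()</i> without an intervening <i>Free()</i> returns <i>NIL</i>. A freed block is reused by the very next <i>Alloc()</i> call because it is placed at the front of the list.
                        </p>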
                        <p>
                            With this, we can design our compiler to generate code for the <i>Alloc()</i>, <i>Free()</i> and <i>Initialize()</i> functions along with the target code.   The techniques of Stage 5 suffice to build an executable file.
                        </p>
                        <p>
                            However, there is a better way of doing things when OS support for a shared library is available.
                            Note that we have been using the shared library for console input, output etc.
                            <b>Since the routines <i>Alloc()</i>, <i>Free()</i> and <i>Initialize()</i> will be used by <b>every</b> ExpL program
                            (that uses user-defined types), it would be profitable to write these routines once
                            and add them as part of the library.</b>  Since the OS loader loads the
                            shared library to the memory region 0-1023 of the address space of each program,
                            the code for <i>Alloc()</i>, <i>Free()</i> and <i>Initialize()</i> too will be available to the program.
                            The library interface must be defined so that the calling conventions for invoking each of the above
                            functions are clearly specified.
                          </p>
                          <p>
                            With this strategy, when an ExpL program
                            is compiled, the compiler will not generate code to implement <i>Alloc()</i>, <i>Free()</i> or <i>Initialize()</i>.
                            <b>Instead, the compiler will generate a CALL to the library (using the
                            library interface) with appropriate arguments so that the corresponding library routine
                            is invoked.  You need to modify the library code so that the assembly code for <i>Alloc()</i>, <i>Free()</i> and <i>Initialize()</i> is added to the library.</b>
                            The advantage of this method is that
                               the code implementing <i>Alloc()</i>, <i>Free()</i> and <i>Initialize()</i> need not be part of the code of every
                               executable program, saving both load time and system memory.  In technical jargon,
                               such a library is called a run-time loading library.
                        </p>


                        <p>
                            The <a href="abi.html" target="_blank">ABI</a> stipulates that calls to the dynamic memory functions shall be directed
                             through the Library.   Thus, you must add Alloc(), Free() and Initialize() as library functions.
                              As noted previously, the region of memory between address 1024 and 2047 must be used for heap memory allocation.
                        </p>
                        <p>
                            It is <b>absolutely necessary</b> to read and understand <a href="run_data_structures/heap.html" target="_blank">Dynamic memory allocation</a> (<b>except</b> Buddy System Allocation) to proceed further.  An overall picture of the ExpOS library design is outlined in the <a href="library-implementation.html" target="_blank">library implementation</a> documentation. We are now ready to complete the back end.
                        </p>
                        <p>
                            <b>Task 2</b>:  Complete the back end by adding the <i>Alloc()</i>, <i>Free()</i> and <i>Initialize()</i> functions to the ExpOS library, and complete the implementation of user-defined types in ExpL.   Assume a fixed block size of 8 words for memory allocation.  The compiler must flag an error "too many member fields" if a user-defined type definition has more than 8 member fields.  Note that the fixed block allocator is simple enough to be written directly in assembly language. Read the note below before proceeding with the implementation.
                        </p>
                        <p>
                            <b>Important Note</b>: Your library functions will need to modify registers. Hence, before a library call, ensure that the registers in current use are saved on the stack (as was done with function calls in the previous stage).
                        </p>
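                        <p>
                            The register saving described in the note can be sketched as a pair of code-generation helpers. This is only an illustration under stated assumptions: the helper names <i>save_registers()</i> and <i>restore_registers()</i> are hypothetical, and the sketch assumes that your register allocator hands out registers in order, so that the registers in use are exactly R0 up to some highest index.
                        </p>

```c
#include <assert.h>
#include <stdio.h>

/* Emit one PUSH per register in use (R0 .. R<last_used>) before the
   call sequence. Pass last_used = -1 when no register is in use.
   Returns the number of instructions written. */
int save_registers(FILE *out, int last_used)
{
    assert(last_used >= -1);
    int n = 0;
    for (int r = 0; r <= last_used; r++, n++)
        fprintf(out, "PUSH R%d\n", r);
    return n;
}

/* Emit the matching POPs after the call has returned and its result
   has been copied out. The order is reversed because the stack is
   last-in, first-out. */
int restore_registers(FILE *out, int last_used)
{
    assert(last_used >= -1);
    int n = 0;
    for (int r = last_used; r >= 0; r--, n++)
        fprintf(out, "POP R%d\n", r);
    return n;
}
```

                        <p>
                            With registers R0 to R2 in use, <i>save_registers(out, 2)</i> writes PUSH R0, PUSH R1, PUSH R2, and <i>restore_registers(out, 2)</i> writes POP R2, POP R1, POP R0. The code that pushes the arguments and issues the CALL to the library goes between the two, following the library interface.
                        </p>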
                        <p>
                            <b>Exercise 1</b>:  Extend ExpL to permit arrays of user-defined type.
                        </p>
                        <p>
                            If you want to do <b>variable sized block allocation</b>, more complex allocation schemes like the <a href="run_data_structures/heap.html#nav-buddy-allocation" target="_blank"><b>Buddy System Algorithm</b></a> will be required.  One would also need to understand the issue of <a href="https://en.wikipedia.org/wiki/Fragmentation_(computing)" target="_blank"><b>memory fragmentation</b></a> that can arise when variable sized allocation is done.
                        </p>
                        <p>
                            <b>Exercise 2</b>:   (Optional)  Modify the <i>Alloc()</i> and <i>Free()</i> library functions to implement the Buddy memory allocator described <a href="run_data_structures/heap.html#nav-buddy-allocation" target="_blank">here</a>.  You will have to modify <i>Initialize()</i> appropriately.   The buddy system allocator is too complex to write in assembly language.  Hence, write these functions in ExpL itself and modify your label translation scheme to generate target addresses correctly.<br>
                        </p>
                      <!--  <p>
                          Check your implementation with the following test cases : <br>
                          1) <a href="testprograms/test8.html" target="_blank">Test Case 1</a> <br>
                          2) <a href="testprograms/test9.html" target="_blank">Test Case 2</a> <br>
                          3) <a href="testprograms/test10.html" target="_blank">Test Case 3</a> <br>
                          4) <a href="testprograms/test11.html" target="_blank">Test Case 4</a> <br>
                        </p> -->

                        <h4> <b>Test Programs </b></h4>
                        <p>
                          Check your implementation with the following test cases : <br>
                          <a href="testprograms.html#test8" target="_blank">Test Program 1: Linked List</a> <br>
                          <a href="testprograms.html#test9" target="_blank">Test Program 2: Binary Search Tree</a> <br>
                          <a href="testprograms.html#test11" target="_blank">Test Program 3: Extended Euclid Algorithm using linked list</a> <br>
                          <a href="testprograms.html#test4" target="_blank">Test Program 4: Extended Euclid Algorithm using user-defined types</a> <br>
                        </p>
                        <div class="up column3">
                            <a href="#navtop">&#x2191; </a>
                        </div>
                    </article>

                    <!--

                    <article class="grid col-full" id="nav-stage7">
                      <h2>Stage 7: Register Allocation</h2>
                      <p>
                          <b>Learning Objectives</b>:
                          <ul>
                               <li id="otis">You will add a very primitive <a href="https://en.wikipedia.org/wiki/Register_allocation" target="_blank">
                               register allocation</a> scheme to compiler to make the target code generated by Stage 6 more efficient.</li>
                          </ul>
                          <hr>
                      </p>
                      <p>
                        Instructions whose operands are registers run much faster than instructions involving operand fetch from memory (why?). Hence, in order to make the target code run faster, variables could be stored in registers, whenever free registers are available.
                    </p>
                    <p>
                      If you observe the target code generated in Stage 4, you will see that only 3-4 registers are used for storage of intermediate results of expression evaluations, almost all the time. This leaves many registers unused, which could be utilized for storing variables.</p>

                    <p>
                      We will use the following strategy.  When a variable is to be read/assigned by a statement inside a function, the variable will be bound to a temporary register.  This binding will be <b>retained even after the use</b>. This allows later program statements that read/write the same variable to access/update the value in the register, avoiding a memory access.
                    </p>

                    <p>
                      Since temporary registers are few in number, we need to implement a register replacement policy that “caches out” some variables back to memory and re-binds registers to other variables when one runs out of registers.  The details are outlined below.
                    </p>

                    <p>
                      Add the following logic to the compiler of the previous stage.
                        <ol>
                             <li>
                             For normal computations done in the previous stages involving <i>getReg()</i> and <i>freeReg()</i> functions,<b> reduce the free register pool to R0-R5
                             </b>.(Consequently, if an expression is too long requiring more than 6 registers for storing intermediate values, the compiler must flag an error indicating that it encountered an expression that was too long to handle).
                             </li>
                             <li>
                             <i>Temporary pool</i>:  Reserve registers  <b>R6-R19 as a separate register pool for temporary storage</b> of variables(We will refer to these registers as “temporaries”).  We will use this register pool as a “cache” where values of program variables (except for array type variables) will be stored for fast access.  We will design new functions <i>getTemp()</i> and
                              <i>freeTemp()</i> to allocate and de-allocate temporaries from this pool.  We will call this register pool the <b>temporary pool</b>.
                            </li>
                             <li>
                             <i>Modifications to Symbol Table structures:</i>  When the compiler encounters the declaration of a global/local variable, store is allocated in the normal way.   Temporaries are allocated only when a variable is accessed and not when the variable is declared.   Additionally, <b>in the global symbol table as well as local symbol tables of each function, the compiler has to maintain two fields</b> <i>temp</i> <b>and</b> <i>reference_count</i> <b>corresponding to each variable</b> [LINK].
                               <p>

                               a) The field <i>temp</i> will store the index of the temporary register allocated for the variable by the compiler.   If no temporary is assigned to the variable, the temp field shall be set to 0.
                               <br>
                               b) The field <i>reference_count</i> of a variable will indicate the number of future (static) references to that variable within a function.  This is a heuristic used to identify variables whose allotted temporary register is to be deallocated, in the event of a conflict for temporary space.  The use of this field will be explained later.
                               </p>
                             </li>
                             <li>
                               <i>Temporary Allocation and Release</i>:   Register allocation is performed by two functions – <i>freeTemp()</i> and <i>getTemp</i>().
                               <p>
                                a) The function <i>freeTemp(argument:  variable_name)</i> releases the temporary register associated with the variable varname  (if any) and adds the temporary register to the free pool. <b> The function must generate code to store the contents of the temporary to the memory address of the variable</b>.    The function must also record the release of the temporary by setting the <i>temp</i> field to 0 in the symbol table entry of the variable.    <b>The function returns the index of the temporary register that was released</b>.
                                <br>
                                b) The <i>getTemp(varname)</i> function checks whether a temporary has been already allocated for the variable passed as argument (inspecting the <i>temp</i> field in the symbol table entry for the variable).   If so, the function simply returns the index of the register.  Otherwise, the following steps must be done to allocate a temporary:
                                <br>
                                Check whether an un-allocated temporary is available in the temporary pool.<br>
                                 If available,
                                 <br>
                                 {
                                  <br>
                                   &nbsp;&nbsp;&nbsp;&nbsp; Allocate an un-allocated temporary to the variable by setting the temp field in the   symbol table. Generate code to transfer the contents of &nbsp;&nbsp;&nbsp;&nbsp;the memory location storing the variable to the temporary and return the index of the temporary register.
                                   <br>
                                 }
                                 <br>
                                 else
                                 <br>
                                 {
                                  <br>
                                  &nbsp;&nbsp;&nbsp;&nbsp;1)Select a variable to be “<i>cached out</i>.”  A simple replacement policy is to <b>replace any variable with the lowest value of</b> <i>reference_count</i> and &nbsp;&nbsp;&nbsp;&nbsp;release its temporary by invoking the <i>freeTemp()</i> function (We will soon discuss how to assign reference count to variables).<br>

                                  &nbsp;&nbsp;&nbsp;&nbsp;2)Allocate the released temporary register to the variable by setting the temp field in the symbol table.  <b> Generate code to load the contents &nbsp;&nbsp;&nbsp;&nbsp;of the variable from memory to the temporary register</b> (See also Exercise 1 below) and return the register index.
                                  <br>
                                 }

                                 </p>
                             </li>
                             <li>
                              <i>  Changes to the code generation Phase</i>:  Since <i>getTemp()</i> returns a register holding the variable, it is not longer required to transfer contents of the register to another register for accessing values.  Similar changes are required when a variable is assigned a value.
                             </li>
                             <li>
                              <i>Simplifying Assumptions</i>:   To simplify the implementation, we assume the following:
                              <br>
                                 a)<b>Temporary allocation will be made afresh inside each function</b>. <i>Before a function call is made, all temporaries allocated by the caller must be pushed to the stack and popped back upon return</i>. Before a function returns, all temporary allocations made by the function are released.  (If a global variable is assigned a temporary register by the function, then code must be generated to store back the temporary to the memory location of the corresponding variable before return).
                                <br>

                                 b)<b>Temporary allocation will not be done for arrays</b>.
                                <br>
                             </li>
                             <li>
                               <i>Assigning reference_count</i>:  Reference count for each variable will be computed incrementally during abstract synatax tree construction phase for the function and these values will be decremented during each access to the variable during the code generation phase.  The steps are detailed below.
                               <p>
                               a) While processing declarations in a function, reference_count for all variables visible to the function (global and local) is initalized to zero.
                                </p>
                               <p>
                                b)  For each function in the program, during the Abstract Syntax Tree construction phase for a function, each time when a reference to a variable is encountered in the function (that is, for each symbol table search to the variable), the reference count for the variable is incremented.    Hence, at the end of the AST construction phase, the reference count will indicate the total number of times the variable was accessed in the program.
                                 </p>
                               <p>
                                c) During code generation, each time a variable is accessed, <i>reference_count</i> is decremented.  Hence, <i>reference_count</i> will indicate the number of times the variable will be accessed subsequently during the function's compilation.   Hence, if <i>reference_count</i> reaches zero, there are no static (compile time) references for the variable to be encountered further in the function.
                                 </p>
                             </li>

                        </ol>
                        <hr>
                    </p>

                    <p>
                        <b>Task 1</b>:  Implement the register allocation scheme as outline above.
                    </p>
                    <p>
                        <b> Important Note: </b> The fact that the reference to a variable at a program point is the last encounted during compilation of a function <b>by no means ensure</b> that the variable will not be accessed during run time after execution of the particular statement within the function (figure out why?)  Hence, <i>reference_count</i> is only a very naive heuristic for predicting  the future use of a variable (see exercises below).   You are advised to read standard text books on compiler design to gain further insight into various register allocation techniques in the literature.  Going further into these details is beyond the scope of this project.  However, we just note that the problem of allocating registers optimally is computationally hard [REF] and hence any solution to the problem will necessarily rely on heuristics.
                    </p>

                    <p>
                        <i>Exercise 1</i>:  The present <i>getTemp()</i> function is inefficient during assignment of a new value to a variable because, it is wasteful to transfer its existing value from the memory to it's temporary register.  Modify the temporary allocator interface as:  <i> getTemp(variable_name, load_flag)</i>, where <i>load_flag=1</i> only if the variable's existing value must be loaded to the temporary, and  <i>load_flag=0</i> otherwise.  Modify the code generation phase to eliminate the inefficiency mentioned above.
                     </p>

                </article>

              -->

                <article class="grid col-full" id="nav-stage7">
                         <h2>Stage 7: Adding Objects – Data encapsulation</h2>
                            <b>Time Estimate :</b> 2 weeks, 5-10 hours/week
                        <p>
                            <b>Prerequisite Reading</b>:
                            <ul>
                                <li id="otis">1. Read the <a href="oexpl-specification.html" target="_blank">OExpL specification</a> (except for the section on inheritance, which may be read before the next stage).</li>

                            </ul>
                        </p>

                        <p>
                            <b>Learning Objectives</b>:
                            <ul>
                                 <li id="otis">You will extend the language of Stage 6
                                   by adding support for classes (except for inheritance) to make
                                   ExpL an <a href= "https://en.wikipedia.org/wiki/Object-based_language" target="_blank">  <i>object-based language</i></a>.
                                   <!--You will also extend ExpL with support for splitting a source program into multiple class files and one main program.--></li>
                            </ul>
                            <hr>
                        </p>

                         <p>
                            In this stage, we will extend ExpL to provide support for <a href="https://en.wikipedia.org/wiki/Data_encapsulation" target="_blank">data encapsulation</a> by supporting classes. <a href="https://en.wikipedia.org/wiki/Inheritance_(object-oriented_programming)" target="_blank">Inheritance</a> will be added in the next stage.
                        </p>

                         <p>
                            From an implementation viewpoint, a class is similar to a user-defined type that, apart from <i>member fields</i>, also contains <i>member functions</i> or <b>methods</b>. The access rules of class members and methods are more stringent. (Read the <a href="oexpl-specification.html" target="_blank">OExpL specification</a> carefully for access semantics.) Here, we will focus on how support for classes can be added to the compiler.
                        </p>

                             <br>
                             <b>Part I</b>:  <b>Syntax and Semantic Analysis</b> <br>
                             <br>

                        <p>
                          <!--  The structure of class definitions for this stage is outlined below:-->

                            The grammar outline for class definitions is given <a href="oexpl-grammar-outline.html" target="_blank">here</a>.
                            The following grammar rule allowing class extension (in Class Definitions)
                            will not be supported in this stage.

                            <div class="syntax">
                              <!--  ClassDefs ::= CLASS classDefList ENDCLASS <a href="oexpl-grammar-outline.html" target="_blank">LINK</a><br>
                                <br>
                                classDefList ::= oneClass | classDefList oneClass<br>
                                <br>
                                oneClass ::= .... /* to be added */-->

                                Cname         : ID Extends ID	{Cptr = Cinstall($1->Name,$3->Name);}


                                <br>

                            </div>

                            Support for class extension will be introduced in the next stage.
                        </p>

                        <p>
                            The compiler must maintain a <b>class table</b> to store the information pertaining to each class
                            defined in an OExpL program. Class definitions appear after type definitions.
                            Each class definition specifies the member fields and methods of the class.
                            <i>Member field declarations must precede method declarations</i>.
                        </p>

                        <p>
                            For simplifying the implementation,<b> we will assume here that a class may contain at
                            most 8 member fields and at most 8 methods</b>. Hence, fixed size allocation of dynamic memory will be sufficient,
                            as in the 6th stage.
                        </p>
                        <p>
                            Each <b>class table entry stores information pertaining to a class</b>. The names and types of each member
                            field along with its <i>position index </i> must be stored in the class table.
                            (In any class, the method defined first will be assigned position index <b>0</b>, the method defined next will
                            be assigned position index <b>1</b>, and so on.)
                             For each method, the <i>signature</i> of the method (method name, return type,
                            types and names of arguments) along with the binding (the <i>label</i> of the method; a call to the method must be translated to a
                            call to this label) needs to be stored. The type field for a variable/method must contain a pointer to the corresponding type table entry.
                            <i>An exceptional case is when a member field is of a previously defined class.
                            In this case, a pointer to the class table entry of the member field must be maintained</i>.
                          </p>
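                          <p>
                               As a rough illustration of the information listed above, a class table entry could be laid out as follows. This is a simplified sketch and not the structure given in the class table implementation documentation: every struct, field and function name here is hypothetical, fixed arrays of 8 entries are assumed, argument lists are omitted, and names are stored as pointers that the caller must keep alive.
                          </p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MEMBERS 8              /* at most 8 fields and 8 methods (from the text) */

struct Typetable;                  /* type table entry from the earlier stages */
struct ClassEntry;

struct Field {
    const char *name;
    int position;                  /* 0 for the first field declared, then 1, ... */
    struct Typetable *type;        /* non-NULL for a field of an ordinary type    */
    struct ClassEntry *ctype;      /* non-NULL instead for a field of a class     */
};

struct Method {
    const char *name;
    int position;                  /* 0 for the first method defined, then 1, ... */
    int label;                     /* label that a call to the method jumps to    */
    struct Typetable *return_type;
};

struct ClassEntry {
    const char *name;
    struct ClassEntry *parent;     /* always NULL in this stage */
    struct Field fields[MAX_MEMBERS];
    struct Method methods[MAX_MEMBERS];
    int field_count;
    int method_count;
};

struct Field *lookup_field(struct ClassEntry *c, const char *name)
{
    for (int i = 0; i < c->field_count; i++)
        if (strcmp(c->fields[i].name, name) == 0)
            return &c->fields[i];
    return NULL;
}

struct Method *lookup_method(struct ClassEntry *c, const char *name)
{
    for (int i = 0; i < c->method_count; i++)
        if (strcmp(c->methods[i].name, name) == 0)
            return &c->methods[i];
    return NULL;
}

/* Add a field and return its position index, or -1 if the name is
   already used or the class already has MAX_MEMBERS fields. */
int add_field(struct ClassEntry *c, const char *name,
              struct Typetable *type, struct ClassEntry *ctype)
{
    assert((type == NULL) != (ctype == NULL));   /* exactly one is set */
    if (c->field_count == MAX_MEMBERS || lookup_field(c, name) != NULL)
        return -1;
    struct Field *f = &c->fields[c->field_count];
    f->name = name;
    f->position = c->field_count;
    f->type = type;
    f->ctype = ctype;
    return c->field_count++;
}

/* Add a method and return its position index, or -1 on a duplicate
   name or when the class already has MAX_MEMBERS methods. */
int add_method(struct ClassEntry *c, const char *name, int label,
               struct Typetable *return_type)
{
    if (c->method_count == MAX_MEMBERS || lookup_method(c, name) != NULL)
        return -1;
    struct Method *m = &c->methods[c->method_count];
    m->name = name;
    m->position = c->method_count;
    m->label = label;
    m->return_type = return_type;
    return c->method_count++;
}
```

                          <p>
                               Returning -1 from the two <i>add</i> functions gives the semantic analyser a single place to report a duplicate member name or a class that exceeds the 8-member limit.
                          </p>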
                          <p>
                               <b>A thorough reading of the details of <a href="oexpl-data-structures.html" target="_blank">class table implementation</a> is necessary to
                               proceed further</b>.
                               In this stage, we will not support class extension. Hence, the parent class pointer entry for each class must be set to NULL.
                        </p>
                        <p>
                          <!--  The program structure makes it possible to compile each class separately into a separate object file.-->
                           For
                            now, we will focus on syntax and semantics. Code generation will be taken up subsequently.
                        </p>

                      <!--  <p>
                            Global variable declarations may contain declarations for class variables along with other variable declarations.
                            Hence, the <a href="data_structures/global-symbol-table.html" target="_blank">global symbol table</a> must be able to maintain
                            information about class variables.  <b>For class variables, a pointer to the class table entry of the variable's class
                            must be set in the the field <i>Ctype</i> of the <a href="data_structures/global-symbol-table.html" target="_blank">global symbol table entry</a></b>.

                            </p>
-->

                            <p>
                                When <b>a variable of a class</b> is declared (in the global declaration section),
                                <b>a pointer to the class table entry must be maintained in the <a href="data_structures/global-symbol-table.html" target="_blank">global symbol table entry</a></b> of the variable.
                                Hence, a new <i>class table pointer</i> field may be added to the global symbol table structure.
                                Note that the <a href="data_structures/global-symbol-table.html" target="_blank">global symbol table entry</a> for a global variable will have either a class table pointer entry or a type table pointer entry,
                                but not both.
                            </p>
                            The following scope rules must be carefully checked to ensure correct semantic analysis:


                         <ul>

                                <li id="otis">
                                1. Methods of a class variable are accessed as <i>self.method_name(..args..)</i>.
                                (If a member field is of a previously defined class, its methods are accessed as <i>self.field_name.method_name(..args..)</i>.)
                                </li>
                                <li id="otis">
                                2. <b>A member field in a class shall be accessed only within a method of the same class.</b>
                                The access syntax will be <i>self.field_name</i>. (<b>Note:</b> if <i>field_name</i> is a variable of another
                                class (or the same class), accessing member fields of the member using the syntax <i>self.field_name.sub_field.name</i> <b>is
                                not permitted. Why?</b>)
                                </li>
                                <li id="otis">
                                3. A method of one class is generally not permitted to access methods of other classes.
                                However, if a class contains a member field of another class, then the methods accessible
                                through the member field can be invoked as noted in point 1 above.

                                                              </li>
                                <li id="otis">
                                4. <b>A class may contain only one method with a given name</b>. Thus, <a href="https://en.wikipedia.org/wiki/Function_overloading" target="_blank">function overloading</a> is not permitted.
                                </li>


                        </ul>

                        <!--This field is required only for the OEexpL.-->

                        <p>
                            These rules are not difficult to check once the class table is properly constructed from the class definitions.
                            <b> All variable and function declarations visible to a method are contained in the class table entry of the relevant class</b>.
                        </p>
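                        <p>
                            To see how rules 1 to 3 reduce to class table lookups, consider the following sketch. It uses a deliberately stripped-down, hypothetical <i>ClassInfo</i> record (names only) rather than the real class table structure, and the function and error names are likewise assumptions made for illustration.
                        </p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MEMBERS 8

/* Hypothetical, stripped-down view of a class table entry. */
struct ClassInfo {
    const char *name;
    const char *field_names[MAX_MEMBERS];
    struct ClassInfo *field_class[MAX_MEMBERS]; /* NULL if the field is not of a class */
    int field_count;
    const char *method_names[MAX_MEMBERS];
    int method_count;
};

enum AccessResult {
    ACCESS_OK,
    ERR_NO_SUCH_FIELD,
    ERR_NO_SUCH_METHOD,
    ERR_FIELD_NOT_CLASS
};

/* Index of name in names[0..count-1], or -1 if absent. */
static int find(const char *const *names, int count, const char *name)
{
    for (int i = 0; i < count; i++)
        if (strcmp(names[i], name) == 0)
            return i;
    return -1;
}

/* self.field_name inside a method of class cur (rule 2). */
enum AccessResult check_field(const struct ClassInfo *cur, const char *field)
{
    assert(cur != NULL);
    return find(cur->field_names, cur->field_count, field) >= 0
               ? ACCESS_OK : ERR_NO_SUCH_FIELD;
}

/* self.method_name(...) inside a method of class cur (rule 1). */
enum AccessResult check_method(const struct ClassInfo *cur, const char *method)
{
    assert(cur != NULL);
    return find(cur->method_names, cur->method_count, method) >= 0
               ? ACCESS_OK : ERR_NO_SUCH_METHOD;
}

/* self.field_name.method_name(...): the field must belong to cur and be
   of a class that defines the method (rules 1 and 3). */
enum AccessResult check_field_method(const struct ClassInfo *cur,
                                     const char *field, const char *method)
{
    assert(cur != NULL);
    int i = find(cur->field_names, cur->field_count, field);
    if (i < 0)
        return ERR_NO_SUCH_FIELD;
    if (cur->field_class[i] == NULL)
        return ERR_FIELD_NOT_CLASS;
    return check_method(cur->field_class[i], method);
}
```

                        <p>
                            Note that the sketch offers no function for a chain of the form <i>self.field_name.sub_field</i>; since that form is not permitted, the semantic analyser can reject it outright.
                        </p>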


                        <p>
                            <b>Task 1:</b> Complete the front end - lexical, syntax and semantic analysis for the extended language with classes.
                        </p>

                        <p>
                            <b>Part II: Code Generation </b>
                        </p>
                        <p>
                            Code generation is not hard, once the access semantics of <i>self</i> is understood.
                            <b>The basic observation is that the code for any method can be generated immediately as the definition of the method is processed</b>.
                             This is true because a method's access is limited to its member fields and previously declared methods of the class.
                             This information would be already entered into the class table entry for the class.
                        </p>
                        <p>
                            <b> When a method declaration is encountered, a unique label must be generated for the method and the label must be entered into the
                              class table entry of the method.</b> Thus, the call address to any method can be determined from the class table entry of the
                              corresponding class.  (In the next stage, we will see that a more sophisticated strategy will be needed to support inheritance).
                        </p>
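                        <p>
                            As an illustration only, the label assignment described above, together with the rule that a class may not
                            contain two methods of the same name, might be sketched in C as follows.  The structures are deliberately
                            simplified, and <i>method_install</i>, <i>method_lookup</i> and <i>next_label</i> are hypothetical names that are
                            not part of the project's interface.
                        </p>

```c
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical class table structures; the actual layout is
   described in the OExpL data structures documentation. */
struct Memberfunclist {
    char *Name;                     /* method name */
    int Flabel;                     /* label number assigned to the method */
    struct Memberfunclist *Next;
};

struct Classtable {
    char *Name;
    struct Memberfunclist *Methods; /* methods declared so far */
};

static int next_label = 0;          /* hypothetical global label counter */

/* Look up a method by name in a class; returns NULL if it is absent. */
struct Memberfunclist *method_lookup(struct Classtable *c, const char *name) {
    for (struct Memberfunclist *m = c->Methods; m != NULL; m = m->Next)
        if (strcmp(m->Name, name) == 0)
            return m;
    return NULL;
}

/* Install a method declaration and give it a fresh label.  Returns the new
   entry, or NULL if a method of that name already exists in the class
   (overloading is not permitted) or if memory allocation fails. */
struct Memberfunclist *method_install(struct Classtable *c, const char *name) {
    if (method_lookup(c, name) != NULL)
        return NULL;
    struct Memberfunclist *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->Name = malloc(strlen(name) + 1);
    if (m->Name == NULL) {
        free(m);
        return NULL;
    }
    strcpy(m->Name, name);
    m->Flabel = next_label++;       /* unique label for this method */
    m->Next = c->Methods;
    c->Methods = m;
    return m;
}
```

                        <p>
                            With entries of this kind, the call address of a method is found by looking up the method's name in the class
                            table entry and reading its <i>Flabel</i> field.
                        </p>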
                        <p>
                            Storage allocation for class variables is equally simple.  Just as was done for user defined types,
                            a call to <i>'new'</i> must allocate a block of 8 words in the heap for storing the member fields of the variable.
                            The start address of this block must be assigned to the variable.
                        </p>
                        <p>
                            Other matters being routine, the non-trivial point pending is the <b>dereferencing of the self reference</b> – in <i>self.field_name</i>.
                        </p>
                        <p>
                             The following paragraphs summarize the key ideas:
                        </p>

                        <ul>

                            <li id="otis">

                            1.  How to get the address of <i>self?</i>  First observe that a reference to <i>self</i> will occur only within a member function
                            of a class variable.   Second, all variables of the same class share the code of all the methods of the class.
                            Thus, at run time, a method must be told which is the variable for which the current call is made.
                            In other words, <b>the address of self can be determined only at run-time
                              (<i>why? - ensure that you digest this point clearly before proceeding further!</i>)</b>
                            <br>

                            <p>
                            The standard way to resolve this reference is to set the convention that <b>before a call to a method,
                              the caller must push the address of <i>self</i></b> (for the particular call) <b>as an argument onto the stack.</b>
                              The convention we suggest is to push the address of <i>self</i> before pushing the arguments during method invocations.
                              Note that <i>self</i> is an <b>implicit argument</b>, not found either in the declaration or the definition of the method.
                              In the next stage, we will need to push one more implicit argument for each method invocation.  It is instructive to have a quick look
                              at the <a href="oexpl-run-data-structures.html#nav-runtimestackmanagementformethodinvocations" target="_blank">run-time-stack management documentation</a> for OExpL at this stage.
                            </p>
                            </li>
                            <br>
                            <li id="otis">
                            2.  The <b>local symbol table of a method must contain an entry for <i>self</i></b> along with the other arguments to the function.
                            The binding field of this entry must be the address, relative to BP, of the stack location where the caller pushes the address of <i>self</i>.
                            For example, if this value is -k, then the compiler expects the caller to have stored in the stack
                            location [BP-k] the address of the heap block holding the member fields of the variable.
                            </li>
                            </ul>
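                        <p>
                            A minimal sketch of how the binding of <i>self</i> could be computed is given below.  The frame layout used here
                            is an assumption made for illustration (saved BP at [BP], return address at [BP-1], a return value slot at [BP-2]
                            and the pushed arguments at lower addresses); consult the run-time stack documentation for the layout your
                            compiler actually uses.  The names <i>FRAME_OVERHEAD</i>, <i>arg_binding</i> and <i>self_binding</i> are hypothetical.
                        </p>

```c
/* Assumed frame layout (an illustration, not taken from the text above):
   [BP] holds the saved BP, [BP-1] the return address, [BP-2] the
   return-value slot, and the pushed arguments lie at lower addresses,
   with the last-pushed one at [BP-3]. */
#define FRAME_OVERHEAD 3

/* Binding of the i-th pushed explicit argument (1-based, in push order)
   of a method that takes nargs explicit arguments. */
int arg_binding(int nargs, int i) {
    return -(FRAME_OVERHEAD + nargs - i);
}

/* Binding of self: it is pushed before every explicit argument, so it
   lies deepest in the frame. */
int self_binding(int nargs) {
    return -(FRAME_OVERHEAD + nargs);
}
```

                        <p>
                            Under these assumptions, a method with two explicit arguments would find <i>self</i> at [BP-5], and a method
                            with no explicit arguments would find it at [BP-3].
                        </p>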
                        <p>
                            The task of completing the code generation phase is now straightforward.
                        </p>
                        <p>
                            <b>Task 2:</b> Complete code generation for this stage.
                        </p>
                      <!--  <p>
                           <b> Part III:  Building Source Code in Multiple Files</b> (optional).
                        </p>
                        <p>
                            In this section, we will see how to support the ExpL programmar to compile each class separately and link them together with the main program.  Since the contents of this section are tangential to our principal concerns, this part is marked optional.  However support for modularity is <i>absolutely essential</i> if large programs have to be written and maintained.  Since the section is optional, the description here is brief.
                        </p>
                        <p>
                            To avoid too large code files, it would be advisable to code the source program into multiple files and compile each one separately.  The following structure is proposed:
                        <ul>
                             <li id="otis">
                            1. All type definitions may be collected together in the file <i>program_name.type</i> between the keywords <i>type</i> and <i>endtype.</i>  </li>
                            <li id="otis">
                            2.  Each class may be written into a separate class file (<i>class_name.class</i>).
                            </li>
                            <li id="otis">
                            3.  The main program (<i>progam_name.expl</i>) shall contain <b>no type definitions or class definitions</b>.   The main program contains global declarations, function definitions and the Main function.
                            </li>
                        </ul>
                        </p>
                        <p>
                            Your compiler must support two types of compilation options:
                        <ul>
                            <li id="otis">
                            1.  <b>expl -obj</b> <i>class_name.class program_name.type</i>
                            <br>
                            This option must compile a single class file into an object file.  The code generated will contain unresolved labels.   The output file may be named <i>class_name.obj</i>.  The structure for object files will be discussed soon.

                            </li>
                            <li id="otis">
                            2.  <b>expl</b> <i>program_name.expl, program_name.type, class_name1.obj, class_name2.obj,class_namek.obj</i>
                            <br>
                            This option expects that the object files are created using the <b>-obj</b> option.  The main program is build by linking together all the object modules and the ouput file must be an ABI compatible <i>program_name.xsm</i>
                            </li>
                        </ul>
                        </p>
                        <p>
                            The key observation here is that to <b>generate assembly code with labels for a particular class, one needs only the class definition and all the type defintions</b>. The main points to be careful here are:
                        <ul>
                            <li id="otis">

                            a.  As each class is compiled separately, <b>the labels in each object file must not be conflicting</b>.  To solve this issue, each label in an object file may be annotated by the corresponding class name (Ex.  <i>class_name1_L0</i>, <i>class_name1_L1</i> etc.)
                            </li>
                            <li id="otis">
                            b.  Since the class definitions must be made available in the final building step, during the compilation of a class, the <b>class definition must be copied into the object file</b>.
                            </li>
                        </ul>
                        </p>
                        <p>
                            Thus, each object file will contain a class definition followed by the assembly code (with labels) for the methods of the class.  Finally, during the final compilation process, class definitions from each object file will be combined into the class table and the final code generation and label translation steps will be done.
                        </p>
                        <p>
                            You must look at the YACC/Bison manual to find out how parsing of multiple files can be done.  We leave out the technical details.
                        </p>
                        <p>
                            <b>Task 3:</b>  Implement support for separate compilation of object files as outlined above.
                        </p>
                        <p>
                            <b>Note:</b>  The ABI given to you do not support object file formats.  Hence we have suggested an object format specific to the ExpL project.  However, standard binary interfaces support object file formats – (For example see the <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format" target="_blank"> Unix ELF)</a>.   These object files may be linked together by the operating system at load time – that is, when the program is loaded into the memory for execution.  A discussion of these matters is beyond the scope of this project.
                        </p> -->

                        <p>
                            <i>Exercise 1</i>:  What modifications must be done to allow class variables to be passed as arguments to functions?
                        </p>
                        <p>
                            <i>Exercise 2</i>:  What modifications must be done to allow functions to have locally declared variables of a class type?
                        </p>

                      <!--  <p>
                          Check your implementation with the following test cases : <br>
                          1) <a href="oexpltestprograms/test1.html" target="_blank">Test Case 1</a> <br>
                          2) <a href="oexpltestprograms/test2.html" target="_blank">Test Case 2</a> <br>
                          3) <a href="oexpltestprograms/test3.html" target="_blank">Test Case 3</a> <br>
                        </p> -->

                        <h4> <b>Test Programs </b></h4>
                        <p>
                          Check your implementation with the following test cases : <br>
                          <a href="oexpl-testprograms.html#test1" target="_blank">Test Program 1: Binary Search Tree using Classes
</a> <br>
                          <a href="oexpl-testprograms.html#test2" target="_blank">Test Program 2: Linked list in OExpL
</a> <br>
                          <a href="oexpl-testprograms.html#test3" target="_blank">Test Program 3: Sum of factorials</a> <br>

                        </p>


                        </article>

                         <article class="grid col-full" id="nav-stage8">
                         <h2>Stage 8:  Inheritance and Sub-type Polymorphism</h2>
                            <b>Time Estimate :</b> 1 week, 5-10 hours/week
                        <p>
                            <b>Prerequisite Reading</b>:
                            <ul>
                                <li id="otis">1.  A rigorous reading of the <a href="oexpl-specification.html" target="_blank"> OExpL specification </a></li>

                            </ul>
                        </p>

                        <p>
                            <b>Learning Objectives</b>:
                            <ul>
                                 <li id="otis">You will extend the language of Stage 7 by adding support for single inheritance
                                   and subtype polymorphism to make OExpL an <i>object oriented</i> language.</li>
                            </ul>
                            <hr>
                        </p>

                        <p>

                        In this stage, we will extend ExpL to provide support for
                        <a href="https://en.wikipedia.org/wiki/Inheritance_(object-oriented_programming)" target="_blank">single inheritance</a> and
                        <a href="https://en.wikipedia.org/wiki/Subtyping" target="_blank">subtype polymorphism</a>.
                        These features qualify OExpL to be called an object oriented language.

                        </p>
                        <p>

                        Addition of support for inheritance and polymorphism involves some intellectual complexity.
                        Fortunately, once the underlying conceptual issues are understood, the implementation is not difficult.

                        </p>

                        <p>
                        <b>Part I:  Syntax and Semantics</b>
                        </p>


                        <p>
                            A class may now be defined by extension of a previously defined class.
                            <div class="syntax">
                              Cname         : ID       	{Cptr = Cinstall($1->Name,NULL);} <br>
            &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;| ID Extends ID	{Cptr = Cinstall($1->Name,$3->Name);}
                            </div>
                        </p>

                        <p>
                        An additional field called the <b>parent class pointer</b> must be added to each class table entry.
                        If the class is defined by extension of another class, the parent class pointer entry must point to the
                        class table entry of the parent class.  If the class is not an extension of any other class, this entry must be NULL.  (<a href="oexpl-data-structures.html" target="_blank">See class table structure</a>).
                        An outline of the grammar is given <a href="oexpl-grammar-outline.html" target="_blank">here</a>.
                        </p>


                        <p>
                            An extended class inherits all the member fields and methods of the parent class.
                            The descendent class is permitted to make additional definitions as specified below.

                        <ul>
                            <li id="otis">
                            1.  The only member field declarations in the descendent class shall be for the  new member
                            fields that are not present in the parent.  (<b>The descendent cannot re-declare or remove any of
                            the parent's member fields.</b>)  Additionally, in our current implementation plan,
                            the number of member fields - including those inherited from the parent - must not exceed 8
                            (because our memory allocation policy allocates only 8 words for storing a class object).
                            <br>
                            From an implementation perspective, <b>whenever the definition of a class by extension of
                              another class is encountered, the compiler must copy all the member field information of the
                              parent to the descendent class.</b>  The member fields may be listed in the same order as they
                              appear in the parent.  Additional entries must be created for new members specific to the
                              descendent class.
                            </li><br>
                            <li id="otis">
                            2.   The descendent class may <b>define new methods</b> or <b><a href="https://en.wikipedia.org/wiki/Method_overriding" target="_blank">override</a></b> one or more
                            of the <b>parent's methods</b>.  OExpL allows a class to contain only one definition for a function.
                             <b>Once a method is over-ridden, the method of the parent class will no longer be visible inside the
                             descendent class</b>.  Moreover, the <b>signature of the overridden method must be identical with the
                               signature of the method in the parent class </b>(in both the number and the types of arguments).
                               In  other words, <a href="https://en.wikipedia.org/wiki/Function_overloading" target="_blank">function overloading</a>
                                is not permitted. (Exercise 1 at the end asks you to relax this condition).

                            </li>
                        </ul>
                        </p>

                        <p>
                            From an implementation perspective, the compiler may <b>initially copy the signatures of all methods
                              of the parent to the class table entry of the descendent</b>. New entries will have to be created for methods
                              that are defined in the descendent, but not present in the parent.  Finally, the present implementation restricts
                              the number of methods in a class (including inherited methods) to 8.
                        </p>

                        <p>
                            What about the labels of the methods of the descendent class? <b> If a method is over-ridden or defined new in a class,
                            the compiler must assign a new label for the method and store this label in the <i>Flabel</i> field of the method in the <a href="oexpl-data-structures.html" target="_blank">class table entry</a>.
                            Otherwise, the class inherits the parent's method and the label of the method in the parent class ( specified by the <i>Flabel</i> field of the parent class ) must be copied
                            into the <i>Flabel</i> field of the method in the descendent class</b> ( See Memberfunclist in the class table ).
                        </p>
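                        <p>
                            The copying of method entries and the label rule above can be sketched as follows.  This is only one possible
                            arrangement: the array-based <i>struct Class</i>, the functions <i>class_begin</i> and <i>class_define_method</i>,
                            and the counter <i>next_label</i> are hypothetical simplifications of the class table and its Memberfunclist.
                            Signature checking and detection of a name defined twice in the same class body are left out.
                        </p>

```c
#include <stddef.h>
#include <string.h>

#define MAX_METHODS 8   /* limit on methods per class stated above */

/* Simplified, hypothetical stand-in for one Memberfunclist entry. */
struct Method {
    const char *Name;
    int Flabel;         /* label of the code that implements the method */
};

struct Class {
    const char *Name;
    struct Class *Parent;               /* NULL if the class extends nothing */
    struct Method Methods[MAX_METHODS]; /* index = position within the class */
    int Methodcount;
};

static int next_label = 0;              /* hypothetical label counter */

/* Start a class definition: a descendent begins with a copy of the
   parent's method entries, labels included, in the parent's order. */
void class_begin(struct Class *c, const char *name, struct Class *parent) {
    c->Name = name;
    c->Parent = parent;
    c->Methodcount = 0;
    if (parent != NULL) {
        for (int i = 0; i < parent->Methodcount; i++)
            c->Methods[i] = parent->Methods[i];
        c->Methodcount = parent->Methodcount;
    }
}

/* Record a method defined in the body of class c.  An inherited name is
   over-ridden in place (same position, fresh label); a new name is
   appended with a fresh label.  Returns the method's position, or -1 if
   the limit of MAX_METHODS would be exceeded. */
int class_define_method(struct Class *c, const char *name) {
    for (int i = 0; i < c->Methodcount; i++) {
        if (strcmp(c->Methods[i].Name, name) == 0) {
            c->Methods[i].Flabel = next_label++;
            return i;
        }
    }
    if (c->Methodcount == MAX_METHODS)
        return -1;
    c->Methods[c->Methodcount].Name = name;
    c->Methods[c->Methodcount].Flabel = next_label++;
    return c->Methodcount++;
}
```

                        <p>
                            A method that the descendent does not redefine keeps the label copied from the parent, so calls to it reach
                            the parent's code.
                        </p>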


                        <p>
                            Once the class table entries are created as described above, syntax and semantic analysis phases of
                            compilation can be completed easily.
                        </p>


                        <p>
                            An assignment of the form a = b; between class variables is valid if either <i>a</i> and <i>b</i> are variables of the
                            same class or <i>b</i> is a variable of a descendent class of the class of <i>a</i>.  Similarly, a=new(class_name); is
                            valid if the class identified by class_name is the class to which <i>a</i> belongs or a descendent of that class.
                            Thus, <b>a variable of a certain class can hold a reference to any object of the same or descendent classes</b>.
                            Since the class table contains adequate information to perform this check, syntax and semantic analysis can be completed now.

                        </p>
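                        <p>
                            The compatibility check for such assignments amounts to walking the parent class pointers.  A minimal sketch,
                            assuming a class table entry reduced to a name and a parent pointer (the function name <i>class_assignable</i>
                            and the field name <i>Parentptr</i> are used here for illustration):
                        </p>

```c
#include <stddef.h>

/* Simplified class table entry: only the fields needed for the check. */
struct Classtable {
    const char *Name;
    struct Classtable *Parentptr;   /* NULL if the class extends nothing */
};

/* Returns 1 if an object of class src may be stored in a variable of
   class dst, that is, if src is dst itself or a descendent of dst;
   returns 0 otherwise. */
int class_assignable(const struct Classtable *dst, const struct Classtable *src) {
    for (const struct Classtable *c = src; c != NULL; c = c->Parentptr)
        if (c == dst)
            return 1;
    return 0;
}
```

                        <p>
                            For a = b, <i>dst</i> is the class of a and <i>src</i> the class of b; for a = new(class_name), <i>src</i> is the
                            class named in the call.
                        </p>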


                        <p>
                            <b>Task 1:</b>  Complete the front end - lexical, syntax and semantic analysis.
                        </p>


                    <!--    <p>

                        <b>Note:</b>  If classes are defined in separate files, the parent class and all the desecndent classes in an
                        inheritance hierarchy must be defined in a single file.   Thus, each object file will contain the complete class table
                        information pertaining to a class hierarchy.

                      </p>-->


                        <p>

                           <b>Part II:  Code Generation</b>
                        </p>


                        <p>
                        <ul>


                        The key differences between the language of Stage 7 and the present stage are the following:

                        <li id="otis">
                            a).  A variable of a parent class may hold a reference to an object of any descendent class.
                        </li>

                        <li id="otis">
                            b).  The class to which the referred object belongs at run time cannot be determined at compile time (why?)
                        </li>

                        <li id="otis">
                            c).  When a method is invoked using the variable of the parent class, if the method is over-ridden by the child class,
                            then the method of the child class must be invoked.
                        </li>
                        </ul>
                        </p>
                        <p>

                        Assume that A, B, C are classes such that B extends A and C extends B.   Let x be a variable of class A.
                        Consider the program segment:

                       </p>



                       <p>

                            <div class="syntax">
                                ...
                                <br>
                                Read(n);
                                <br>
                                if (n>0) then
                                <br>
                                &nbsp;&nbsp;x = new(B); <br>
                                else  <br>
                                &nbsp;&nbsp;x = new(C); <br>
                                endif <br>
                                retval = x.fun(); <br>
                                ... <br>
                            </div>
                        </p>
                        <p>
                        Let <i>fun()</i> be a function defined in class A.
                        Let us assume that the label for the function in the class table is L1.
                         Suppose that B does not over-ride <i>fun()</i>, but C over-rides <i>fun()</i>.
                          Let the label for the over-ridden function be L2.
                        </p>
                        <p><b>Important Note :</b> Unless fun() is defined in class A, the call <i>x.fun()</i> should result in a compilation error even if B and C contain the declaration for <i>fun()</i>. This is because
                          if a variable of a parent class holds the reference to an object of a descendent class, only
                           methods defined in the parent class are allowed to be invoked using the variable of the parent class.</p>
                        <p>

                            The call <i>x.fun()</i>;   must translate to CALL L1 if (n>0) whereas <i>x.fun()</i> must translate to CALL L2 otherwise.
                             The value of n is known only at run-time and the compiler can't hope to guess it.   What will the compiler do now?!

                        </p>

                        <p>
                            It turns out that there is a remarkably simple and elegant way to handle the issue by maintaining what are known
                             as  <b>function tables</b> (called <a href="https://en.wikipedia.org/wiki/Virtual_method_table" target="_blank">virtual function tables</a> in OOP jargon) at run time.

                        </p>

                        <p>

                            <b>The key to the solution to the problem is that the variable x must carry the information about which
                              class it is referring to at run time, and this information must be used to dereference the function call at run time. </b>
                        </p>

                        <p>
                        <ul>

                            The implementation details are outlined below: <br>

                        <li id="otis">
                          <br>
                            1. <b> For each class, we will maintain a table of size 8 in the memory, listing labels of the
                              methods of the class. </b> (Recall that we permit a class to have at most 8 methods).
                              <b>The labels may be listed in the order in which the methods are defined in the class.</b>
                              This table is called the <b>virtual function table</b> for the class.
                              Since all the class definitions are known at compile time, the compiler can generate initial code to set
                              up the virtual function tables of all the classes in the stack region of the program.
                              (Typically, at the beginning of the stack, allocating space before global variables.)

                        </li>
                          <br>

                        <li id="otis">
                            2.  <b>The label for a method in the virtual function table will be set to the method's label in the corresponding
                              class table entry.</b>    Consequently, the label of a method in a descendent class will be the
                              label of the method in the parent class if the method is not over-ridden.  Referring to the
                              example above, the label for the method fun() in the virtual function tables of classes A and B will be L1, whereas, the
                              label will be L2 in the virtual function table of class C.

                        </li>
                          <br>
                        <li id="otis">
                        3.  <b>The position of the label of a method in the virtual function tables of the parent class and all the descendent
                          classes in a class hierarchy must be the same.</b>  Referring to the above example,  if the label of the method fun()
                          is listed third in the virtual function table of A, then it must be listed third in the virtual function tables of classes B and C.
                          Thus the index of a method's entry in the virtual function table will be the same for every class in a class hierarchy.
                          The advantage of this convention is that while translating, <b>the position of a method's label relative to the base
                            address of the virtual function table is completely determined at compile time.</b> Note that since the language does not
                           permit function overloading, the entry for a given method name in a virtual function table will be unique.
                        </li>
                          <br>

                        <li id="otis">
                            4.  In view of the above, to translate <i>x.fun()</i>, all that needs to be determined at run time is the
                            base address of the correct virtual function table.  To keep track of this information, <b>the compiler
                              must allocate two memory locations for each class variable: one for the usual pointer to the
                              memory block allocated in the heap for storing the object's member fields, and a second to store
                              a pointer to the virtual function table of the class to which the object currently referenced by the variable belongs.</b>
                        </li>


                        </ul>
                        </p>

                        <p>

                            Conceptually, each class variable holds a pair <i>[MemberFieldPointer, VirtualFunctionTablePointer]</i>.  The statement <i>x=new(B)</i>;  in the
                            example above should do the following:
                            <ul>
                             <li>  Allocate a block of 8 words in the heap and store the start address
                            in the <i>MemberFieldPointer</i> field of x.</li>
                              <li>  Set the <i>VirtualFunctionTablePointer</i> field of x to the base address of the virtual function table of class B.</li>
                            </ul>
                            An assignment of the form y=x, if valid semantically, will result in both the pointers of x being copied
                            into the corresponding pointers of y.
                        </p>

                        <p>
                            Note that once the base of the correct virtual function table is known, invoking the right function simply involves
                            adding the offset of the function to the base and making a call to the address (label) stored in the virtual function table entry.

                        </p>
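                        <p>
                            The run-time behaviour described above can be imitated in a few lines of C.  This is a simulation for
                            understanding, not compiler output: memory is modelled as an integer array, labels as integers, and the names
                            <i>memory</i>, <i>struct ClassVar</i>, <i>assign_new</i> and <i>dispatch</i> are hypothetical.
                        </p>

```c
#define VFT_SIZE 8      /* one virtual function table holds 8 labels */

/* Simulated memory; an address is an index into this array. */
static int memory[64];

/* A class variable holds a pair of words. */
struct ClassVar {
    int MemberFieldPointer;           /* start of the object's heap block */
    int VirtualFunctionTablePointer;  /* base of the class's function table */
};

/* Effect of x = new(Class).  heap_block stands for the address returned by
   the heap allocator, and vft_base for the base address of the virtual
   function table of the class named in the call. */
void assign_new(struct ClassVar *x, int heap_block, int vft_base) {
    x->MemberFieldPointer = heap_block;
    x->VirtualFunctionTablePointer = vft_base;
}

/* Label that a call through x must jump to, for the method whose position
   pos in the table was fixed at compile time. */
int dispatch(const struct ClassVar *x, int pos) {
    return memory[x->VirtualFunctionTablePointer + pos];
}
```

                        <p>
                            Suppose, in the example above, that the tables of A, B and C sit at simulated addresses 0, 8 and 16 and that
                            fun() is listed third (position 2), with L1 stored at memory[2] and memory[10] and L2 at memory[18].
                            After <i>x = new(B)</i> the lookup yields L1, and after <i>x = new(C)</i> it yields L2, although the translated
                            call sequence is the same in both cases.
                        </p>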

                        <p>
                            Note that the labels will be automatically translated to actual addresses during the <a href="label-translation.html" target="_blank">label translation phase</a>.

                        </p>

                        <p>
                            To implement virtual function tables on the eXpOS ABI, the suggested method is to allocate space (8 words each)
                            for storing the virtual function tables of each class in the stack before allocating space for global variables.


                        </p>
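                        <p>
                            With this layout the table addresses follow from simple arithmetic, sketched below.  The value 4096 for the
                            start of the stack region is an assumption made for this sketch and must be checked against the ABI
                            documentation; <i>class_index</i> (the order in which classes are defined, starting from 0) and both function
                            names are hypothetical.
                        </p>

```c
#define STACK_START 4096  /* assumed start address of the stack region */
#define VFT_SIZE    8     /* words reserved per virtual function table */

/* Base address of the virtual function table of the class whose
   (0-based) index is class_index. */
int vft_base(int class_index) {
    return STACK_START + VFT_SIZE * class_index;
}

/* First address available for global variables once the tables of
   class_count classes have been laid out. */
int first_global_address(int class_count) {
    return STACK_START + VFT_SIZE * class_count;
}
```

                        <p>
                            Under these assumptions a program with three classes would place its tables at 4096, 4104 and 4112, and its
                            global variables from 4120 onwards.
                        </p>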

                        <!--
                        <p>
                            A detailed tutorial on the implementation of function tables is given <a href="#" target="_blank">here [LINK]</a>.
                        </p>
                      -->

                        <p>

                            <b>Read the <a href="oexpl-run-data-structures.html" target="_blank">OExpL run time data structures documentation</a>
                            before proceeding further.</b>
                          </p>


                        <p>
                            <b>Task 2:</b>  Complete the OExpL compiler.
                        </p>


                          <!--  You may test your compiler implementation with the test programs given <a href="oexpl-testprograms.html" target="_blank">here</a> -->

                      <!--  <p>
                          Check your implementation with the following test cases : <br>
                          1) <a href="oexpltestprograms/test4.html" target="_blank">Test Case 1</a> <br>
                          2) <a href="oexpltestprograms/test5.html" target="_blank">Test Case 2</a> <br>
                          3) <a href="oexpltestprograms/test6.html" target="_blank">Test Case 3</a> <br>
                          4) <a href="oexpltestprograms/test7.html" target="_blank">Test Case 4</a> <br>
                        </p>
                      -->

                        <h4> <b>Test Programs </b></h4>
                        <p>
                          Check your implementation with the following test cases : <br>
                          <a href="oexpl-testprograms.html#test4" target="_blank">Test Program 1: Testing the runtime binding of the variables of a class
</a> <br>
                          <a href="oexpl-testprograms.html#test5" target="_blank">Test Program 2: Testing the correct set up of the virtual function table
</a> <br>
                          <a href="oexpl-testprograms.html#test6" target="_blank">Test Program 3: Testing the implementation of inheritance and subtype polymorphism</a> <br>

                          <a href="oexpl-testprograms.html#test7" target="_blank">Test Program 4: Testing the implementation of inheritance and subtype polymorphism</a> <br>


                        </p>


                        <p>
                        <i>Exercise 1</i>:  This exercise asks you to add a limited form of <b>function overloading</b> to the language.
                        When a descendant class overrides a method of its parent class, it may redefine the method with a signature that
                        differs from the one in the parent class.  In that case, both definitions are active in the child class, and
                        the compiler translates each call to the correct address (label) by examining the arguments of the call.
                        Make the necessary modifications to the language syntax to support this form of function overloading.
                        </p>
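
                        <p>
                        The resolution step described in the exercise can be sketched as follows. This is only a minimal illustration, not
                        the project's prescribed design: the structure <code>MethodEntry</code>, the functions <code>signature_matches</code> and
                        <code>resolve_overload</code>, the constant <code>MAX_PARAMS</code>, and the representation of types as strings are all
                        hypothetical names and assumptions introduced here. The sketch assumes that a class keeps one entry per method definition
                        and that a call matches a definition only when the name, the number of arguments and each argument type agree exactly.
                        </p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PARAMS 4   /* assumed upper bound on parameters, for illustration only */

/* One method definition visible in a class (hypothetical layout). */
struct MethodEntry {
    const char *name;               /* method name as written in the source */
    int nparams;                    /* number of formal parameters */
    const char *ptypes[MAX_PARAMS]; /* formal parameter type names */
    int flabel;                     /* label of the code generated for this definition */
};

/* Return 1 if the call's argument types match the entry's signature exactly. */
static int signature_matches(const struct MethodEntry *m, int nargs,
                             const char *argtypes[])
{
    if (m->nparams != nargs)
        return 0;
    for (int i = 0; i < nargs; i++)
        if (strcmp(m->ptypes[i], argtypes[i]) != 0)
            return 0;
    return 1;
}

/* Search the method list of a class for a definition with the given name
 * whose signature matches the call. Returns its label, or -1 if none matches. */
int resolve_overload(const struct MethodEntry *methods, size_t count,
                     const char *name, int nargs, const char *argtypes[])
{
    assert(nargs >= 0 && nargs <= MAX_PARAMS);
    for (size_t i = 0; i < count; i++)
        if (strcmp(methods[i].name, name) == 0 &&
            signature_matches(&methods[i], nargs, argtypes))
            return methods[i].flabel;
    return -1;
}
```

                        <p>
                        For instance, if a child class holds two entries named <code>area</code>, one taking a single <code>int</code> with label 3
                        and one taking two <code>int</code> arguments with label 7, a call with two <code>int</code> arguments resolves to label 7,
                        while a call whose arguments match neither signature returns -1 and can be reported as a compile-time error.
                        </p>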

                        <p>
                            Note:  Overloading and subtype polymorphism are two kinds of polymorphism typically supported by most
                            object-oriented languages.
                          A third important kind of polymorphism,
                          called <i><a href="https://en.wikipedia.org/wiki/Parametric_polymorphism" target="_blank">parametric polymorphism</a></i> (templates in C++),
                          is not covered in this project.


                        </p>

                        </p>
                        </article>


                </div>
            </section>
            </body>
        </div>


        <footer class="center part clearfix">
            <ul class="social column3 fleft">
                <li><a href="https://github.com/silcnitc">Github</a></li>
                <li>  <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">
                    <img alt="Creative Commons License" style="border-width:0" src="img/creativecommons.png" /></a></li>
            </ul>
          <div class="up column3 mright"> <a href="#navtop" class="ir">Go up</a> </div>
          <nav class="column3">
            <ul>
              <li><a href="index.html">Home</a></li>
              <li><a href="about.html">About</a></li>
              <!-- <li><a href="uc.html">Contact</a></li> -->
            </ul>
          </nav>
      </footer>
    <!-- Javascript - jQuery
    <script src="http://code.jquery.com/jquery.min.js"></script>-->
    <script>window.jQuery || document.write('<script src="js/jquery-1.7.2.min.js"><\/script>')</script>
    <!--[if (gte IE 6)&(lte IE 8)]>
    <script src="js/selectivizr.js"></script>
    <![endif]-->
    <script src="js/scripts.js"></script>
    <script src="js/inject.js"></script>
</html>