Commit

many edits, patch by Kelly Wilson!
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@44157 91177308-0d34-0410-b5e6-96231b3b80d8
lattner committed Nov 15, 2007
1 parent 5b8318a commit b7e6b1a
Showing 3 changed files with 74 additions and 74 deletions.
62 changes: 31 additions & 31 deletions docs/tutorial/LangImpl6.html
@@ -41,15 +41,16 @@

<p>Welcome to Chapter 6 of the "<a href="index.html">Implementing a language
with LLVM</a>" tutorial. At this point in our tutorial, we now have a fully
functional language that is fairly minimal, but also useful. One big problem
with it though is that it doesn't have many useful operators (like division,
logical negation, or even any comparisons other than less-than.</p>
functional language that is fairly minimal, but also useful. There
is still one big problem with it, however. Our language doesn't have many
useful operators (like division, logical negation, or even any comparisons
besides less-than).</p>

<p>This chapter of the tutorial takes a wild digression into adding user-defined
operators to the simple and beautiful Kaleidoscope language, giving us a
simple and ugly language in some ways, but also a powerful one at the same time.
operators to the simple and beautiful Kaleidoscope language. This digression now gives
us a simple and ugly language in some ways, but also a powerful one at the same time.
One of the great things about creating your own language is that you get to
decide what is good or bad. In this tutorial we'll assume that it is okay and
decide what is good or bad. In this tutorial we'll assume that it is okay to
use this as a way to show some interesting parsing techniques.</p>

<p>At the end of this tutorial, we'll run through an example Kaleidoscope
@@ -73,8 +74,8 @@
operators that are supported.</p>

<p>The point of going into user-defined operators in a tutorial like this is to
show the power and flexibility of using a hand-written parser. The parser we
are using so far is using recursive descent for most parts of the grammar, and
show the power and flexibility of using a hand-written parser. Thus far, the parser
we have been implementing uses recursive descent for most parts of the grammar and
operator precedence parsing for the expressions. See <a
href="LangImpl2.html">Chapter 2</a> for details. Without using operator
precedence parsing, it would be very difficult to allow the programmer to
@@ -152,12 +153,12 @@

<p>This just adds lexer support for the unary and binary keywords, like we
did in <a href="LangImpl5.html#iflexer">previous chapters</a>. One nice thing
about our current AST is that we represent binary operators fully generally
with their ASCII code as the opcode. For our extended operators, we'll use the
about our current AST is that we represent binary operators with full generality
by using their ASCII code as the opcode. For our extended operators, we'll use this
same representation, so we don't need any new AST or parser support.</p>

<p>On the other hand, we have to be able to represent the definitions of these
new operators, in the "def binary| 5" part of the function definition. In the
new operators, in the "def binary| 5" part of the function definition. In our
grammar so far, the "name" for the function definition is parsed as the
"prototype" production and into the <tt>PrototypeAST</tt> AST node. To
represent our new user-defined operators as prototypes, we have to extend
@@ -257,14 +258,14 @@
</pre>
</div>

<p>This is all fairly straight-forward parsing code, and we have already seen
a lot of similar code in the past. One interesting piece of this is the part
that sets up <tt>FnName</tt> for binary operators. This builds names like
"binary@" for a newly defined "@" operator. This takes advantage of the fact
that symbol names in the LLVM symbol table are allowed to have any character in
them, even including embedded nul characters.</p>
<p>This is all fairly straightforward parsing code, and we have already seen
a lot of similar code in the past. One interesting part about the code above is
the couple of lines that set up <tt>FnName</tt> for binary operators. This builds names
like "binary@" for a newly defined "@" operator. This then takes advantage of the
fact that symbol names in the LLVM symbol table are allowed to have any character in
them, including embedded nul characters.</p>

<p>The next interesting piece is codegen support for these binary operators.
<p>The next interesting thing to add is codegen support for these binary operators.
Given our current structure, this is a simple addition of a default case for our
existing binary operator node:</p>

@@ -301,10 +302,10 @@
<p>As you can see above, the new code is actually really simple. It just does
a lookup for the appropriate operator in the symbol table and generates a
function call to it. Since user-defined operators are just built as normal
functions (because the "prototype" boils down into a function with the right
functions (because the "prototype" boils down to a function with the right
name) everything falls into place.</p>

<p>The final missing piece is a bit of top level magic, here:</p>
<p>The final piece of code we are missing is a bit of top-level magic:</p>

<div class="doc_code">
<pre>
@@ -330,10 +331,9 @@

<p>Basically, before codegening a function, if it is a user-defined operator, we
register it in the precedence table. This allows the binary operator parsing
logic we already have to handle it. Since it is a fully-general operator
precedence parser, this is all we need to do to "extend the grammar".</p>
logic we already have in place to handle it. Since we are working on a
fully-general operator precedence parser, this is all we need to do to "extend
the grammar".</p>

<p>With that, we have useful user-defined binary operators. This builds a lot
<p>Now we have useful user-defined binary operators. This builds a lot
on the previous framework we built for other operators. Adding unary operators
is a bit more challenging, because we don't have any framework for it yet - let's
see what it takes.</p>
@@ -347,7 +347,7 @@
<div class="doc_text">

<p>Since we don't currently support unary operators in the Kaleidoscope
language, we'll need to add everything for them. Above, we added simple
language, we'll need to add everything to support them. Above, we added simple
support for the 'unary' keyword to the lexer. In addition to that, we need an
AST node:</p>

@@ -390,14 +390,14 @@
</pre>
</div>

<p>The grammar we add is pretty straight-forward here. If we see a unary
<p>The grammar we add is pretty straightforward here. If we see a unary
operator when parsing a primary operator, we eat the operator as a prefix and
parse the remaining piece as another unary operator. This allows us to handle
multiple unary operators (e.g. "!!x"). Note that unary operators can't have
ambiguous parses like binary operators can, so there is no need for precedence
information.</p>

<p>The problem with the above is that we need to call ParseUnary from somewhere.
<p>The problem with this function is that we need to call ParseUnary from somewhere.
To do this, we change previous callers of ParsePrimary to call ParseUnary
instead:</p>

@@ -424,7 +424,7 @@
</pre>
</div>

<p>With these two simple changes, we now parse unary operators and build the
<p>With these two simple changes, we are now able to parse unary operators and build the
AST for them. Next up, we need to add parser support for prototypes, to parse
the unary operator prototype. We extend the binary operator code above
with:</p>
@@ -587,7 +587,7 @@

<p>Based on these simple primitive operations, we can start to define more
interesting things. For example, here's a little function that solves for the
number of iterations it takes for a function in the complex plane to
number of iterations it takes a function in the complex plane to
converge:</p>

<div class="doc_code">
@@ -779,8 +779,8 @@
plot things that are!</p>

<p>With this, we conclude the "adding user-defined operators" chapter of the
tutorial. We successfully extended our language with the ability to extend the
language in the library, and showed how this can be used to build a simple but
tutorial. We have successfully augmented our language, adding the ability to extend the
language in the library, and we have shown how this can be used to build a simple but
interesting end-user application in Kaleidoscope. At this point, Kaleidoscope
can build a variety of applications that are functional and can call functions
with side-effects, but it can't actually define and mutate a variable itself.
@@ -790,7 +790,7 @@
languages, and it is not at all obvious how to <a href="LangImpl7.html">add
support for mutable variables</a> without having to add an "SSA construction"
phase to your front-end. In the next chapter, we will describe how you can
add this without building SSA in your front-end.</p>
add variable mutation without building SSA in your front-end.</p>

</div>

50 changes: 25 additions & 25 deletions docs/tutorial/LangImpl7.html
@@ -49,11 +49,11 @@
href="http://en.wikipedia.org/wiki/Functional_programming">functional
programming language</a>. In our journey, we learned some parsing techniques,
how to build and represent an AST, how to build LLVM IR, and how to optimize
the resultant code and JIT compile it.</p>
the resultant code as well as JIT compile it.</p>

<p>While Kaleidoscope is interesting as a functional language, this makes it
"too easy" to generate LLVM IR for it. In particular, a functional language
makes it very easy to build LLVM IR directly in <a
<p>While Kaleidoscope is interesting as a functional language, the fact that it
is functional makes it "too easy" to generate LLVM IR for it. In particular, a
functional language makes it very easy to build LLVM IR directly in <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">SSA form</a>.
Since LLVM requires that the input code be in SSA form, this is a very nice
property and it is often unclear to newcomers how to generate code for an
@@ -124,13 +124,13 @@
(cond_true/cond_false). In order to merge the incoming values, the X.2 phi node
in the cond_next block selects the right value to use based on where control
flow is coming from: if control flow comes from the cond_false block, X.2 gets
the value of X.1. Alternatively, if control flow comes from cond_tree, it gets
the value of X.1. Alternatively, if control flow comes from cond_true, it gets
the value of X.0. The intent of this chapter is not to explain the details of
SSA form. For more information, see one of the many <a
href="http://en.wikipedia.org/wiki/Static_single_assignment_form">online
references</a>.</p>

<p>The question for this article is "who places phi nodes when lowering
<p>The question for this article is "who places the phi nodes when lowering
assignments to mutable variables?". The issue here is that LLVM
<em>requires</em> that its IR be in SSA form: there is no "non-ssa" mode for it.
However, SSA construction requires non-trivial algorithms and data structures,
@@ -162,12 +162,12 @@
</p>

<p>In LLVM, all memory accesses are explicit with load/store instructions, and
it is carefully designed to not have (or need) an "address-of" operator. Notice
it is carefully designed not to have (or need) an "address-of" operator. Notice
how the type of the @G/@H global variables is actually "i32*" even though the
variable is defined as "i32". What this means is that @G defines <em>space</em>
for an i32 in the global data area, but its <em>name</em> actually refers to the
address for that space. Stack variables work the same way, but instead of being
declared with global variable definitions, they are declared with the
address for that space. Stack variables work the same way, except that instead of
being declared with global variable definitions, they are declared with the
<a href="../LangRef.html#i_alloca">LLVM alloca instruction</a>:</p>

<div class="doc_code">
@@ -259,10 +259,10 @@
</pre>
</div>

<p>The mem2reg pass implements the standard "iterated dominator frontier"
<p>The mem2reg pass implements the standard "iterated dominance frontier"
algorithm for constructing SSA form and has a number of optimizations that speed
up (very common) degenerate cases. mem2reg is the answer for dealing with
mutable variables, and we highly recommend that you depend on it. Note that
up (very common) degenerate cases. The mem2reg optimization pass is the answer for dealing
with mutable variables, and we highly recommend that you depend on it. Note that
mem2reg only works on variables in certain circumstances:</p>

<ol>
@@ -288,10 +288,10 @@

<p>
All of these properties are easy to satisfy for most imperative languages, and
we'll illustrate this below with Kaleidoscope. The final question you may be
we'll illustrate it below with Kaleidoscope. The final question you may be
asking is: should I bother with this nonsense for my front-end? Wouldn't it be
better if I just did SSA construction directly, avoiding use of the mem2reg
optimization pass? In short, we strongly recommend that use you this technique
optimization pass? In short, we strongly recommend that you use this technique
for building SSA form, unless there is an extremely good reason not to. Using
this technique is:</p>

@@ -309,8 +309,8 @@

<li>Needed for debug info generation: <a href="../SourceLevelDebugging.html">
Debug information in LLVM</a> relies on having the address of the variable
exposed to attach debug info to it. This technique dovetails very naturally
with this style of debug info.</li>
exposed so that debug info can be attached to it. This technique dovetails
very naturally with this style of debug info.</li>
</ul>

<p>If nothing else, this makes it much easier to get your front-end up and
@@ -337,7 +337,7 @@
</ol>

<p>While the first item is really what this is about, we only have variables
for incoming arguments and for induction variables, and redefining those only
for incoming arguments as well as for induction variables, and redefining those only
goes so far :). Also, the ability to define new variables is a
useful thing regardless of whether you will be mutating them. Here's a
motivating example that shows how we could use these:</p>
@@ -403,8 +403,8 @@
</p>

<p>To start our transformation of Kaleidoscope, we'll change the NamedValues
map to map to AllocaInst* instead of Value*. Once we do this, the C++ compiler
will tell use what parts of the code we need to update:</p>
map so that it maps to AllocaInst* instead of Value*. Once we do this, the C++
compiler will tell us what parts of the code we need to update:</p>

<div class="doc_code">
<pre>
@@ -452,7 +452,7 @@
</pre>
</div>

<p>As you can see, this is pretty straight-forward. Next we need to update the
<p>As you can see, this is pretty straightforward. Now we need to update the
things that define the variables to set up the alloca. We'll start with
<tt>ForExprAST::Codegen</tt> (see the <a href="#code">full code listing</a> for
the unabridged code):</p>
@@ -518,7 +518,7 @@
argument. This method gets invoked by <tt>FunctionAST::Codegen</tt> right after
it sets up the entry block for the function.</p>

<p>The final missing piece is adding the 'mem2reg' pass, which allows us to get
<p>The final missing piece is adding the mem2reg pass, which allows us to get
good codegen once again:</p>

<div class="doc_code">
@@ -537,7 +537,7 @@

<p>It is interesting to see what the code looks like before and after the
mem2reg optimization runs. For example, this is the before/after code for our
recursive fib. Before the optimization:</p>
recursive fib function. Before the optimization:</p>

<div class="doc_code">
<pre>
@@ -709,7 +709,7 @@
</pre>
</div>

<p>Once it has the variable, codegen'ing the assignment is straight-forward:
<p>Once we have the variable, codegen'ing the assignment is straightforward:
we emit the RHS of the assignment, create a store, and return the computed
value. Returning a value allows for chained assignments like "X = (Y = Z)".</p>

@@ -799,7 +799,7 @@
the VarNames vector. Also, var/in has a body, this body is allowed to access
the variables defined by the var/in.</p>

<p>With this ready, we can define the parser pieces. First thing we do is add
<p>With this in place, we can define the parser pieces. The first thing we do is add
it as a primary expression:</p>

<div class="doc_code">
@@ -972,7 +972,7 @@
<p>With this, we completed what we set out to do. Our nice iterative fib
example from the intro compiles and runs just fine. The mem2reg pass optimizes
all of our stack variables into SSA registers, inserting PHI nodes where needed,
and our front-end remains simple: no iterated dominator frontier computation
and our front-end remains simple: no "iterated dominance frontier" computation
anywhere in sight.</p>

</div>
