diff --git a/docs/tutorial/LangImpl6.html b/docs/tutorial/LangImpl6.html index b6420c990a1..40334b031a5 100644 --- a/docs/tutorial/LangImpl6.html +++ b/docs/tutorial/LangImpl6.html @@ -41,15 +41,16 @@

Welcome to Chapter 6 of the "Implementing a language with LLVM" tutorial. At this point in our tutorial, we now have a fully -functional language that is fairly minimal, but also useful. One big problem -with it though is that it doesn't have many useful operators (like division, -logical negation, or even any comparisons other than less-than.

+functional language that is fairly minimal, but also useful. There +is still one big problem with it, however. Our language doesn't have many +useful operators (like division, logical negation, or even any comparisons +besides less-than).

This chapter of the tutorial takes a wild digression into adding user-defined -operators to the simple and beautiful Kaleidoscope language, giving us a -simple and ugly language in some ways, but also a powerful one at the same time. +operators to the simple and beautiful Kaleidoscope language. This digression gives +us a language that is simple and ugly in some ways, but also powerful at the same time. One of the great things about creating your own language is that you get to -decide what is good or bad. In this tutorial we'll assume that it is okay and +decide what is good or bad. In this tutorial we'll assume that it is okay to use this as a way to show some interesting parsing techniques.

At the end of this tutorial, we'll run through an example Kaleidoscope @@ -73,8 +74,8 @@ operators that are supported.

The point of going into user-defined operators in a tutorial like this is to -show the power and flexibility of using a hand-written parser. The parser we -are using so far is using recursive descent for most parts of the grammar, and +show the power and flexibility of using a hand-written parser. Thus far, the parser +we have been implementing uses recursive descent for most parts of the grammar and operator precedence parsing for the expressions. See Chapter 2 for details. Without using operator precedence parsing, it would be very difficult to allow the programmer to @@ -152,12 +153,12 @@

This just adds lexer support for the unary and binary keywords, like we did in previous chapters. One nice thing -about our current AST is that we represent binary operators fully generally -with their ASCII code as the opcode. For our extended operators, we'll use the +about our current AST is that we represent binary operators in full generality +by using their ASCII code as the opcode. For our extended operators, we'll use this same representation, so we don't need any new AST or parser support.
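To make this concrete, here is a minimal self-contained sketch of the node in question (class names follow the tutorial, but the bodies are simplified and LLVM-free): because the opcode is stored as a plain char, a user-defined operator like '@' slots into the existing node unchanged.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Simplified sketch of the tutorial's expression AST: the binary opcode is a
// plain char, so a user-defined operator such as '@' needs no new node kind.
struct ExprAST {
  virtual ~ExprAST() = default;
};

struct BinaryExprAST : ExprAST {
  char Op;                            // ASCII code of the operator, e.g. '+', '@'
  std::unique_ptr<ExprAST> LHS, RHS;
  BinaryExprAST(char Op, std::unique_ptr<ExprAST> LHS,
                std::unique_ptr<ExprAST> RHS)
      : Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
};
```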

On the other hand, we have to be able to represent the definitions of these -new operators, in the "def binary| 5" part of the function definition. In the +new operators, in the "def binary| 5" part of the function definition. In our grammar so far, the "name" for the function definition is parsed as the "prototype" production and into the PrototypeAST AST node. To represent our new user-defined operators as prototypes, we have to extend @@ -257,14 +258,14 @@ -

This is all fairly straight-forward parsing code, and we have already seen -a lot of similar code in the past. One interesting piece of this is the part -that sets up FnName for binary operators. This builds names like -"binary@" for a newly defined "@" operator. This takes advantage of the fact -that symbol names in the LLVM symbol table are allowed to have any character in -them, even including embedded nul characters.

+

This is all fairly straightforward parsing code, and we have already seen +a lot of similar code in the past. One interesting part about the code above is +the couple of lines that set up FnName for binary operators. This builds names +like "binary@" for a newly defined "@" operator. This takes advantage of the +fact that symbol names in the LLVM symbol table are allowed to have any character in +them, including embedded nul characters.
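The name-mangling step can be sketched in isolation (the helper name here is illustrative; in the tutorial this logic lives inline in ParsePrototype):

```cpp
#include <cassert>
#include <string>

// Sketch of the mangling described above: the prototype for
// "def binary@ 5 ..." gets the function name "binary" + the operator
// character, which is legal because LLVM symbol names may contain any byte.
std::string MakeBinaryOperatorName(char Op) {
  std::string FnName = "binary";
  FnName += Op;                       // e.g. "binary@"
  return FnName;
}
```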

-

The next interesting piece is codegen support for these binary operators. +

The next interesting thing to add is codegen support for these binary operators. Given our current structure, this is a simple addition of a default case for our existing binary operator node:
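The listing itself is elided in this diff, but the dispatch logic can be modeled without LLVM. In this purely illustrative sketch, a std::function table stands in for the LLVM symbol table, and invoking the callback stands in for emitting a call instruction:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Toy model of the codegen fallthrough: built-in operators are handled
// directly; anything else is resolved by its mangled name "binary"+Op and
// "called" (the real tutorial code emits an LLVM call instruction instead).
using BinFn = std::function<double(double, double)>;

double EvalBinary(char Op, double L, double R,
                  const std::map<std::string, BinFn> &Functions) {
  switch (Op) {
  case '+': return L + R;
  case '-': return L - R;
  case '*': return L * R;
  default: {
    // If it wasn't a builtin binary operator, it must be user defined.
    auto It = Functions.find(std::string("binary") + Op);
    if (It == Functions.end())
      throw std::runtime_error("invalid binary operator");
    return It->second(L, R);
  }
  }
}
```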

@@ -301,10 +302,10 @@

As you can see above, the new code is actually really simple. It just does a lookup for the appropriate operator in the symbol table and generates a function call to it. Since user-defined operators are just built as normal -functions (because the "prototype" boils down into a function with the right +functions (because the "prototype" boils down to a function with the right name), everything falls into place.

-

The final missing piece is a bit of top level magic, here:

+

The final piece of code we are missing is a bit of top-level magic:

@@ -330,10 +331,9 @@
 
 

Basically, before codegening a function, if it is a user-defined operator, we register it in the precedence table. This allows the binary operator parsing -logic we already have to handle it. Since it is a fully-general operator -precedence parser, this is all we need to do to "extend the grammar".

+logic we already have in place to handle it. Since our operator-precedence parser is fully general, this is all we need to do to "extend the grammar".
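The registration step can be sketched with the same precedence table the parser consults (the builtin entries shown are the Chapter 2 values; the helper name is illustrative):

```cpp
#include <cassert>
#include <map>

// Sketch of the top-level hook: before codegen'ing "def binary| 5 ...", the
// new operator is installed in the precedence table that the
// operator-precedence parser already uses. Nothing else has to change.
std::map<char, int> BinopPrecedence = {
    {'<', 10}, {'+', 20}, {'-', 20}, {'*', 40}}; // builtins from Chapter 2

void RegisterUserBinaryOp(char Op, int Precedence) {
  BinopPrecedence[Op] = Precedence;   // the grammar is now "extended"
}
```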

-

With that, we have useful user-defined binary operators. This builds a lot +

Now we have useful user-defined binary operators. This builds a lot on the framework we established for the other operators. Adding unary operators is a bit more challenging, because we don't have any framework for them yet - let's see what it takes.

@@ -347,7 +347,7 @@

Since we don't currently support unary operators in the Kaleidoscope -language, we'll need to add everything for them. Above, we added simple +language, we'll need to add everything to support them. Above, we added simple support for the 'unary' keyword to the lexer. In addition to that, we need an AST node:
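The node's listing is elided in this diff; a minimal self-contained sketch (naming follows the tutorial, bodies simplified and LLVM-free) looks like:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Minimal sketch of the new AST node: a unary expression is just an opcode
// plus its single operand.
struct ExprAST {
  virtual ~ExprAST() = default;
};

struct UnaryExprAST : ExprAST {
  char Opcode;                        // e.g. '!'
  std::unique_ptr<ExprAST> Operand;
  UnaryExprAST(char Opcode, std::unique_ptr<ExprAST> Operand)
      : Opcode(Opcode), Operand(std::move(Operand)) {}
};
```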

@@ -390,14 +390,14 @@
-

The grammar we add is pretty straight-forward here. If we see a unary +

The grammar we add is pretty straightforward here. If we see a unary operator when parsing a primary operator, we eat the operator as a prefix and parse the remaining piece as another unary operator. This allows us to handle multiple unary operators (e.g. "!!x"). Note that unary operators can't have ambiguous parses like binary operators can, so there is no need for precedence information.
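The shape of that recursion can be modeled on a plain character string (the real ParseUnary works on tokens and builds UnaryExprAST nodes; `CountUnaryPrefix` is illustrative only):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Toy model of ParseUnary's structure: a prefix operator "eats" one
// character and recurses on the rest, so "!!x" naturally parses as !(!x).
// A prefix chain has only one possible reading, so no precedence is needed.
int CountUnaryPrefix(const std::string &Expr, std::size_t Pos = 0) {
  // A letter or digit (or end of input) means we've reached the primary.
  if (Pos >= Expr.size() ||
      std::isalnum(static_cast<unsigned char>(Expr[Pos])))
    return 0;
  // Otherwise: eat the operator, parse the rest as another unary expression.
  return 1 + CountUnaryPrefix(Expr, Pos + 1);
}
```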

-

The problem with the above is that we need to call ParseUnary from somewhere. +

The problem with this function is that we need to call ParseUnary from somewhere. To do this, we change previous callers of ParsePrimary to call ParseUnary instead:

@@ -424,7 +424,7 @@ -

With these two simple changes, we now parse unary operators and build the +

With these two simple changes, we are now able to parse unary operators and build the AST for them. Next up, we need to add parser support for prototypes, to parse the unary operator prototype. We extend the binary operator code above with:

@@ -587,7 +587,7 @@

Based on these simple primitive operations, we can start to define more interesting things. For example, here's a little function that solves for the -number of iterations it takes for a function in the complex plane to +number of iterations it takes a function in the complex plane to converge:

@@ -779,8 +779,8 @@ plot things that are!

With this, we conclude the "adding user-defined operators" chapter of the -tutorial. We successfully extended our language with the ability to extend the -language in the library, and showed how this can be used to build a simple but +tutorial. We have successfully augmented our language, adding the ability to extend the +language in the library, and we have shown how this can be used to build a simple but interesting end-user application in Kaleidoscope. At this point, Kaleidoscope can build a variety of applications that are functional and can call functions with side-effects, but it can't actually define and mutate a variable itself. @@ -790,7 +790,7 @@ languages, and it is not at all obvious how to add support for mutable variables without having to add an "SSA construction" phase to your front-end. In the next chapter, we will describe how you can -add this without building SSA in your front-end.

+add variable mutation without building SSA in your front-end.

diff --git a/docs/tutorial/LangImpl7.html b/docs/tutorial/LangImpl7.html index 523d24fcde1..4ad3b9456bd 100644 --- a/docs/tutorial/LangImpl7.html +++ b/docs/tutorial/LangImpl7.html @@ -49,11 +49,11 @@ href="http://en.wikipedia.org/wiki/Functional_programming">functional programming language. In our journey, we learned some parsing techniques, how to build and represent an AST, how to build LLVM IR, and how to optimize -the resultant code and JIT compile it.

+the resultant code as well as JIT compile it.

-

While Kaleidoscope is interesting as a functional language, this makes it -"too easy" to generate LLVM IR for it. In particular, a functional language -makes it very easy to build LLVM IR directly in While Kaleidoscope is interesting as a functional language, the fact that it +is functional makes it "too easy" to generate LLVM IR for it. In particular, a +functional language makes it very easy to build LLVM IR directly in SSA form. Since LLVM requires that the input code be in SSA form, this is a very nice property and it is often unclear to newcomers how to generate code for an @@ -124,13 +124,13 @@ (cond_true/cond_false). In order to merge the incoming values, the X.2 phi node in the cond_next block selects the right value to use based on where control flow is coming from: if control flow comes from the cond_false block, X.2 gets -the value of X.1. Alternatively, if control flow comes from cond_tree, it gets +the value of X.1. Alternatively, if control flow comes from cond_true, it gets the value of X.0. The intent of this chapter is not to explain the details of SSA form. For more information, see one of the many online references.

-

The question for this article is "who places phi nodes when lowering +

The question for this article is "who places the phi nodes when lowering assignments to mutable variables?". The issue here is that LLVM requires that its IR be in SSA form: there is no "non-ssa" mode for it. However, SSA construction requires non-trivial algorithms and data structures, @@ -162,12 +162,12 @@

In LLVM, all memory accesses are explicit with load/store instructions, and -it is carefully designed to not have (or need) an "address-of" operator. Notice +it is carefully designed not to have (or need) an "address-of" operator. Notice how the type of the @G/@H global variables is actually "i32*" even though the variable is defined as "i32". What this means is that @G defines space for an i32 in the global data area, but its name actually refers to the -address for that space. Stack variables work the same way, but instead of being -declared with global variable definitions, they are declared with the +address for that space. Stack variables work the same way, except that instead of +being declared with global variable definitions, they are declared with the LLVM alloca instruction:
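The alloca listing itself is elided in this diff, but the address/load/store discipline has a direct, purely illustrative C++ analogy:

```cpp
#include <cassert>

// Rough C++ analogy for the IR pattern described above: an alloca reserves
// a stack slot whose *name* denotes the slot's address, just as &Slot below
// yields an int* even though Slot "is" an int. All reads and writes then go
// through that pointer explicitly.
int DemoStackSlot() {
  int Slot = 0;        //   %X = alloca i32
  int *Addr = &Slot;   //   the name %X refers to the address of the slot
  *Addr = 42;          //   store i32 42, i32* %X
  return *Addr;        //   %tmp = load i32* %X
}
```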

@@ -259,10 +259,10 @@
-

The mem2reg pass implements the standard "iterated dominator frontier" +

The mem2reg pass implements the standard "iterated dominance frontier" algorithm for constructing SSA form and has a number of optimizations that speed -up (very common) degenerate cases. mem2reg is the answer for dealing with -mutable variables, and we highly recommend that you depend on it. Note that +up (very common) degenerate cases. The mem2reg optimization pass is the answer for dealing +with mutable variables, and we highly recommend that you depend on it. Note that mem2reg only works on variables in certain circumstances:

    @@ -288,10 +288,10 @@

    All of these properties are easy to satisfy for most imperative languages, and -we'll illustrate this below with Kaleidoscope. The final question you may be +we'll illustrate it below with Kaleidoscope. The final question you may be asking is: should I bother with this nonsense for my front-end? Wouldn't it be better if I just did SSA construction directly, avoiding use of the mem2reg -optimization pass? In short, we strongly recommend that use you this technique +optimization pass? In short, we strongly recommend that you use this technique for building SSA form, unless there is an extremely good reason not to. Using this technique is:

    @@ -309,8 +309,8 @@
  1. Needed for debug info generation: Debug information in LLVM relies on having the address of the variable -exposed to attach debug info to it. This technique dovetails very naturally -with this style of debug info.
  2. +exposed so that debug info can be attached to it. This technique dovetails +very naturally with this style of debug info.

    If nothing else, this makes it much easier to get your front-end up and @@ -337,7 +337,7 @@

While the first item is really what this is about, we only have variables -for incoming arguments and for induction variables, and redefining those only +for incoming arguments as well as for induction variables, and redefining those only goes so far :). Also, the ability to define new variables is a useful thing regardless of whether you will be mutating them. Here's a motivating example that shows how we could use these:

@@ -403,8 +403,8 @@

To start our transformation of Kaleidoscope, we'll change the NamedValues -map to map to AllocaInst* instead of Value*. Once we do this, the C++ compiler -will tell use what parts of the code we need to update:

+map so that it maps to AllocaInst* instead of Value*. Once we do this, the C++ +compiler will tell us what parts of the code we need to update:

@@ -452,7 +452,7 @@
 
-

As you can see, this is pretty straight-forward. Next we need to update the +

As you can see, this is pretty straightforward. Now we need to update the things that define the variables to set up the alloca. We'll start with ForExprAST::Codegen (see the full code listing for the unabridged code):

@@ -518,7 +518,7 @@ argument. This method gets invoked by FunctionAST::Codegen right after it sets up the entry block for the function.

-

The final missing piece is adding the 'mem2reg' pass, which allows us to get +

The final missing piece is adding the mem2reg pass, which allows us to get good codegen once again:

@@ -537,7 +537,7 @@

It is interesting to see what the code looks like before and after the mem2reg optimization runs. For example, this is the before/after code for our -recursive fib. Before the optimization:

+recursive fib function. Before the optimization:

@@ -709,7 +709,7 @@
 
-

Once it has the variable, codegen'ing the assignment is straight-forward: +

Once we have the variable, codegen'ing the assignment is straightforward: we emit the RHS of the assignment, create a store, and return the computed value. Returning a value allows for chained assignments like "X = (Y = Z)".
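A toy model of that flow (the real code emits a store into the variable's alloca; here a std::map stands in for the symbol table, and the helper name is illustrative):

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of the assignment codegen: evaluate the RHS, "store" it into
// the variable's slot, and return the stored value so that chained
// assignments like X = (Y = Z) compose naturally.
double EmitAssign(std::map<std::string, double> &Vars,
                  const std::string &Name, double RHSVal) {
  Vars[Name] = RHSVal;  // store into the variable's slot
  return RHSVal;        // returning the value enables chaining
}
```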

@@ -799,7 +799,7 @@ the VarNames vector. Also, var/in has a body, this body is allowed to access the variables defined by the var/in.

-

With this ready, we can define the parser pieces. First thing we do is add +

With this in place, we can define the parser pieces. The first thing we do is add it as a primary expression:

@@ -972,7 +972,7 @@

With this, we completed what we set out to do. Our nice iterative fib example from the intro compiles and runs just fine. The mem2reg pass optimizes all of our stack variables into SSA registers, inserting PHI nodes where needed, -and our front-end remains simple: no iterated dominator frontier computation +and our front-end remains simple: no "iterated dominance frontier" computation anywhere in sight.

diff --git a/docs/tutorial/LangImpl8.html b/docs/tutorial/LangImpl8.html index d84dfdca6bd..855b8f3692b 100644 --- a/docs/tutorial/LangImpl8.html +++ b/docs/tutorial/LangImpl8.html @@ -98,7 +98,7 @@
  • standard runtime - Our current language allows the user to access arbitrary external functions, and we use it for things like "printd" and "putchard". As you extend the language to add higher-level constructs, often -these constructs make the most amount of sense to be lowered into calls into a +these constructs make the most sense if they are lowered to calls into a language-supplied runtime. For example, if you add hash tables to the language, it would probably make sense to add the routines to a runtime, instead of inlining them all the way.
  • @@ -106,14 +106,14 @@
  • memory management - Currently we can only access the stack in Kaleidoscope. It would also be useful to be able to allocate heap memory, either with calls to the standard libc malloc/free interface or with a garbage -collector. If you choose to use garbage collection, note that LLVM fully +collector. If you would like to use garbage collection, note that LLVM fully supports Accurate Garbage Collection including algorithms that move objects and need to scan/update the stack.
• debugger support - LLVM supports generation of DWARF Debug info which is understood by common debuggers like GDB. Adding support for debug info is fairly -straight-forward. The best way to understand it is to compile some C/C++ code +straightforward. The best way to understand it is to compile some C/C++ code with "llvm-gcc -g -O0" and take a look at what it produces.
• exception handling support - LLVM supports generation of Have fun - try doing something crazy and unusual. Building a language like -everyone else always has is much less fun than trying something a little crazy -and off the wall and seeing how it turns out. If you get stuck or want to talk +everyone else always has is much less fun than trying something a little crazy +or off the wall and seeing how it turns out. If you get stuck or want to talk about it, feel free to email the llvmdev mailing list: it has lots of people who are interested in languages and are often willing to help out.

    -

    Before we end, I want to talk about some "tips and tricks" for generating +

    Before we end this tutorial, I want to talk about some "tips and tricks" for generating LLVM IR. These are some of the more subtle things that may not be obvious, but are very useful if you want to take advantage of LLVM's capabilities.

  • -
    Properties of LLVM +
    Properties of the LLVM IR
    @@ -184,7 +184,7 @@ tell that the Kaleidoscope compiler generates target-independent code because it never queries for any target-specific information when generating code.

    -

    The fact that LLVM provides a compact target-independent representation for +

The fact that LLVM provides a compact, target-independent representation for code gets a lot of people excited. Unfortunately, these people are usually thinking about C or a language from the C family when they are asking questions about language portability. I say "unfortunately", because there is really no @@ -208,7 +208,7 @@

    While it is possible to engineer more and more complex solutions to problems -like this, it cannot be solved in full generality in a way better than shipping +like this, it cannot be solved in full generality in a way that is better than shipping the actual source code.

That said, there are interesting subsets of C that can be made portable. If @@ -263,7 +263,7 @@ writing, there is no way to distinguish in the LLVM IR whether an SSA-value came from a C "int" or a C "long" on an ILP32 machine (other than debug info). Both get compiled down to an 'i32' value and the information about what it came from -is lost. The more general issue here is that the LLVM type system uses +is lost. The more general issue here is that the LLVM type system uses "structural equivalence" instead of "name equivalence". Another place this surprises people is if you have two types in a high-level language that have the same structure (e.g. two different structs that have a single int field): these @@ -275,10 +275,10 @@ adding new features (LLVM did not always support exceptions or debug info), we also extend the IR to capture important information for optimization (e.g. whether an argument is sign or zero extended, information about pointers -aliasing, etc. Many of the enhancements are user-driven: people want LLVM to -do some specific feature, so they go ahead and extend it to do so.

    +aliasing, etc). Many of the enhancements are user-driven: people want LLVM to +include some specific feature, so they go ahead and extend it.

    -

    Third, it is possible and easy to add language-specific +

    Third, it is possible and easy to add language-specific optimizations, and you have a number of choices in how to do it. As one trivial example, it is easy to add language-specific optimization passes that "know" things about code compiled for a language. In the case of the C family, @@ -291,7 +291,7 @@ other language-specific information into the LLVM IR. If you have a specific need and run into a wall, please bring the topic up on the llvmdev list. At the very worst, you can always treat LLVM as if it were a "dumb code generator" and -implement the high-level optimizations you desire in your front-end on the +implement the high-level optimizations you desire in your front-end, on the language-specific AST.

    @@ -316,8 +316,8 @@
    -

    One interesting thing that comes up if you are trying to keep the code -generated by your compiler "target independent" is that you often need to know +

One interesting thing that comes up if you are trying to keep the code +generated by your compiler "target independent" is that you often need to know the size of some LLVM type or the offset of some field in an llvm structure. For example, you might need to pass the size of a type into a function that allocates memory.

    @@ -342,10 +342,10 @@ are often better ways to implement these features than explicit stack frames, but LLVM -does support them if you want. It requires your front-end to convert the +does support them, if you want. It requires your front-end to convert the code into Continuation -Passing Style and use of tail calls (which LLVM also supports).

    +Passing Style and the use of tail calls (which LLVM also supports).