Artyom Goncharov edited this page Jan 29, 2016 · 2 revisions

Declarations

There are four kinds of declarations so far: functions, variables, types, and inline assembly (asm).

Namespaces

Helium has two namespaces: one for types and one for variables and functions. Every function definition opens two new namespaces, one for types and one for everything else. At the very top is the global namespace, where main (along with other top-level functions and types) lives.

Function

A function definition looks like this:

fn sum(a: int, b: int): int
{
    ret a + b;
}

Each function definition starts with the fn keyword, followed by an alphanumeric name and a list of parameters in parentheses. Each parameter MUST have a type. The function itself also has a type, written after the formal parameters, which defines the type of the function's result. The function body is enclosed in curly braces. To return a value from a function, use the ret statement followed by an expression.

Some parts of a function definition can be omitted. For example, if the function's type is omitted, it is derived from the type of the function's result:

fn sum(a: int, b: int)
{
    ret a + b;
}

The two functions above are equivalent.
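
The return-type rule can be sketched as a small Python function. This is a hypothetical model, not the Helium compiler's code: it assumes types are plain strings such as "int" and that every ret expression in one body must have the same type, neither of which this page specifies. When no type is written, the result type comes from the ret expressions (None stands for "no result"); when one is written, it has to agree with them.

```python
# Hypothetical model of return-type inference (not the Helium compiler).
# Assumptions: types are strings such as "int", and all `ret` expressions
# in one function body must have the same type.
from typing import List, Optional


def infer_return_type(declared: Optional[str], ret_types: List[str]) -> Optional[str]:
    """declared: type written after the parameters, or None if omitted.
    ret_types: types of the expressions used in `ret` statements."""
    distinct = set(ret_types)
    if len(distinct) > 1:
        raise TypeError(f"conflicting return types: {sorted(distinct)}")
    inferred = next(iter(distinct), None)
    if declared is None:
        return inferred  # omitted type: derived from the result (None = no result)
    if inferred is not None and inferred != declared:
        raise TypeError(f"declared {declared}, but ret yields {inferred}")
    return declared


# fn sum(a: int, b: int): int { ret a + b; }  and the version without ": int"
print(infer_return_type("int", ["int"]))  # int
print(infer_return_type(None, ["int"]))   # int
```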

//TODO: The ret keyword and the ; at the end of the function's last statement can be omitted, which hints to the compiler that the last expression is the result of the function:

fn sum(a: int, b: int)
{
    a + b
}

If a function has no formal parameters, the parentheses can be omitted. These two functions are equivalent:

fn blah() { 42 }
fn blah   { 42 }
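
To make the header rules concrete, here is a toy Python recognizer for the header forms shown on this page: the fn keyword, a name, an optional parameter list, and an optional result type. It is a hypothetical illustration, not Helium's real parser. The function and regex names are invented, and "alphanumeric" is approximated as a letter or underscore followed by word characters.

```python
# Hypothetical toy recognizer for Helium function headers (not the real parser).
import re
from typing import List, Optional, Tuple

HEADER = re.compile(
    r"^fn\s+(?P<name>[A-Za-z_]\w*)\s*"        # fn keyword and function name
    r"(?:\((?P<params>[^)]*)\))?\s*"          # optional parameter list
    r"(?::\s*(?P<ret>[A-Za-z_]\w*))?\s*$"     # optional result type
)
PARAM = re.compile(r"^\s*([A-Za-z_]\w*)\s*:\s*([A-Za-z_]\w*)\s*$")


def parse_header(text: str) -> Tuple[str, List[Tuple[str, str]], Optional[str]]:
    m = HEADER.match(text.strip())
    if m is None:
        raise SyntaxError(f"not a function header: {text!r}")
    params: List[Tuple[str, str]] = []
    raw = m.group("params")  # None when the parentheses are omitted
    if raw and raw.strip():
        for piece in raw.split(","):
            p = PARAM.match(piece)
            if p is None:  # every parameter MUST have a type
                raise SyntaxError(f"parameter without a type: {piece.strip()!r}")
            params.append((p.group(1), p.group(2)))
    return m.group("name"), params, m.group("ret")


print(parse_header("fn sum(a: int, b: int): int"))
# ('sum', [('a', 'int'), ('b', 'int')], 'int')
print(parse_header("fn blah"))  # ('blah', [], None)
```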

Nesting

You can define a function inside a function, effectively creating a new namespace for types and variables:

def Point = { x: int, y: int }
fn first
{
    let a = Point { x = 10, y = 20 };

    fn blah { 42 }

    fn second
    {
        let a = Point { x = 100 };
        let b = Point { y = 200 };

        ret a.x + b.y + blah();
    }

    second()
}

In this example there are three namespace levels: the global level, the level of function first, and the level of function second. Function first opens a child namespace of the global one, which gives it access to type Point. Function second opens a child namespace of first, which gives it access to type Point and to the variable a from first's level. However, second also defines its own variable a, which shadows the outer one. Function second can also call blah, which is defined one level up; defining another function named blah inside second would shadow the original. A name always refers to the nearest definition visible from the current level. You CANNOT define two variables or functions with the same name on the same level, and you CANNOT access symbols from a sibling level or its children.
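
The lookup rules above can be modelled as a chain of scopes. The Python sketch below is a hypothetical model, not the compiler's implementation. Each level holds two tables, one for types and one for variables and functions. Lookups walk outward through enclosing levels only, and defining the same variable or function name twice on one level is rejected. The Scope class and its method names are invented for illustration.

```python
# Hypothetical model of Helium's nested namespaces (not the compiler's code).
from typing import Dict, Optional


class Scope:
    """One namespace level: separate tables for types and for values."""

    def __init__(self, parent: Optional["Scope"] = None) -> None:
        self.parent = parent
        self.types: Dict[str, str] = {}   # type namespace
        self.values: Dict[str, str] = {}  # variables and functions

    def child(self) -> "Scope":
        # Each function definition opens a child level of the current one.
        return Scope(self)

    def define_type(self, name: str, info: str) -> None:
        # The page states the duplicate rule only for variables and functions.
        self.types[name] = info

    def define_value(self, name: str, info: str) -> None:
        # Two variables or functions with the same name on one level: error.
        if name in self.values:
            raise NameError(f"{name!r} is already defined on this level")
        self.values[name] = info

    def _lookup(self, table: str, name: str) -> str:
        # Walk outward through enclosing levels only; siblings are never seen.
        scope: Optional[Scope] = self
        while scope is not None:
            entries: Dict[str, str] = getattr(scope, table)
            if name in entries:
                return entries[name]  # the nearest definition shadows outer ones
            scope = scope.parent
        raise NameError(f"{name!r} is not visible here")

    def lookup_type(self, name: str) -> str:
        return self._lookup("types", name)

    def lookup_value(self, name: str) -> str:
        return self._lookup("values", name)


# Mirror of the example above.
glob = Scope()
glob.define_type("Point", "{ x: int, y: int }")
glob.define_value("first", "fn first")
first = glob.child()
first.define_value("a", "a in first")
first.define_value("blah", "fn blah")
first.define_value("second", "fn second")
second = first.child()
second.define_value("a", "a in second")
second.define_value("b", "b in second")

print(second.lookup_value("a"))     # a in second (shadows a in first)
print(second.lookup_value("blah"))  # fn blah (defined one level up)
print(second.lookup_type("Point"))  # { x: int, y: int }
```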

TBD: You can call a function before it is declared:

fn main
{
    let a = sum(40, 2);
}
fn sum(a: int, b: int) { a + b }
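
Forward calls are commonly supported by resolving names in two passes: first collect every function name on a level, then check the calls inside the bodies. Because this feature is still TBD, the Python sketch below only illustrates that general technique on an invented program format, where each function is a name plus the list of functions it calls. It does not describe how Helium will actually implement it.

```python
# Hypothetical two-pass name resolution for forward calls (illustration only).
from typing import List, Set, Tuple

# Invented program format: (function name, names of the functions it calls).
Program = List[Tuple[str, List[str]]]


def undefined_calls(program: Program, two_pass: bool) -> List[Tuple[str, str]]:
    """Return the (caller, callee) pairs that cannot be resolved."""
    # Pass 1 (two_pass only): collect every function name on this level first.
    known: Set[str] = {name for name, _ in program} if two_pass else set()
    errors: List[Tuple[str, str]] = []
    for name, calls in program:
        if not two_pass:
            known.add(name)  # single pass: only names seen so far (and itself)
        for callee in calls:
            if callee not in known:
                errors.append((name, callee))
    return errors


# The example above: main calls sum, which is defined after main.
example: Program = [("main", ["sum"]), ("sum", [])]
print(undefined_calls(example, two_pass=False))  # [('main', 'sum')]
print(undefined_calls(example, two_pass=True))   # []
```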

Main

The main function is a very special function: it is the entry point of the whole program. So far it differs from a regular function in three important ways: it always returns an integer result; if no return value is provided, it returns 0; and when its last statement (or expression) is reached, the program terminates. Here is the minimal program:

fn main {}

Variable

A variable can be defined (declared) in two ways: locally, inside a function, or globally, outside of any function. Global variables reside in the .data segment and remain accessible for as long as the program runs. Local variables live inside a function (technically either in a register or on the stack); they come to life when the function is executed and die when execution reaches the end of the function.

Global variables that are not given an initialization value are initialized to a default value based on their type. Local variables are always uninitialized unless they are given an initialization expression.

Local

You declare a local variable like this:

fn main
{
    let a: int;
}

Here the declaration starts with the keyword let, followed by the alphanumeric variable name, a colon (:), the integral type name int, and a semicolon (;) that closes the statement. The symbol a is an uninitialized variable of type int, so its value is garbage (whatever was on the stack or in the register before this function was entered). To initialize a variable, provide an initialization expression:

fn main
{
    let a: int = 10;
}

Here the variable a is initialized with the integer literal 10, whose type is compatible with (in this case, exactly the same as) the type of a. This makes local type inference possible: you can reduce repetition by omitting the type wherever the compiler can infer it:

fn main
{
    let a = 10;
}

This code is equivalent to the previous example, except that the variable's type is omitted and inferred from the initialization expression. An initialization expression can be virtually any expression that yields a value.
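
The inference rule for let can be summarized by a small hypothetical Python function, with types modelled as strings. Two details are assumptions this page does not state: a let with neither a type nor an initializer is rejected, because there is nothing to infer the type from, and a declared type must match the initializer's type exactly (a real checker might accept merely compatible types).

```python
# Hypothetical model of `let` type inference; types are modelled as strings.
from typing import Optional


def let_type(declared: Optional[str], init_type: Optional[str]) -> str:
    """Type of `let name[: declared] [= init];`, given the initializer's type."""
    if declared is None:
        if init_type is None:
            # Assumption: with no type and no initializer there is nothing to infer from.
            raise TypeError("let needs a type or an initialization expression")
        return init_type  # `let a = 10;`: type taken from the initializer
    if init_type is not None and init_type != declared:
        # Assumption: exact match required.
        raise TypeError(f"cannot initialize {declared} with {init_type}")
    return declared  # `let a: int;` or `let a: int = 10;`


print(let_type("int", None))   # int  (let a: int;)
print(let_type("int", "int"))  # int  (let a: int = 10;)
print(let_type(None, "int"))   # int  (let a = 10;)
```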

Global

Not implemented yet (TBD).

Type

A type declaration assigns a name to another type; most of the time that type is an anonymous record type:

def Point = { x:int, y:int }

Here the record type declaration starts with the keyword def, followed by the alphanumeric type name, the binding symbol =, and a list of typed names enclosed in curly braces. Each name in a compound type MUST have a type. The order of the typed names does not really matter, but the declaration does define the actual name of the type within its namespace.
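
A toy Python parser for the record declaration syntax above shows the rule that every field must carry a type. It is a hypothetical sketch with invented names. Rejecting duplicate field names is an assumption that this page does not state.

```python
# Hypothetical toy parser for `def Name = { field: type, ... }`.
import re
from typing import Dict, Tuple

DEF = re.compile(r"^def\s+([A-Za-z_]\w*)\s*=\s*\{(.*)\}\s*$")
FIELD = re.compile(r"^\s*([A-Za-z_]\w*)\s*:\s*([A-Za-z_]\w*)\s*$")


def parse_record_def(text: str) -> Tuple[str, Dict[str, str]]:
    m = DEF.match(text.strip())
    if m is None:
        raise SyntaxError(f"not a record declaration: {text!r}")
    name, body = m.group(1), m.group(2)
    fields: Dict[str, str] = {}
    for piece in body.split(","):
        f = FIELD.match(piece)
        if f is None:  # every name in a compound type MUST have a type
            raise SyntaxError(f"field without a type: {piece.strip()!r}")
        if f.group(1) in fields:  # assumption: field names must be unique
            raise SyntaxError(f"duplicate field: {f.group(1)!r}")
        fields[f.group(1)] = f.group(2)
    return name, fields


print(parse_record_def("def Point = { x:int, y:int }"))
# ('Point', {'x': 'int', 'y': 'int'})
```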

Unlike variables, global types get no special handling: they are simply top-level symbols that follow the general namespace rules outlined at the top of this page.

Asm

(currently in progress) Inline assembly lets you embed assembler instructions in Helium code. There are two forms of the asm statement. The simple form lets you write assembler statements directly, define labels, use registers, etc.; it is the only form allowed at the top level of a program. The extended form adds advanced features such as meta labels and meta registers, variable interpolation, function declarations (TBD), Helium expression interpolation (TBD), and function calls in both directions.

Simple

The simple form is just plain assembler instructions:

asm
{
    addi $t0, $t1, 42
    li   $t0, 42
}

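This example contains two instructions. The first, addi, is a real MIPS instruction that adds 42 to $t1 and stores the result in $t0. The second, li, is a pseudo-instruction (or macro) that loads the constant 42 into $t0; the assembler expands it into one or more real instructions. Note that the two are not equivalent: addi depends on the value of $t1, while li does not (addi $t0, $0, 42 would be the equivalent form). At this point you cannot do much with the simple form, since data definitions are not allowed yet.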
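
The li expansion can be sketched in Python using a scheme common to MIPS assemblers. A constant that fits a 16-bit immediate becomes a single instruction. Any other 32-bit constant becomes a lui of the upper half into $at, followed by an ori of the lower half. This is only an illustration; the exact instructions SPIM chooses may differ.

```python
# Sketch of a common MIPS `li` expansion (exact SPIM output may differ).
from typing import List


def expand_li(reg: str, value: int) -> List[str]:
    if not -(2**31) <= value < 2**32:
        raise ValueError("li takes a 32-bit constant")
    bits = value & 0xFFFFFFFF  # 32-bit two's-complement bit pattern
    if bits <= 0xFFFF:
        # Fits a zero-extended 16-bit immediate: a single ori from $0.
        return [f"ori {reg}, $0, {bits}"]
    if bits >= 0xFFFF8000:
        # Small negative number: fits a sign-extended 16-bit immediate.
        return [f"addiu {reg}, $0, {bits - 2**32}"]
    # General case: upper half via lui into $at (the assembler temporary),
    # then OR in the lower half.
    return [f"lui $at, {bits >> 16}", f"ori {reg}, $at, {bits & 0xFFFF}"]


print(expand_li("$t0", 42))          # ['ori $t0, $0, 42']
print(expand_li("$t0", -1))          # ['addiu $t0, $0, -1']
print(expand_li("$t0", 0x12345678))  # ['lui $at, 4660', 'ori $t0, $at, 22136']
```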

Extended

Meta Symbols

Naming registers and labels directly inside an asm statement that is itself inside a function is a BAD idea. Hard-coded register names greatly restrict the register allocation pass, which can lead to unnecessary spills and slow the program down, and explicit labels might (and probably will) collide with other labels. The solution is meta symbols: from the programmer's perspective not much changes, but the compiler is no longer restricted in which registers it uses or which label names it generates.

Meta Register

A meta register is an alphanumeric name preceded by a backtick (`):

asm
{
    addi `t0, `t1, 42
    li   `t0, 42
}

This is the same example as above, but with every $ replaced by a backtick, which lets the compiler use any register available for the statement. Meta registers with the same name always receive the same real register. Meta names are, of course, not limited to real register names:

asm
{
    addi `banana, `blah, 42
    li   `banana, 42
}
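
The rule that equal meta names get the same register can be sketched as a substitution pass. In this hypothetical Python model, each distinct single-backtick name in one asm block receives the first free register from a fixed pool. A real compiler would leave the choice to its register allocator and spill when registers run out. The function name and the pool are invented.

```python
# Hypothetical meta-register substitution for one asm block.
import re
from typing import Dict, List

# A single backtick followed by a name; the lookbehind skips ``labels.
META_REG = re.compile(r"(?<!`)`([A-Za-z_]\w*)")


def resolve_meta_registers(lines: List[str], pool: List[str]) -> List[str]:
    mapping: Dict[str, str] = {}  # meta name -> real register, per block
    free = list(pool)

    def assign(match) -> str:
        name = match.group(1)
        if name not in mapping:
            if not free:
                raise RuntimeError("no free register (a real compiler would spill)")
            mapping[name] = free.pop(0)
        return mapping[name]

    return [META_REG.sub(assign, line) for line in lines]


block = ["addi `banana, `blah, 42", "li   `banana, 42"]
print(resolve_meta_registers(block, ["$t0", "$t1", "$t2"]))
# ['addi $t0, $t1, 42', 'li   $t0, 42']
```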

Meta Label

A meta label is an alphanumeric name preceded by two backticks:

asm
{
        addi  `cnt, $0, 0
        addi  `tst, $0, 5
    ``repeat: 
        addi  `cnt, `cnt, 1
        bne   `cnt, `tst, ``repeat
}

In this example we keep a counter in `cnt and a test value in `tst. On each iteration we increment the counter by 1 and check whether it has become equal to the test value. If not, we branch back to the repeat label; if it has, execution falls through to the code after the loop. Label repeat is a meta label, so in the actual code the compiler uses a generated name such as L1, L2, L3, etc.
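
Meta-label renaming can be sketched in Python under the assumption, not confirmed by this page, that the compiler keeps one label counter for the whole program. Each ``name in a block maps to a fresh L<n>. Running the same source through the pass twice yields distinct labels, which is exactly the collision that meta labels avoid. The function name is hypothetical.

```python
# Hypothetical meta-label renaming with one program-wide counter.
import itertools
import re
from typing import Dict, Iterator, List

META_LABEL = re.compile(r"``([A-Za-z_]\w*)")


def rename_meta_labels(lines: List[str], counter: Iterator[int]) -> List[str]:
    mapping: Dict[str, str] = {}  # ``name -> generated label, per block

    def fresh(match) -> str:
        name = match.group(1)
        if name not in mapping:
            mapping[name] = f"L{next(counter)}"
        return mapping[name]

    return [META_LABEL.sub(fresh, line) for line in lines]


labels = itertools.count(1)
loop = ["``repeat:", "    addi  `cnt, `cnt, 1", "    bne   `cnt, `tst, ``repeat"]
print(rename_meta_labels(loop, labels))  # ``repeat becomes L1
print(rename_meta_labels(loop, labels))  # same source, now L2: no collision
```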

Variable Interpolation

This feature lets you use Helium variables in place of registers or meta registers inside an inline assembly statement. Here is the previous example with small changes:

fn main
{
    let tst: int = 5;
    let cnt: int = 0;
    asm
    {
        ``repeat: 
            addi  cnt, cnt, 1
            bne   cnt, tst, ``repeat
    }
}

This is the same loop as above, but the counter and the test value are Helium variables instead of registers. Variable interpolation is not limited to simple variables: you can also use record fields and array subscripts (TBD):

def Blah = { tst: int, cnt: int }
fn main
{
    let b = Blah { tst = 10, cnt = 0 };
    asm
    {
        ``repeat: 
            addi  b.cnt, b.cnt, 1
            bne   b.cnt, b.tst, ``repeat
    }
}
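
Before interpolating, the compiler has to tell the kinds of operands apart. The hypothetical Python classifier below distinguishes real registers, meta registers, meta labels, integer immediates, and Helium variables or record fields, looked up in a set of known variable names. Memory operands such as 0(`str) and plain labels are out of scope, and the category names are invented for illustration.

```python
# Hypothetical operand classifier for extended asm statements.
import re
from typing import Set

IDENT = r"[A-Za-z_]\w*"


def classify_operand(op: str, variables: Set[str]) -> str:
    op = op.strip()
    if re.fullmatch(r"\$\w+", op):
        return "register"          # $t0, $0, ...
    if re.fullmatch("``" + IDENT, op):
        return "meta label"        # ``repeat
    if re.fullmatch("`" + IDENT, op):
        return "meta register"     # `cnt
    if re.fullmatch(r"-?\d+", op):
        return "immediate"         # 1, 42
    m = re.fullmatch(rf"({IDENT})(?:\.({IDENT}))?", op)
    if m and m.group(1) in variables:
        return "record field" if m.group(2) else "variable"  # b.cnt / cnt
    raise ValueError(f"unsupported operand: {op!r}")


line = "bne   b.cnt, b.tst, ``repeat"
mnemonic, rest = line.split(None, 1)
print(mnemonic, [classify_operand(op, {"b"}) for op in rest.split(",")])
# bne ['record field', 'record field', 'meta label']
```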

Literal interpolation

(currently in development) Some instructions are (or will be) allowed to use Helium literal interpolation, which simplifies data definitions. For example, the la (load address) macro loads a 32-bit value into a register; it also accepts a label as its argument, which is replaced with the label's 32-bit address. Inline assembly additionally lets this macro accept a Helium literal, for example a string:

fn main
{
    let len:int;
    asm
    {
        la `str, "Hello, World!"
        lw len, 0(`str)
    }
}

This code emits a length-prefixed string into the .data segment and replaces the literal with a generated label name; the second instruction, lw, then reads the string's length into the variable len.
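
One possible layout for the emitted string is sketched below in Python. It rests on three assumptions that this page does not confirm. First, the prefix is a single 32-bit word holding the byte length. Second, the word is little-endian, since SPIM uses the byte order of the machine it runs on, which is usually little-endian. Third, no NUL terminator follows. The label name L1 is hypothetical.

```python
# Hypothetical layout of a length-prefixed string literal (assumptions: one
# little-endian 32-bit length word, then the ASCII bytes, no NUL terminator).
import struct
from typing import List


def length_prefixed(text: str) -> bytes:
    data = text.encode("ascii")
    return struct.pack("<I", len(data)) + data


def read_length(blob: bytes) -> int:
    # Same effect as `lw len, 0(`str)`: read the 32-bit word at offset 0.
    (length,) = struct.unpack_from("<I", blob, 0)
    return length


def data_directives(label: str, text: str) -> List[str]:
    # One possible .data encoding; escaping of quotes/backslashes is omitted.
    return [f"{label}: .word {len(text.encode('ascii'))}", f'    .ascii "{text}"']


blob = length_prefixed("Hello, World!")
print(read_length(blob))  # 13
print(blob[4:])           # b'Hello, World!'
print("\n".join(data_directives("L1", "Hello, World!")))
```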

Current limitations

  • for now, asm statements are limited to instructions and labels
  • all further extensions to asm will be limited to the SPIM simulator ISA, since SPIM is my test bench, at least for now
  • there is no distinction between assembler ISA sets: the compiler accepts any instruction defined in the instruction table I took from GNU binutils