Skip to content

Discriminated unions

wangp edited this page May 28, 2017 · 3 revisions

Discriminated unions allow the definition of new, structured types. They encompass both enumeration and record types in other languages.

Enumerations

The simplest example of a discriminated union type is an enumeration. Here is an example:

:- type fruit
    --->    apple
    ;       banana
    ;       cherry.

The definition begins with :- type followed by the type name fruit. Following the arrow (containing three dashes) is a list of one or more data constructors separated by semicolons. Each constructor represents a possible term of the type. This type definition makes it possible to write code involving terms apple, banana and cherry.

The compiler will reject code which tries to make use of constructors which were not defined or imported, preventing mistakes.

Constructors with arguments

Each data constructor in the type definition may have arguments.

Here is a type to represent binary trees of int values.

:- type tree
    --->    leaf(int)
    ;       branch(tree, int, tree).

Terms of the type tree can come in two variants, distinguished by the data constructor. Terms with leaf as the top-level constructor must have a single argument of type int. Terms with branch as the top-level constructor must have three arguments of the right types, including two sub-trees. A term matching the type is branch(leaf(1), 2, branch(leaf(3), 4, leaf(5))).

An enumeration is just a discriminated union where none of the constructors have arguments.

Polymophic types

Types may be parameterised with type variables. Type variables begin with an uppercase letter or underscore, just like program variables.

We can generalise the binary tree type like this:

:- type tree(T)
    --->    leaf(T)
    ;       branch(tree(T), T, tree(T)).

tree(T) is a polymorphic type. We can substitute any type for T to make more types. tree(string) is the type of binary trees of strings, tree(tree(int)) is the type of binary trees of binary trees of ints, and so on.

Ordinarily, type variables occurring in the body of the type definition must occur as arguments on the left side of the arrow. This is invalid:

:- type invalid_type
    --->    foo(T).

A term of type invalid_type must have a top-level constructor foo but what would be the type of the argument? We do not have a substitution for T, so that information would have to carried around with the term at runtime. It is possible using a feature called existentially quantified type variables, which deserves its own chapter.

Named fields

The fields of a data constructor may be labelled using the syntax FIELDNAME :: FIELDTYPE. One reason to label fields is to make the type definition more obvious.

:- type circle
    --->    circle(
                x       :: float,
                y       :: float,
                radius  :: float
            ).

Note that two fields in the same module cannot have the same name.

Field names allow you to select individual fields of a term using an expression of the form TERM ^ FIELDNAME. Actually, the right side is a list that can be chained, so TERM ^ FIELDNAME1 ^ ... ^ FIELDNAME can select a field in a nested term.

Radius = Circle ^ radius

achieves the same thing as

Circle = circle(_, _, Radius)

Field names also allow you to "update" individual fields of a term with an expression of the form TERM ^ FIELD_LIST := FIELDVALUE. The expression returns a copy of TERM with the value of the field specified replaced by FIELDVALUE.

UnitCircle = Circle ^ radius := 1.0,

achieves the same thing as

Circle = circle(X, Y, _),
UnitCircle = circle(X, Y, 1.0)

We will have more to say about named fields and record syntax after we introduce more of the language.

Overloading

A single type can have multiple constructors with the same name but different arities (number of arguments). Different types may have constructors of the same name and arity.

This is valid:

:- type foo1
    --->    foo
    ;       foo(int)
    ;       foo(int, int)
    ;       foo(int, int, int).

:- type foo2
    --->    foo
    ;       foo(int)
    ;       foo(int, int)
    ;       foo(int, int, int).

Excessive overloading of constructors can slow down type checking and can make the program confusing for human readers, so overloading should not be over-used.

Explicit type qualification

Due to overloading or polymorphism, sometimes it will be necessary to resolve ambiguities when a term could match multiple types using a type qualification expression of the form TERM : TYPE.

It usually arises when calling a predicate or function that can accept any type.

Examples

It might be instructive to have a look at some of the types defined in the standard library.

In the bool module we have a boolean type.

:- type bool
    --->    no
    ;       yes.

The constants are named yes and no to avoid confusion with the predicates true and false that exist in Mercury. It is often a good idea to use purpose-specific types instead of bool, as the meaning of a boolean parameter can be unclear, or confused with unrelated parameters.

In the maybe module we have a type for representing optional values.

:- type maybe(T)
    --->    no
    ;       yes(T).

The pair module defines a type to pair up two terms into a compound term, similar to a 2-tuple.

:- type pair(T1, T2)
    --->    (T1 - T2).

An example of a pair is "abc" - 123 where - is the infix constructor. The - symbol is easily confused for a minus operator so we would not recommend it in new code (it comes from Prolog, which doesn't have functions).

A very common type is list(T) defined in the list module.

:- type list(T)
    --->    []
    ;       [T | list(T)].

Because lists are so common, the parser supports some special syntax that makes working with lists more convenient. (The syntax can be used for other types, though it may be confusing.)

We can also write the list(T) type definition as:

:- type list(T)
    --->    '[]'
    ;       '[|]'(T, list(T)).

That might be a little clearer. A list is either the empty list [], or a non-empty list with a top-level constructor [|] with two arguments, the head of the list and the rest of the list. We will probably devote a chapter to working with lists.

Clone this wiki locally