title | filename | chapternum |
---|---|---|
Defining Computation | lec_03_computation | 3 |
- See that computation can be precisely modeled.
- Learn the computational model of Boolean circuits / straight-line programs.
- Equivalence of circuits and straight-line programs.
- Equivalence of AND/OR/NOT and NAND.
- Examples of computing in the physical world.
"there is no reason why mental as well as bodily labor should not be economized by the aid of machinery", Charles Babbage, 1852
"If, unwarned by my example, any man shall undertake and shall succeed in constructing an engine embodying in itself the whole of the executive department of mathematical analysis upon different principles or by simpler mechanical means, I have no fear of leaving my reputation in his charge, for he alone will be fully able to appreciate the nature of my efforts and the value of their results.", Charles Babbage, 1864
"To understand a program you must become both the machine and the program.", Alan Perlis, 1982
People have been computing for thousands of years, with aids that include not just pen and paper, but also the abacus, slide rules, various mechanical devices, and modern electronic computers.
A priori, the notion of computation seems to be tied to the particular mechanism that you use.
You might think that the "best" algorithm for multiplying numbers will be different when you implement it in Python on a modern laptop than when you use pen and paper.
However, as we saw in the introduction (chapintro{.ref}), an algorithm that is asymptotically better would eventually beat a worse one regardless of the underlying technology.
This gives us hope for a technology-independent way of defining computation.
This is what we do in this chapter.
We will define the notion of computing an output from an input by applying a sequence of basic operations (see compchapwhatvshowfig{.ref}).
Using this, we will be able to precisely define statements such as "function
::: {.nonmath} The main takeaways from this chapter are:

- We can use logical operations such as $AND$, $OR$, and $NOT$ to compute an output from an input (see andornotsec{.ref}).
- A Boolean circuit is a way to compose the basic logical operations to compute a more complex function (see booleancircuitsec{.ref}). We can think of Boolean circuits both as a mathematical model (based on directed acyclic graphs) and as physical devices we can construct in the real world in a variety of ways, including not just silicon-based semiconductors but also mechanical and even biological mechanisms (see physicalimplementationsec{.ref}).
- We can describe Boolean circuits also as straight-line programs, which are programs that do not have any looping constructs (i.e., no `while`/`for`/`do .. until` etc.), see starightlineprogramsec{.ref}.
- It is possible to implement the $AND$, $OR$, and $NOT$ operations using the $NAND$ operation (as well as vice versa). This means that circuits with $AND$/$OR$/$NOT$ gates can compute the same functions as (i.e., are equivalent in power to) circuits with $NAND$ gates, and we can use either model to describe computation based on our convenience, see nandsec{.ref}. To give out a "spoiler", we will see in finiteuniversalchap{.ref} that such circuits can compute all finite functions.
One "big idea" of this chapter is the notion of equivalence between models (equivalencemodels{.ref}). Two computational models are equivalent if they can compute the same set of functions. Boolean circuits with
The name "algorithm" is derived from the Latin transliteration of Muhammad ibn Musa al-Khwarizmi's name.
Al-Khwarizmi was a Persian scholar during the 9th century whose books introduced the western world to the decimal positional numeral system, as well as to the solutions of linear and quadratic equations (see alKhwarizmi{.ref}).
However, al-Khwarizmi's descriptions of algorithms were rather informal by today's standards.
Rather than use "variables" such as
Here is how al-Khwarizmi described the algorithm for solving an equation of the form $x^2 + bx = c$:
[How to solve an equation of the form ] "roots and squares are equal to numbers": For instance "one square, and ten roots of the same, amount to thirty-nine dirhems" that is to say, what must be the square which, when increased by ten of its own root, amounts to thirty-nine? The solution is this: you halve the number of the roots, which in the present instance yields five. This you multiply by itself; the product is twenty-five. Add this to thirty-nine; the sum is sixty-four. Now take the root of this, which is eight, and subtract from it half the number of roots, which is five; the remainder is three. This is the root of the square which you sought for; the square itself is nine.
For the purposes of this book, we will need a much more precise way to describe algorithms. Fortunately (or is it unfortunately?), at least at the moment, computers lag far behind school-age children in learning from examples. Hence in the 20th century, people came up with exact formalisms for describing algorithms, namely programming languages. Here is al-Khwarizmi's quadratic equation solving algorithm described in the Python programming language:
from math import sqrt
# Pythonspeak to enable use of the sqrt function to compute square roots.

def solve_eq(b,c):
    # return solution of x^2 + bx = c following al-Khwarizmi's instructions
    # al-Khwarizmi demonstrates this for the case b=10 and c=39
    val1 = b / 2.0      # "halve the number of the roots"
    val2 = val1 * val1  # "this you multiply by itself"
    val3 = val2 + c     # "Add this to thirty-nine"
    val4 = sqrt(val3)   # "take the root of this"
    val5 = val4 - val1  # "subtract from it half the number of roots"
    return val5         # "This is the root of the square which you sought for"

# Test: solve x^2 + 10*x = 39
print(solve_eq(10,39))
# 3.0
We can define algorithms informally as follows:
::: {.quote } Informal definition of an algorithm: An algorithm is a set of instructions for how to compute an output from an input by following a sequence of "elementary steps".
An algorithm
In this chapter we will make this informal definition precise using the model of Boolean Circuits. We will show that Boolean Circuits are equivalent in power to straight line programs that are written in "ultra simple" programming languages that do not even have loops. We will also see that the particular choice of elementary operations is immaterial and many different choices yield models with equivalent power (see compchapoverviewfig{.ref}). However, it will take us some time to get there. We will start by discussing what are "elementary operations" and how we map a description of an algorithm into an actual physical process that produces an output from an input in the real world.
An algorithm breaks down a complex calculation into a series of simpler steps. These steps can be executed in a variety of different ways, including:
- Writing down symbols on a piece of paper.
- Modifying the current flowing on electrical wires.
- Binding a protein to a strand of DNA.
- Responding to a stimulus by a member of a collection (e.g., a bee in a colony, a trader in a market).
To formally define algorithms, let us try to "err on the side of simplicity" and model our "basic steps" as truly minimal. For example, here are some very simple functions:
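- $OR:\{0,1\}^2 \rightarrow \{0,1\}$ defined as $OR(a,b) = 0$ if $a=b=0$ and $OR(a,b) = 1$ otherwise.
- $AND:\{0,1\}^2 \rightarrow \{0,1\}$ defined as $AND(a,b) = 1$ if $a=b=1$ and $AND(a,b) = 0$ otherwise.
- $NOT:\{0,1\} \rightarrow \{0,1\}$ defined as $NOT(a) = 1-a$.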
The functions
Each one of the functions
::: {.example title="Majority from
That is, for every
Let us first try to rephrase
Recall that we can also write
We can also write eqmajandornot{.eqref} in a "programming language" form, expressing it as a set of instructions for computing
def MAJ(X):
    # X holds the three input bits X[0], X[1], X[2]
    firstpair  = AND(X[0],X[1])
    secondpair = AND(X[1],X[2])
    thirdpair  = AND(X[0],X[2])
    temp       = OR(secondpair,thirdpair)
    return OR(firstpair,temp)
:::
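As a sanity check, the straight-line description above can be run as ordinary Python and compared with the definition of majority on all eight inputs. This is only a minimal sketch: the arithmetic definitions of `AND` and `OR` and the reference helper `maj_by_counting` are assumptions introduced here for illustration.

```python
from itertools import product

def AND(a, b):
    return a * b

def OR(a, b):
    return 1 - (1 - a) * (1 - b)

def MAJ(x0, x1, x2):
    # The same five steps as the straight-line description above.
    firstpair = AND(x0, x1)
    secondpair = AND(x1, x2)
    thirdpair = AND(x0, x2)
    temp = OR(secondpair, thirdpair)
    return OR(firstpair, temp)

def maj_by_counting(x0, x1, x2):
    # Hypothetical reference implementation: 1 iff at least two inputs are 1.
    return 1 if x0 + x1 + x2 >= 2 else 0

# Exhaustive comparison over all inputs in {0,1}^3.
for bits in product([0, 1], repeat=3):
    assert MAJ(*bits) == maj_by_counting(*bits)
```

Because the input space has only eight points, exhaustive checking is a complete proof of correctness here; this stops being practical as the number of input bits grows.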
Like standard addition and multiplication, the functions
::: {.solvedexercise title="Distributive law for AND and OR" #distributivelaw}
Prove that for every
::: {.solution data-ref="distributivelaw"}
We can prove this by enumerating over all the
Let us see how we can obtain a different function from the same building blocks.
Define
::: { .pause }
As usual, it is a good exercise to try to work out the algorithm for
The following algorithm computes
Input: $a,b \in \{0,1\}$.
Output: $XOR(a,b)$
$w1 \leftarrow AND(a,b)$
$w2 \leftarrow NOT(w1)$
$w3 \leftarrow OR(a,b)$
return $AND(w2,w3)$
For every
::: {.proof data-ref="alganalaysis"}
For every
- If $a=b=0$ then $w3=OR(a,b)=0$ and so the output will be $0$.
- If $a=b=1$ then $AND(a,b)=1$ and so $w2=NOT(AND(a,b))=0$ and the output will be $0$.
- If $a=1$ and $b=0$ (or vice versa) then both $w3=OR(a,b)=1$ and $w1=AND(a,b)=0$, in which case the algorithm will output $AND(NOT(w1),w3)=1$.
:::
We can also express XORfromAONalg{.ref} using a programming language.
Specifically, the following is a Python program that computes the
def AND(a,b): return a*b
def OR(a,b): return 1-(1-a)*(1-b)
def NOT(a): return 1-a

def XOR(a,b):
    w1 = AND(a,b)
    w2 = NOT(w1)
    w3 = OR(a,b)
    return AND(w2,w3)

# Test out the code
print([f"XOR({a},{b})={XOR(a,b)}" for a in [0,1] for b in [0,1]])
# ['XOR(0,0)=0', 'XOR(0,1)=1', 'XOR(1,0)=1', 'XOR(1,1)=0']
::: {.solvedexercise title="Compute
::: {.solution data-ref="xorthreebits"}
Addition modulo two satisfies the same properties of associativity ($(a+b)+c=a+(b+c)$) and commutativity (
Since we know how to compute
def XOR3(a,b,c):
    w1 = AND(a,b)
    w2 = NOT(w1)
    w3 = OR(a,b)
    w4 = AND(w2,w3)
    w5 = AND(w4,c)
    w6 = NOT(w5)
    w7 = OR(w4,c)
    return AND(w6,w7)

# Let's test this out
print([f"XOR3({a},{b},{c})={XOR3(a,b,c)}" for a in [0,1] for b in [0,1] for c in [0,1]])
# ['XOR3(0,0,0)=0', 'XOR3(0,0,1)=1', 'XOR3(0,1,0)=1', 'XOR3(0,1,1)=0', 'XOR3(1,0,0)=1', 'XOR3(1,0,1)=0', 'XOR3(1,1,0)=0', 'XOR3(1,1,1)=1']
:::
Try to generalize the above examples to obtain a way to compute
We have seen that we can obtain at least some examples of interesting functions by composing together applications of
::: {.quote}
Semi-formal definition of an algorithm: An algorithm consists of a sequence of steps of the form "compute a new value by applying
An algorithm
There are several concerns that are raised by this definition:
- First and foremost, this definition is indeed too informal. We do not specify exactly what each step does, nor what it means to "feed $x$ as input".
- Second, the choice of $AND$, $OR$ or $NOT$ seems rather arbitrary. Why not $XOR$ and $MAJ$? Why not allow operations like addition and multiplication? What about any other logical constructions such as `if`/`then` or `while`?
- Third, do we even know that this definition has anything to do with actual computing? If someone gave us a description of such an algorithm, could we use it to actually compute the function in the real world?
These concerns will to a large extent guide us in the upcoming chapters. Thus you would be well advised to re-read the above informal definition and see what you think about these issues.
A large part of this book will be devoted to addressing the above issues. We will see that:
- We can make the definition of an algorithm fully formal, and so give a precise mathematical meaning to statements such as "Algorithm $A$ computes function $f$".
- While the choice of $AND$/$OR$/$NOT$ is arbitrary, and we could just as well have chosen other functions, we will also see this choice does not matter much. We will see that we would obtain the same computational power if we instead used addition and multiplication, and essentially every other operation that could be reasonably thought of as a basic step.
- It turns out that we can and do compute such "$AND$/$OR$/$NOT$-based algorithms" in the real world. First of all, such an algorithm is clearly well specified, and so can be executed by a human with a pen and paper. Second, there are a variety of ways to mechanize this computation. We've already seen that we can write Python code that corresponds to following such a list of instructions. But in fact we can directly implement operations such as $AND$, $OR$, and $NOT$ via electronic signals using components known as transistors. This is how modern electronic computers operate.
In the remainder of this chapter, and the rest of this book, we will begin to answer some of these questions.
We will see more examples of the power of simple operations to compute more complex operations including addition, multiplication, sorting and more.
We will also discuss how to physically implement simple operations such as
{#smallandornotcircxorfig .margin }
Boolean circuits provide a precise notion of "composing basic operations together".
A Boolean circuit (see boolancircfig{.ref}) is composed of gates and inputs that are connected by wires.
The wires carry a signal that represents either the value
::: {.remark title="Physical realization of Boolean circuits" #booleancircimprem}
Boolean circuits are a mathematical model that does not necessarily correspond to a physical object, but they can be implemented physically.
In physical implementations of circuits, the signal is often implemented by electric potential, or voltage, on a wire, where for example voltage above a certain level is interpreted as a logical value of
::: {.solvedexercise title="All equal function" #allequalex}
Define
::: {.solution data-ref="allequalex"}
Another way to describe the function
We defined Boolean circuits informally as obtained by connecting AND, OR, and NOT gates via wires so as to produce an output from an input. However, to be able to prove theorems about the existence or non-existence of Boolean circuits for computing various functions we need to:
- Formally define a Boolean circuit as a mathematical object.
- Formally define what it means for a circuit $C$ to compute a function $f$.
We now proceed to do so.
We will define a Boolean circuit as a labeled Directed Acyclic Graph (DAG).
The vertices of the graph correspond to the gates and inputs of the circuit, and the edges of the graph correspond to the wires.
A wire from an input or gate
::: {.definition title="Boolean Circuits" #booleancircdef}
Let
- Exactly $n$ of the vertices have no in-neighbors. These vertices are known as inputs and are labeled with the $n$ labels `X[`$0$`]`, $\ldots$, `X[`$n-1$`]`. Each input has at least one out-neighbor.
- The other $s$ vertices are known as gates. Each gate is labeled with $\wedge$, $\vee$ or $\neg$. Gates labeled with $\wedge$ (AND) or $\vee$ (OR) have two in-neighbors. Gates labeled with $\neg$ (NOT) have one in-neighbor. We will allow parallel edges.^[Having parallel edges means an AND or OR gate $u$ can have both its in-neighbors be the same gate $v$. Since $AND(a,a)=OR(a,a)=a$ for every $a\in \{0,1\}$, such parallel edges don't help in computing new values in circuits with AND/OR/NOT gates. However, we will see circuits with more general sets of gates later on.]
- Exactly $m$ of the gates are also labeled with the $m$ labels `Y[`$0$`]`, $\ldots$, `Y[`$m-1$`]` (in addition to their label $\wedge$/$\vee$/$\neg$). These are known as outputs.
The size of a Boolean circuit is the number
::: { .pause } This is a non-trivial mathematical definition, so it is worth taking the time to read it slowly and carefully. As in all mathematical definitions, we are using a known mathematical object --- a directed acyclic graph (DAG) --- to define a new object, a Boolean circuit. This might be a good time to review some of the basic properties of DAGs and in particular the fact that they can be topologically sorted, see topsortsec{.ref}. :::
If X[
$0]
the values
::: {.definition title="Computing a function via a Boolean circuit" #circuitcomputedef}
Let
We let
- For every $v$ in the $\ell$-th layer (i.e., $v$ such that $h(v)=\ell$) do:
    - If $v$ is an input vertex labeled with `X[`$i$`]` for some $i\in [n]$, then we assign to $v$ the value $x_i$.
    - If $v$ is a gate vertex labeled with $\wedge$ and with two in-neighbors $u,w$ then we assign to $v$ the AND of the values assigned to $u$ and $w$. (Since $u$ and $w$ are in-neighbors of $v$, they are in a lower layer than $v$, and hence their values have already been assigned.)
    - If $v$ is a gate vertex labeled with $\vee$ and with two in-neighbors $u,w$ then we assign to $v$ the OR of the values assigned to $u$ and $w$.
    - If $v$ is a gate vertex labeled with $\neg$ and with one in-neighbor $u$ then we assign to $v$ the negation of the value assigned to $u$.
- The result of this process is the value $y\in \{0,1\}^m$ such that for every $j\in [m]$, $y_j$ is the value assigned to the vertex with label `Y[`$j$`]`.
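The layer-by-layer process above can be sketched in Python. This is an illustration under stated assumptions and is not code from the book: the dictionary encoding of the labeled DAG, the function name `evaluate_circuit`, and the vertex names are all hypothetical, and the layer $h(v)$ is taken to be $0$ for inputs and one more than the highest layer among the in-neighbors for gates.

```python
def evaluate_circuit(vertices, outputs, x):
    """Evaluate a circuit given as a labeled DAG (hypothetical encoding).

    vertices: dict mapping a vertex name to (label, in_neighbors); label is
              "X" for an input (in_neighbors is then the input index i) or
              one of "AND", "OR", "NOT" for a gate.
    outputs:  vertex names carrying the labels Y[0], ..., Y[m-1], in order.
    x:        list of input bits x_0, ..., x_{n-1}.
    """
    layer = {}

    def h(v):
        # Assumed layering: inputs in layer 0, gates one above their
        # highest in-neighbor.
        if v not in layer:
            label, ins = vertices[v]
            layer[v] = 0 if label == "X" else 1 + max(h(u) for u in ins)
        return layer[v]

    value = {}
    # Visit vertices in order of layer, so in-neighbors are assigned first.
    for v in sorted(vertices, key=h):
        label, ins = vertices[v]
        if label == "X":
            value[v] = x[ins]
        elif label == "AND":
            value[v] = value[ins[0]] & value[ins[1]]
        elif label == "OR":
            value[v] = value[ins[0]] | value[ins[1]]
        elif label == "NOT":
            value[v] = 1 - value[ins[0]]
        else:
            raise ValueError(f"unknown label {label!r}")
    return [value[v] for v in outputs]

# A circuit for XOR built as AND(NOT(AND(a,b)), OR(a,b)).
xor_circuit = {
    "in0": ("X", 0),
    "in1": ("X", 1),
    "w1": ("AND", ["in0", "in1"]),
    "w2": ("NOT", ["w1"]),
    "w3": ("OR", ["in0", "in1"]),
    "out": ("AND", ["w2", "w3"]),
}
for a in [0, 1]:
    for b in [0, 1]:
        print(a, b, evaluate_circuit(xor_circuit, ["out"], [a, b]))
```

Sorting by layer is one valid topological order; any other topological order of the DAG would assign the same values.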
Let
::: {.remark title="Boolean circuits nitpicks (optional)" #booleancircuitsremarks}
In phrasing booleancircdef{.ref}, we've made some technical choices that are not very important, but will be convenient for us later on.
Having parallel edges means an AND or OR gate
We have seen two ways to describe how to compute a function
- A Boolean circuit, defined in booleancircdef{.ref}, computes $f$ by connecting via wires AND, OR, and NOT gates to the inputs.
- We can also describe such a computation using a straight-line program that has lines of the form `foo = AND(bar,blah)`, `foo = OR(bar,blah)` and `foo = NOT(bar)` where `foo`, `bar` and `blah` are variable names. (We call this a straight-line program since it contains no loops or branching (e.g., if/then) statements.)
To make the second definition more precise, we will now define a programming language that is equivalent to Boolean circuits. We call this programming language the AON-CIRC programming language ("AON" stands for AND/OR/NOT; "CIRC" stands for circuit).
For example, the following is an AON-CIRC program that on input
temp = AND(X[0],X[1])
Y[0] = NOT(temp)
AON-CIRC is not a practical programming language: it was designed for pedagogical purposes only, as a way to model computation as the composition of
Given this example, you might already be able to guess how to write a program for computing (for example)
An AON-CIRC program is a sequence of strings, which we call "lines", satisfying the following conditions:
- Every line has one of the following forms: `foo = AND(bar,baz)`, `foo = OR(bar,baz)`, or `foo = NOT(bar)` where `foo`, `bar` and `baz` are variable identifiers. (We follow the common programming languages convention of using names such as `foo`, `bar`, `baz` as stand-ins for generic identifiers.) The line `foo = AND(bar,baz)` corresponds to the operation of assigning to the variable `foo` the logical AND of the values of the variables `bar` and `baz`. Similarly `foo = OR(bar,baz)` and `foo = NOT(bar)` correspond to the logical OR and logical NOT operations.
- A variable identifier in the AON-CIRC programming language can be any combination of letters, numbers, underscores, and brackets. There are two special types of variables:
    - Variables of the form `X[`$i$`]`, with $i \in \{0,1,\ldots, n-1\}$, are known as input variables.
    - Variables of the form `Y[`$j$`]` are known as output variables.
- A valid AON-CIRC program $P$ includes input variables of the form `X[`$0$`]`, $\ldots$, `X[`$n-1$`]` and output variables of the form `Y[`$0$`]`, $\ldots$, `Y[`$m-1$`]` where $n,m$ are natural numbers. We say that $n$ is the number of inputs of the program $P$ and $m$ is the number of outputs.
- In a valid AON-CIRC program, in every line the variables on the right-hand side of the assignment operator must either be input variables or variables that have already been assigned a value in a previous line.
- If $P$ is a valid AON-CIRC program of $n$ inputs and $m$ outputs, then for every $x\in \{0,1\}^n$ the output of $P$ on input $x$ is the string $y\in \{0,1\}^m$ defined as follows:
    - Initialize the input variables `X[`$0$`]`, $\ldots$, `X[`$n-1$`]` to the values $x_0,\ldots,x_{n-1}$.
    - Run the operator lines of $P$ one by one in order, in each line assigning to the variable on the left-hand side of the assignment operator the value of the operation on the right-hand side.
    - Let $y\in \{0,1\}^m$ be the values of the output variables `Y[`$0$`]`, $\ldots$, `Y[`$m-1$`]` at the end of the execution.
- We denote the output of $P$ on input $x$ by $P(x)$.
- The size of an AON-CIRC program $P$ is the number of lines it contains. (The reader might note that this corresponds to our definition of the size of a circuit as the number of gates it contains.)
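The execution rule in the list above is short enough to sketch as a small interpreter. The following is an illustration under assumptions, not a reference implementation: the function name `run_aon_circ`, the regular expression, and the choice to skip `#` comments and blank lines are all introduced here.

```python
import re

# One AON-CIRC line: target = OP(arg) or target = OP(arg,arg).
LINE = re.compile(
    r"^([\w\[\]]+)\s*=\s*(AND|OR|NOT)\(\s*([\w\[\]]+)\s*(?:,\s*([\w\[\]]+)\s*)?\)$"
)

def run_aon_circ(program, x):
    """Return the output bits of an AON-CIRC program (a string) on input x."""
    # Initialize the input variables X[0], ..., X[n-1].
    values = {f"X[{i}]": bit for i, bit in enumerate(x)}
    for raw in program.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        match = LINE.match(line)
        if match is None:
            raise SyntaxError(f"not an AON-CIRC line: {raw!r}")
        target, op, first, second = match.groups()
        if (op == "NOT") != (second is None):
            raise SyntaxError(f"wrong number of arguments: {raw!r}")
        args = [first] if second is None else [first, second]
        for name in args:
            # Right-hand side variables must be inputs or already assigned.
            if name not in values:
                raise NameError(f"{name} used before assignment")
        if op == "AND":
            values[target] = values[first] & values[second]
        elif op == "OR":
            values[target] = values[first] | values[second]
        else:
            values[target] = 1 - values[first]
    # Collect Y[0], Y[1], ... for as long as they were assigned.
    m = 0
    while f"Y[{m}]" in values:
        m += 1
    return [values[f"Y[{j}]"] for j in range(m)]

nand_program = """
temp = AND(X[0],X[1])
Y[0] = NOT(temp)
"""
print([run_aon_circ(nand_program, [a, b]) for a in [0, 1] for b in [0, 1]])
# [[1], [1], [1], [0]]
```

The interpreter infers $m$ from the output variables that were actually written; a stricter version could take $n$ and $m$ as parameters and reject programs that do not match them.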
Now that we formally specified AON-CIRC programs, we can define what it means for an AON-CIRC program
::: {.definition title="Computing a function via AON-CIRC programs" #AONcircdef}
Let
The following solved exercise gives an example of an AON-CIRC program.
::: {.solvedexercise title="" #aonforcmpsolved}
Consider the following function
Write an AON-CIRC program to compute
::: {.solution data-ref="aonforcmpsolved"}
Writing such a program is tedious but not truly hard.
To compare two numbers we first compare their most significant digit, and then go down to the next digit and so on and so forth.
In this case where the numbers have just two binary digits, these comparisons are particularly simple.
The number represented by
- The most significant bit
$a$ of$(a,b)$ is larger than the most significant bit$c$ of$(c,d)$ .
or
- The two most significant bits
$a$ and$c$ are equal, but$b>d$ .
Another way to express the same condition is the following:
the number
For binary digits
# Compute CMP:{0,1}^4-->{0,1}
# CMP(X)=1 iff 2X[0]+X[1] > 2X[2] + X[3]
temp_1 = NOT(X[2])
temp_2 = AND(X[0],temp_1)
temp_3 = OR(X[0],temp_1)
temp_4 = NOT(X[3])
temp_5 = AND(X[1],temp_4)
temp_6 = AND(temp_5,temp_3)
Y[0] = OR(temp_2,temp_6)
We can also present this 7-line program as a circuit with 7 gates, see aoncmpfig{.ref}. :::
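One way to gain confidence in a program like this is to transcribe it into Python and compare it with the arithmetic definition on all sixteen inputs. The sketch below assumes bitwise definitions of `AND`, `OR` and `NOT`; the wrapper function `CMP` and the list name `mismatches` are introduced only for this check.

```python
from itertools import product

def AND(a, b):
    return a & b

def OR(a, b):
    return a | b

def NOT(a):
    return 1 - a

def CMP(X):
    # Line-by-line transcription of the AON-CIRC program above.
    temp_1 = NOT(X[2])
    temp_2 = AND(X[0], temp_1)
    temp_3 = OR(X[0], temp_1)
    temp_4 = NOT(X[3])
    temp_5 = AND(X[1], temp_4)
    temp_6 = AND(temp_5, temp_3)
    return OR(temp_2, temp_6)

# Inputs on which the program disagrees with 2X[0]+X[1] > 2X[2]+X[3].
mismatches = [X for X in product([0, 1], repeat=4)
              if CMP(X) != int(2 * X[0] + X[1] > 2 * X[2] + X[3])]
print(mismatches)  # []
```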
We now formally prove that AON-CIRC programs and Boolean circuits have exactly the same power:
Let
The idea is simple: AON-CIRC programs and Boolean circuits are just different ways of describing the exact same computational process.
For example, an AND gate in a Boolean circuit corresponds to computing the AND of two previously-computed values.
In an AON-CIRC program this will correspond to the line that stores in a variable the AND of two previously-computed variables.
::: { .pause }
This proof of slcircuitequivthm{.ref} is simple at heart, but all the details it contains can make it a little cumbersome to read. You might be better off trying to work it out yourself before reading it.
Our GitHub repository contains a "proof by Python" of slcircuitequivthm{.ref}: an implementation of the functions `circuit2prog` and `prog2circuits` mapping Boolean circuits to AON-CIRC programs and vice versa.
:::
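To make the correspondence concrete, here is a minimal sketch of one direction, from a program to a DAG with one gate per line. It is not the repository's `prog2circuits` code: the name `program_to_circuit`, the regular expression, and the list-and-dictionary encoding of the circuit are assumptions made for illustration, and malformed lines are not handled.

```python
import re

LINE = re.compile(r"([\w\[\]]+)\s*=\s*(AND|OR|NOT)\(([^)]*)\)")

def program_to_circuit(lines):
    """Map AON-CIRC lines to a DAG with one gate vertex per line.

    Returns (gates, outputs). gates[i] = (op, in_neighbors) for the i-th
    line, where each in-neighbor is either an input name such as "X[0]" or
    the index of the earlier gate that last wrote the variable. outputs maps
    each "Y[j]" to the index of the gate carrying that label.
    """
    last_written = {}  # variable name -> index of the line that last wrote it
    gates, outputs = [], {}
    for i, line in enumerate(lines):
        target, op, args = LINE.fullmatch(line.strip()).groups()
        names = [a.strip() for a in args.split(",")]
        # Connect to the input vertex, or to the gate of the defining line.
        ins = [n if n.startswith("X[") else last_written[n] for n in names]
        gates.append((op, ins))
        last_written[target] = i
        if target.startswith("Y["):
            outputs[target] = i
    return gates, outputs

gates, outputs = program_to_circuit(["temp = AND(X[0],X[1])", "Y[0] = NOT(temp)"])
print(gates)    # [('AND', ['X[0]', 'X[1]']), ('NOT', [0])]
print(outputs)  # {'Y[0]': 1}
```

Every edge points from an earlier line to a later one, so the resulting graph is acyclic and has exactly as many gates as the program has lines.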
::: {.proof data-ref="slcircuitequivthm"}
Let
We start with the first direction. Let foo = AND(bar,blah)
then the bar
and blah
(respectively) were written to. (For example, if bar
was written to is blah
was written to is bar
or blah
is an input variable then we connect the gate to the corresponding input vertex instead.
If foo
is an output variable of the form Y[
]
then we add the same label to the corresponding gate to mark it as an output gate.
We do the analogous operations if the OR
or a NOT
operation (except that we use the corresponding OR or NOT gate, and in the latter case have only one in-neighbor instead of two).
For every input
For the other direction, let temp_
= AND(temp_
$j)
, unless one of the vertices is an input vertex or an output gate, in which case we change this to the form X[.]
or Y[.]
appropriately.
Because we work in topological ordering, we are guaranteed that the in-neighbors X[0]
, X[
$n-1]
will appear in the program
Computation is an abstract notion that is distinct from its physical implementations. While most modern computing devices are obtained by mapping logical gates to semiconductor-based transistors, throughout history people have computed using a huge variety of mechanisms, including mechanical systems, gas and liquid (known as fluidics), biological and chemical processes, and even living creatures (e.g., see crabfig{.ref} or this video for how crabs or slime mold can be used to do computations).
In this section we will review some of these implementations, both so you can get an appreciation of how it is possible to directly translate Boolean circuits to the physical world, without going through the entire stack of architecture, operating systems, and compilers, as well as to emphasize that silicon-based processors are by no means the only way to perform computation. Indeed, as we will see in quantumchap{.ref}, a very exciting recent line of work involves using different media for computation that would allow us to take advantage of quantum mechanical effects to enable different types of algorithms.
A transistor can be thought of as an electric circuit with two inputs, known as the source and the gate and an output, known as the sink. The gate controls whether current flows from the source to the sink. In a standard transistor, if the gate is "ON" then current can flow from the source to the sink and if it is "OFF" then it can't. In a complementary transistor this is reversed: if the gate is "OFF" then current can flow from the source to the sink and if it is "ON" then it can't.
{#transistor-water-fig .margin }
There are several ways to implement the logic of a transistor. For example, we can use faucets to implement it using water pressure (e.g. transistor-water-fig{.ref}). This might seem like merely a curiosity, but there is a field known as fluidics concerned with implementing logical operations using liquids or gases. Some of the motivations include operating in extreme environmental conditions such as in space or a battlefield, where standard electronic equipment would not survive.
The standard implementations of transistors use electrical current. One of the original implementations used vacuum tubes. As its name implies, a vacuum tube is a tube containing nothing (i.e., a vacuum) and where a priori electrons could freely flow from the source (a wire) to the sink (a plate). However, there is a gate (a grid) between the two, where modulating its voltage can block the flow of electrons.
Early vacuum tubes were roughly the size of lightbulbs (and looked very much like them too). In the 1950's they were supplanted by transistors, which implement the same logic using semiconductors which are materials that normally do not conduct electricity but whose conductivity can be modified and controlled by inserting impurities ("doping") and applying an external electric field (this is known as the field effect). In the 1960's computers started to be implemented using integrated circuits which enabled much greater density. In 1965, Gordon Moore predicted that the number of transistors per integrated circuit would double every year (see moorefig{.ref}), and that this would lead to "such wonders as home computers —or at least terminals connected to a central computer— automatic controls for automobiles, and personal portable communications equipment". Since then, (adjusted versions of) this so-called "Moore's law" have been running strong, though exponential growth cannot be sustained forever, and some physical limitations are already becoming apparent.
We can use transistors to implement various Boolean functions such as
{#logicgatestransistorsfig .margin }
{#transistor-nand-fig .margin }
Computation can be based on biological or chemical systems.
For example the lac operon produces the enzymes needed to digest lactose only if the conditions
A cellular automaton is a model of a system composed of a sequence of cells, each of which can be in one of a finite number of states. At each step, a cell updates its state based on the states of its neighboring cells and some simple rules. As we will discuss later in this book (see cellularautomatasec{.ref}), cellular automata such as Conway's "Game of Life" can be used to simulate computation gates.
One computation device that we all carry with us is our own brain. Brains have served humanity throughout history, doing computations that range from distinguishing prey from predators, through making scientific discoveries and artistic masterpieces, to composing witty 280 character messages. The exact working of the brain is still not fully understood, but one common mathematical model for it is a (very large) neural network.
A neural network can be thought of as a Boolean circuit that instead of
Many machine learning algorithms use artificial neural networks whose purpose is not to imitate biology but rather to perform some computational tasks, and hence are not restricted to a threshold or other biologically-inspired gates.
Generally, a neural network is often described as operating on signals that are real numbers, rather than
{#activationfunctionsfig .margin }
We can implement computation using many other physical media, without any electronic, biological, or chemical components. Many suggestions for mechanical computers have been put forward, going back at least to Gottfried Leibniz's computing machines from the 1670s and Charles Babbage's 1837 plan for a mechanical "Analytical Engine".
As one example, marblefig{.ref} shows a simple implementation of a NAND (negation of AND, see nandsec{.ref}) gate using marbles going through pipes. We represent a logical value in
The
As its name implies,
We can compute
We start with the following observation. For every
univnandonethm{.ref}'s proof is very simple, but you should make sure that (i) you understand the statement of the theorem, and (ii) you follow its proof. In particular, you should make sure you understand why De Morgan's law is true.
We can use
Let
::: {.solution data-ref="majbynandex"} Recall that eqmajandornot{.eqref} states that
We can use univnandonethm{.ref} to replace all the occurrences of
The same formula can also be expressed as a circuit with NAND gates, see majnandcircfig{.ref}. :::
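The claim that $NAND$ alone suffices can be checked mechanically. The sketch below assumes the arithmetic definition `NAND(a,b) = 1 - a*b`, builds `NOT`, `AND` and `OR` from it, and then computes majority using only those derived operations; the assertions enumerate every input.

```python
from itertools import product

def NAND(a, b):
    return 1 - a * b

def NOT(a):
    return NAND(a, a)

def AND(a, b):
    # AND is the negation of NAND.
    return NAND(NAND(a, b), NAND(a, b))

def OR(a, b):
    # De Morgan: a OR b = NOT(NOT(a) AND NOT(b)).
    return NAND(NAND(a, a), NAND(b, b))

def MAJ(x0, x1, x2):
    # OR of the three pairwise ANDs, with every operation built from NAND.
    return OR(OR(AND(x0, x1), AND(x1, x2)), AND(x0, x2))

for a, b in product([0, 1], repeat=2):
    assert NOT(a) == 1 - a
    assert AND(a, b) == a * b
    assert OR(a, b) == max(a, b)
for bits in product([0, 1], repeat=3):
    assert MAJ(*bits) == int(sum(bits) >= 2)
```

This direct substitution is not the most economical circuit, since each derived gate costs two or three NANDs; it only demonstrates that a NAND-only version exists.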
We define NAND Circuits as circuits in which all the gates are NAND operations. Such a circuit again corresponds to a directed acyclic graph (DAG). Since all the gates correspond to the same function (i.e., NAND), we do not even need to label them, and all gates have in-degree exactly two. Despite their simplicity, NAND circuits can be quite powerful.
::: {.example title="$NAND$ circuit for
- Let $u = NAND(x_0,x_1)$.
- Let $v = NAND(x_0,u)$.
- Let $w = NAND(x_1,u)$.
- The $XOR$ of $x_0$ and $x_1$ is $y_0 = NAND(v,w)$.
One can verify that this algorithm does indeed compute
In fact, we can show the following theorem:
For every Boolean circuit
The idea of the proof is to just replace every
::: {.proof data-ref="NANDuniversamthm"}
If
- $NOT(a) = NAND(a,a)$
- $AND(a,b) = NAND(NAND(a,b),NAND(a,b))$
- $OR(a,b) = NAND(NAND(a,a),NAND(b,b))$
we can replace every gate of
::: { .bigidea #equivalencemodels } Two models are equivalent in power if they can be used to compute the same set of functions. :::
Here are some more sophisticated examples of NAND circuits:
Incrementing integers. Consider the task of computing, given as input a string
The increment operation can be very informally described as follows: "Add $1$ to the least significant bit and propagate the carry".
A little more precisely, in the case of the binary representation, to obtain the increment of
Thus we can compute the increment of
INPUT: $x_0,x_1,\ldots,x_{n-1}$ representing the number $\sum_{i=0}^{n-1} x_i\cdot 2^i$ # we use LSB-first representation
OUTPUT:$y \in \{0,1\}^{n+1}$ such that $\sum_{i=0}^n y_i \cdot 2^i = \sum_{i=0}^{n-1} x_i\cdot 2^i + 1$
Let $c_0 \leftarrow 1$ # we pretend we have a "carry" of $1$ initially
For{$i=0,\ldots, n-1$}
Let $y_i \leftarrow XOR(x_i,c_i)$.
If{$c_i=x_i=1$}
$c_{i+1}=1$
else
$c_{i+1}=0$
endif
Endfor
Let $y_n \leftarrow c_n$.
incrementalg{.ref} describes precisely how to compute the increment operation, and can be easily transformed into Python code that performs the same computation, but it does not seem to directly yield a NAND circuit to compute this.
However, we can transform this algorithm line by line to a NAND circuit.
For example, since for every
{#nandincrememntcircfig .margin }
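To make the line-by-line transformation concrete, here is a hedged Python sketch of the increment algorithm in which every Boolean operation is built from a single `nand` helper. The names `xor`, `and_` and `increment` are ours. The "if" test $c_i = x_i = 1$ becomes an $AND$ of $x_i$ and $c_i$.

```python
def nand(a, b):
    """NAND of two bits (each 0 or 1)."""
    return 1 - (a & b)

def xor(a, b):
    # Four-NAND construction of XOR.
    u = nand(a, b)
    return nand(nand(a, u), nand(b, u))

def and_(a, b):
    # AND(a, b) = NAND(NAND(a, b), NAND(a, b))
    t = nand(a, b)
    return nand(t, t)

def increment(x):
    """Given LSB-first bits x of length n, return the n+1 LSB-first bits of x + 1."""
    c = 1                       # pretend there is an initial carry of 1
    y = []
    for xi in x:
        y.append(xor(xi, c))    # y_i = XOR(x_i, c_i)
        c = and_(xi, c)         # c_{i+1} = 1 exactly when c_i = x_i = 1
    y.append(c)                 # y_n = c_n
    return y

print(increment([1, 1, 0]))     # 3 + 1 = 4, i.e. [0, 0, 1, 0] in LSB-first order
```

Unrolling the loop for a fixed $n$ gives a straight-line sequence of `nand` calls, which is exactly a NAND circuit for that input length.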
From increment to addition.
Once we have the increment operation, we can certainly compute addition by repeatedly incrementing (i.e., compute
INPUT: $u \in \{0,1\}^n$, $v\in \{0,1\}^n$ representing numbers in LSB-first binary representation.
OUTPUT: LSB-first binary representation of $u+v$.
Let $c_0 \leftarrow 0$
For{$i=0,\ldots,n-1$}
Let $y_i \leftarrow u_i + v_i + c_i \mod 2$
If{$u_i + v_i + c_i \geq 2$}
$c_{i+1}\leftarrow 1$
else
$c_{i+1} \leftarrow 0$
endif
Endfor
Let $y_n \leftarrow c_n$
Once again, additionfromnand{.ref} can be translated into a NAND circuit.
The crucial observation is that the "if/then" statement simply corresponds to
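As an illustration, the following Python sketch carries out grade-school addition using only a `nand` helper. It rests on two assumptions that we state explicitly: the output bit at position $i$ is the XOR of $u_i$, $v_i$ and the incoming carry $c_i$, and the new carry is the majority of those three bits (which is $1$ exactly when $u_i + v_i + c_i \geq 2$). The helper names are ours.

```python
def nand(a, b):
    """NAND of two bits (each 0 or 1)."""
    return 1 - (a & b)

def xor(a, b):
    # Four-NAND construction of XOR.
    u = nand(a, b)
    return nand(nand(a, u), nand(b, u))

def maj(a, b, c):
    # MAJ(a,b,c) = OR(AND(a,b), AND(a,c), AND(b,c)), written with NANDs only.
    t = nand(nand(a, b), nand(a, c))      # t = OR(AND(a,b), AND(a,c))
    return nand(nand(t, t), nand(b, c))   # OR(t, AND(b,c))

def add(u, v):
    """Add two n-bit LSB-first numbers; return the n+1 LSB-first bits of u + v."""
    assert len(u) == len(v)
    c = 0
    y = []
    for ui, vi in zip(u, v):
        y.append(xor(xor(ui, vi), c))     # sum bit includes the incoming carry
        c = maj(ui, vi, c)                # carry out is 1 iff ui + vi + c >= 2
    y.append(c)
    return y

print(add([1, 1, 0], [1, 0, 0]))          # 3 + 1 = 4, i.e. [0, 0, 1, 0]
```

Each loop iteration uses a constant number of `nand` calls, so the unrolled circuit for $n$-bit inputs has $O(n)$ gates.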
Just like we did for Boolean circuits, we can define a programming-language analog of NAND circuits. It is even simpler than the AON-CIRC language since we only have a single operation. We define the NAND-CIRC Programming Language to be a programming language where every line (apart from the input/output declaration) has the following form:
foo = NAND(bar,blah)
where `foo`, `bar` and `blah` are variable identifiers.
::: {.example title="Our first NAND-CIRC program" #NANDprogramexample} Here is an example of a NAND-CIRC program:
u = NAND(X[0],X[1])
v = NAND(X[0],u)
w = NAND(X[1],u)
Y[0] = NAND(v,w)
:::
Do you know what function this program computes? Hint: you have seen it before.
Formally, just like we did in AONcircdef{.ref} for AON-CIRC, we can define the notion of computation by a NAND-CIRC program in the natural way:
::: {.definition title="Computing by a NAND-CIRC program" #NANDcomp}
Let
- $P$ has $n$ input variables `X[0]`$,\ldots,$`X[n-1]` and $m$ output variables `Y[0]`$,\ldots,$`Y[m-1]`.
- For every $x\in \{0,1\}^n$, if we execute $P$ when we assign to `X[0]`$,\ldots,$`X[n-1]` the values $x_0,\ldots,x_{n-1}$, then at the end of the execution, the output variables `Y[0]`$,\ldots,$`Y[m-1]` have the values $y_0,\ldots,y_{m-1}$ where $y=f(x)$.
:::
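The notion of "executing" a NAND-CIRC program can be made concrete with a tiny interpreter. The sketch below is our own illustration and not an official tool. It assumes programs are given as text with one line of the form `foo = NAND(bar,blah)` per line, and that identifiers are either plain names or indexed names such as `X[0]`.

```python
import re

# Assumed identifier syntax: a name, optionally followed by an index like [3].
IDENT = r"[A-Za-z_]\w*(?:\[\d+\])?"
LINE = re.compile(rf"^\s*({IDENT})\s*=\s*NAND\(\s*({IDENT})\s*,\s*({IDENT})\s*\)\s*$")

def eval_nand_circ(program, x):
    """Execute a NAND-CIRC program (a string) on input bits x; return the output bits."""
    env = {f"X[{i}]": bit for i, bit in enumerate(x)}   # assign inputs to X[0..n-1]
    for line in program.strip().splitlines():
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"not a NAND-CIRC line: {line!r}")
        target, a, b = match.groups()
        env[target] = 1 - (env[a] & env[b])             # the only operation: NAND
    m = sum(1 for name in env if name.startswith("Y["))
    return [env[f"Y[{j}]"] for j in range(m)]           # read off Y[0..m-1]

example = """
u = NAND(X[0],X[1])
v = NAND(X[0],u)
w = NAND(X[1],u)
Y[0] = NAND(v,w)
"""

# Print the truth table of the function computed by the example program.
for x0 in (0, 1):
    for x1 in (0, 1):
        print(x0, x1, eval_nand_circ(example, [x0, x1]))
```

A program $P$ computes $f$ in the sense of the definition exactly when this evaluation returns $f(x)$ for every $x \in \{0,1\}^n$.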
As before we can show that NAND circuits are equivalent to NAND-CIRC programs (see progandcircfig{.ref}):
For every
We omit the proof of NANDcircslequivthm{.ref} since it follows along exactly the same lines as the equivalence of Boolean circuits and AON-CIRC programs (slcircuitequivthm{.ref}).
Given NANDcircslequivthm{.ref} and NANDuniversamthm{.ref}, we know that we can replace every line of the form `foo = AND(bar,blah)`, `foo = OR(bar,blah)` or `foo = NOT(bar)` with the equivalent 1-3 lines that use the `NAND` operation.
Our GitHub repository contains a "proof by code": a simple Python program `AON2NAND` that transforms an AON-CIRC program into an equivalent NAND-CIRC program.
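To give a feel for what such a translation involves, here is a short sketch written for this section. It is not the repository's program, and its function name `aon_to_nand`, its temporary-variable scheme (`tmp_0`, `tmp_1`, ...) and its assumed line syntax are all hypothetical.

```python
import re
from itertools import count

# Assumed identifier syntax: a name, optionally followed by an index like [3].
IDENT = r"[A-Za-z_]\w*(?:\[\d+\])?"
LINE = re.compile(
    rf"^\s*({IDENT})\s*=\s*(AND|OR|NOT)\(\s*({IDENT})\s*(?:,\s*({IDENT})\s*)?\)\s*$"
)

def aon_to_nand(program):
    """Translate AON-CIRC text into NAND-CIRC text, using 1-3 NAND lines per input line."""
    fresh = (f"tmp_{i}" for i in count())   # assumed not to clash with existing names
    out = []
    for line in program.strip().splitlines():
        match = LINE.match(line)
        if match is None:
            raise ValueError(f"not an AON-CIRC line: {line!r}")
        target, op, a, b = match.groups()
        if (op == "NOT") != (b is None):
            raise ValueError(f"wrong number of arguments: {line!r}")
        if op == "NOT":                      # NOT(a) = NAND(a,a)
            out.append(f"{target} = NAND({a},{a})")
        elif op == "AND":                    # AND(a,b) = NAND(NAND(a,b),NAND(a,b))
            t = next(fresh)
            out.append(f"{t} = NAND({a},{b})")
            out.append(f"{target} = NAND({t},{t})")
        else:                                # OR(a,b) = NAND(NAND(a,a),NAND(b,b))
            t1, t2 = next(fresh), next(fresh)
            out.append(f"{t1} = NAND({a},{a})")
            out.append(f"{t2} = NAND({b},{b})")
            out.append(f"{target} = NAND({t1},{t2})")
    return "\n".join(out)

print(aon_to_nand("Y[0] = OR(X[0],X[1])"))
```

Since every input line yields at most three output lines, an AON-CIRC program of $s$ lines becomes a NAND-CIRC program of at most $3s$ lines.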
You might have heard of a term called "Turing Complete" that is sometimes used to describe programming languages. (If you haven't, feel free to ignore the rest of this remark: we define this term precisely in chapequivalentmodels{.ref}.)
If so, you might wonder if the NAND-CIRC programming language has this property.
The answer is no, or perhaps more accurately, the term "Turing Completeness" is not really applicable for the NAND-CIRC programming language.
The reason is that, by design, the NAND-CIRC programming language can only compute finite functions
If we put together slcircuitequivthm{.ref}, NANDuniversamthm{.ref}, and NANDcircslequivthm{.ref}, we obtain the following result:
::: {.theorem title="Equivalence between models of finite computation" #equivalencemodelsthm}
For every sufficiently large
- $f$ can be computed by a Boolean circuit (with $\wedge,\vee,\neg$ gates) of at most $O(s)$ gates.
- $f$ can be computed by an AON-CIRC straight-line program of at most $O(s)$ lines.
- $f$ can be computed by a NAND circuit of at most $O(s)$ gates.
- $f$ can be computed by a NAND-CIRC straight-line program of at most $O(s)$ lines.
:::
By "$O(s)$" we mean that the bound is at most
We omit the formal proof, which is obtained by combining slcircuitequivthm{.ref}, NANDuniversamthm{.ref}, and NANDcircslequivthm{.ref}. The key observation is that the results we have seen allow us to translate a program/circuit that computes
slcircuitequivthm{.ref} is a special case of a more general result. We can consider even more general models of computation, where instead of AND/OR/NOT or NAND, we use other operations (see othergatessec{.ref} below). It turns out that Boolean circuits are equivalent in power to such models as well. The fact that all these different ways to define computation lead to equivalent models shows that we are "on the right track". It justifies the seemingly arbitrary choices that we've made of using AND/OR/NOT or NAND as our basic operations, since these choices do not affect the power of our computational model. Equivalence results such as equivalencemodelsthm{.ref} mean that we can easily translate between Boolean circuits, NAND circuits, NAND-CIRC programs and the like. We will use this ability later on in this book, often shifting to the most convenient formulation without making a big deal about it. Hence we will not worry too much about the distinction between, for example, Boolean circuits and NAND-CIRC programs.
In contrast, we will continue to take special care to distinguish between circuits/programs and functions (recall functionprogramidea{.ref}). A function corresponds to a specification of a computational task, and it is a fundamentally different object than a program or a circuit, which corresponds to the implementation of the task.
There is nothing special about AND/OR/NOT or NAND. For every set of functions foo
the result of applying some
::: {.definition title="General straight-line programs" #genstraight-lineprogs}
Let X[
$i]
to denote the input and output variables.
We say that
AON-CIRC programs correspond to
We can also define $\mathcal{F}$ circuits, which will be directed graphs in which each gate corresponds to applying a function
::: {.example title="IF,ZERO,ONE circuits" #IZOcircuits}
Let
Indeed, we can demonstrate that
$$ NAND(a,b) = IF(a,IF(b,ZERO,ONE),ONE) \;. $$ :::
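The identity expressing $NAND$ through $IF$, $ZERO$ and $ONE$ can be checked on all four inputs. This sketch assumes that $IF(a,b,c)$ outputs $b$ when $a=1$ and $c$ otherwise, and that $ZERO$ and $ONE$ are the constant functions; the Python names are ours.

```python
def IF(a, b, c):
    """Assumed semantics: return b if a equals 1, and c otherwise."""
    return b if a == 1 else c

def ZERO():
    return 0

def ONE():
    return 1

def nand_from_if(a, b):
    # NAND(a,b) = IF(a, IF(b, ZERO, ONE), ONE)
    return IF(a, IF(b, ZERO(), ONE()), ONE())

for a in (0, 1):
    for b in (0, 1):
        assert nand_from_if(a, b) == 1 - (a & b)
```

The inner `IF(b, ZERO(), ONE())` is simply $NOT(b)$, so the formula reads: if $a=1$ output $NOT(b)$, otherwise output $1$.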
There are also some sets
As we discussed in secimplvsspec{.ref}, one of the most important distinctions in this book is that of specification versus implementation, or separating "what" from "how" (see specvsimplfig{.ref}). A function corresponds to the specification of a computational task, that is, what output should be produced for every particular input. A program (or circuit, or any other way to specify algorithms) corresponds to the implementation of how to compute the desired output from the input. That is, a program is a set of instructions on how to compute the output from the input. Even within the same computational model there can be many different ways to compute the same function. For example, there is more than one NAND-CIRC program that computes the majority function, more than one Boolean circuit to compute the addition function, and so on.
Confusing specification and implementation (or equivalently functions and programs) is a common mistake, and one that is unfortunately encouraged by the common programming-language terminology of referring to parts of programs as "functions". However, in both the theory and practice of computer science, it is important to maintain this distinction, and it is particularly important for us in this book.
- An algorithm is a recipe for performing a computation as a sequence of "elementary" or "simple" operations.
- One candidate definition for "elementary" operations is the set $AND$, $OR$ and $NOT$.
- Another candidate definition for an "elementary" operation is the $NAND$ operation. It is an operation that is easily implementable in the physical world in a variety of methods including by electronic transistors.
- We can use $NAND$ to compute many other functions, including majority, increment, and others.
- There are other equivalent choices, including the sets $\{AND,OR,NOT\}$ and $\{ IF, ZERO, ONE \}$.
- We can formally define the notion of a function $F:\{0,1\}^n \rightarrow \{0,1\}^m$ being computable using the NAND-CIRC Programming language.
- For every set of basic operations, the notions of being computable by a circuit and being computable by a straight-line program are equivalent.
::: {.exercise title="Compare
::: {.exercise title="Compare
::: {.exercise title="OR,NOT is universal" #ornotex}
Prove that the set
::: {.exercise title="AND,OR is not universal" #andorex}
Prove that for every
Conclude that the set
::: {.exercise title="XOR is not universal" #xorex}
Prove that for every
Conclude that the set
::: {.exercise title="MAJ,NOT, 1 is universal" #majnotex}
Let
::: {.exercise title="MAJ,NOT is not universal" #majnotextwo}
Prove that
::: {.exercise title="NOR is universal" #norex}
Let
::: {.exercise title="Lookup is universal" #lookupex}
Prove that
Prove that for every subset
::: {.exercise title="Size and inputs / outputs" #nandcircsizeex}
Prove that for every NAND circuit of size
Prove that there is some constant
::: {.exercise title="NANDs from activation functions" #NANDsfromActivationfunctionex}
We say that a function
In this exercise you will show that you can construct a NAND approximator from many common activation functions used in deep neural networks. As a corollary you will obtain that deep neural networks can simulate NAND circuits. Since NAND circuits can also simulate deep neural networks, these two computational models are equivalent to one another.
- Show that there is a NAND approximator $f$ defined as $f(a,b) = L(DReLU(L'(a,b)))$ where $L':\mathbb{R}^2 \rightarrow \mathbb{R}$ is an affine function (of the form $L'(a,b)=\alpha a + \beta b + \gamma$ for some $\alpha,\beta,\gamma \in \mathbb{R}$), $L$ is an affine function (of the form $L(y) = \alpha y + \beta$ for $\alpha,\beta \in \mathbb{R}$), and $DReLU:\mathbb{R} \rightarrow \mathbb{R}$ is the function defined as $DReLU(x) = \min(1,\max(0,x))$. Note that $DReLU(x) = 1-ReLU(1-ReLU(x))$ where $ReLU(x)=\max(x,0)$ is the rectified linear unit activation function.
- Show that there is a NAND approximator $f$ defined as $f(a,b) = L(sigmoid(L'(a,b)))$ where $L',L$ are affine as above and $sigmoid:\mathbb{R} \rightarrow \mathbb{R}$ is the function defined as $sigmoid(x) = e^x/(e^x+1)$.
- Show that there is a NAND approximator $f$ defined as $f(a,b) = L(tanh(L'(a,b)))$ where $L',L$ are affine as above and $tanh:\mathbb{R} \rightarrow \mathbb{R}$ is the function defined as $tanh(x) = (e^x-e^{-x})/(e^x+e^{-x})$.
- Prove that for every NAND-circuit $C$ with $n$ inputs and one output that computes a function $g:\{0,1\}^n \rightarrow \{0,1\}$, if we replace every gate of $C$ with a NAND-approximator and then invoke the resulting circuit on some $x\in \{0,1\}^n$, the output will be a number $y$ such that $|y-g(x)|\leq 1/3$.
:::
::: {.exercise title="Majority with NANDs efficiently" #majwithNAND}
Prove that there is some constant
::: {.exercise title="Output at last layer" #outputlastlayer}
Prove that for every
The excerpt from Al-Khwarizmi's book is from "The Algebra of Ben-Musa", Frederic Rosen, 1831.
Charles Babbage (1791-1871) was a visionary scientist, mathematician, and inventor (see [@swade2002the, @collier2000charles]). More than a century before the invention of modern electronic computers, Babbage realized that computation can be in principle mechanized. His first design for a mechanical computer was the difference engine that was designed to do polynomial interpolation. He then designed the analytical engine which was a much more general machine and the first prototype for a programmable general-purpose computer. Unfortunately, Babbage was never able to complete the design of his prototypes. One of the earliest people to realize the engine's potential and far-reaching implications was Ada Lovelace (see the notes for chaploops{.ref}).
Boolean algebra was first investigated by Boole and De Morgan in the 1840s [@Boole1847mathematical, @DeMorgan1847]. The definition of Boolean circuits and connection to electrical relay circuits was given in Shannon's master's thesis [@Shannon1938]. (Howard Gardner called Shannon's thesis "possibly the most important, and also the most famous, master's thesis of the [20th] century".) Savage's book [@Savage1998models], like this one, introduces the theory of computation starting with Boolean circuits as the first model. Jukna's book [@Jukna12] contains a modern in-depth exposition of Boolean circuits, see also [@wegener1987complexity].
The NAND function was shown to be universal by Sheffer [@Sheffer1913], though this also appears in the earlier work of Peirce, see [@Burks1978charles]. Whitehead and Russell used NAND as the basis for their logic in their magnum opus Principia Mathematica [@WhiteheadRussell1912]. In her Ph.D. thesis, Ernst [@Ernst2009phd] investigates empirically the minimal NAND circuits for various functions. Nisan and Schocken's book [@NisanShocken2005] builds a computing system starting from NAND gates and ending with high-level programs and games ("NAND to Tetris"); see also the website nandtotetris.org.
We defined the size of a Boolean circuit in booleancircdef{.ref} to be the number of gates it contains. This is one of two conventions used in the literature. The other convention is to define the size as the number of wires (equivalent to the number of gates plus the number of inputs). This makes very little difference in almost all settings, but can affect the circuit size complexity of some "pathological examples" of functions, such as the constant zero function, that do not depend on many of their inputs.