Skip to content

cosinusoidally/tcc_bootstrap_alt

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

The aim here is to provide an alternative bootstrap path for
https://github.com/cosinusoidally/mishmashvm/ to allow us to bootstrap tcc
without relying on an Emscripten compiled JS version of tcc. This bootstrap
method goes from an cut down tcc 0.9.2 (circa 2001) all the way up to tcc 0.9.27
(the most recent official release from 2017). The whole bootstrap process takes
a couple of seconds on very modest hardware.


*Update* this works now. You can bootstrap from nothing up to tcc 0.2.97 by
using the following command:

./mk_from_bootstrap_seed &> /dev/shm/log

(we redirect stdout/stderr to /dev/shm/log since the bootstrap process is quite
verbose)

For a full description of the bootstrap process see dev_notes/steps.txt

Once the bootstrap is complete then you will have a statically linked version of
tcc in ./artifacts/tcc_27_boot_static.exe . Note this version has a very cut
down libc polyfill so you may struggle to use the binary to build other things.
It is more general purpose if you link it to glibc (eg if you run ./mk_min and
have an existing copy of tcc in your path).

tcc_bootstrap_alt can also be used as an alternative to mescc in live-bootstrap.
See fosslinux/live-bootstrap#407 (not yet merged). This
cuts a significant amount of time off the early stages of live-bootstrap (since
mescc is quite slow). See also:
https://github.com/cosinusoidally/live-bootstrap/tree/tcc_bootstrap_alt-refactor

If you have a copy of mishmashvm in a directory alongside this one then you can
boostrap mishmashvm fully using:

./mk_mmvm_static &> /dev/shm/log

For mishmashvm setup instructions see "Setup" section below.

You can also bootstrap from mishmashvm by changing into mishmashvm/auxiliary
(within your mishmashvm checkout) and running:

./bootstrap_alt4.sh

There's currently some slowness in ./mk_from_bootstrap_seed (takes about 1 min
on modest hardware when it should really take around 10s). Not a major problem
but this should be sorted out. The main issue is unbuffered I/O. The version
linked against glibc does not suffer from this issue.

As you can see there are multiple scripts in this directory. Most of these are
for various test/dev modes. Not all are documented but some are documented below
in the original README below.

Original README (left here since I still need to update it):

Setup:
======

Follow the Linux setup instructions from
https://github.com/cosinusoidally/mishmashvm/ in order to set up an Ubuntu
bionic chroot, but debootstrap a buildd rather than a minbase:

sudo debootstrap --arch=i386 --variant=buildd bionic bionic

Make sure you bind mount /dev /dev/shm and /proc as per the mishmashvm
instructions as these are needed in order to allow the Mozilla Spidermonkey
shell to run. Note Spidermonkey is not a hard requirement to bootstrap tcc, it
is only needed to test the full end to end bootstrap from tcc 0.9.2 to 0.9.27 to
the mishmashvm variant of tcc 0.9.27.

When entering the chroot make sure you do setarch i686 if you are on a amd64
kernel, otherwise tcc will misbehave and try to build 64 bit binaries.

cd bionic
sudo chroot .
su foo (set up this user so we are not running as root)
setarch i686
bash

make yourself a directory to check out tcc_bootstrap_alt just to keep things
separate (the build process will touch the parent diretory of tcc_bootstrap_alt
).

from tcc_bootstrap_alt

cp -r tcc_27 ../
cd ../tcc_27
./configure --prefix=/tmp/tcc
make
make install
cd ../tcc_bootstrap_alt/
pushd .
cd /tmp/tcc/bin/
export PATH=$PWD:$PATH
popd

At this point you will be in tcc_bootstrap_alt . There are several alternative
bootstrap scripts:

./mk_min  (bootstraps from a cut down version of v2 (called tcc_1_7))
./mk_2m   ("minimal v2" this bootstraps from an integer only version of 0.9.2,
         this is the preferred starting point)
./mk_2    (bootstraps from a more complete version of v2)
./mk_3    (bootstraps from 0.9.3)
./mk_mmvm (bootstraps from v2m up to v27 and then bootstraps the mishmashvm
           version of tcc. It expects mishmashvm to be in ../mishmashvm . You
           must also have the Mozilla Spidermonkey JS VM in your PATH)
./mk_clean (cleans up build artifacts)

Just a note on notation, you'll see many tcc_X directories, these directories
correspond to tcc 0.9.X just for briefness (pretty redundant saying 0.9.X when
tcc has been on 0.9.x for 20+ years). Note that the tcc_1_X directories are
actually cut down versions of tcc_2 since the real tcc_1 cannot actually build
tcc_2.

There are also serveral partial bootstrap chains that bootstrap up to certain
points and then verify that the hashes of the binaries that are generated match
the hashes of the binaries generated by the full bootstrap path:

./mk_tcc_js uses a js port of tcc_1_7 to generate a tcc_boot.o binary that
            matches the binary generated by ./mk_dlsym
./mk_dlsym  builds a version of tcc_1_7 without a dependency on dlsym (this is
            needed to simplify the bootstrap process when the bootstrap is
            driven by the bootstrap scripts in mishmashvm). Also instructs
            tcc_1_7 to output a binary (tcc_boot.o - which is in an adhoc
            executable format)
./mk_loader boostraps by running tcc_boot.o using a loader for the adhoc
            executable format.

To bootstrap:
=============

./mk_2m
or
./mk_min


If this is successful you will see:

tcc version 0.9.27 (i386 Linux)
5175a1b6e0c00caa211b1a9cb6df674b197c5a27beaa5e04ff456d9ceb0419b3  tcc_27/tcc.o
52d0516842257aed4d9a3e91046bbce9c9eb4edcc267d0f411a9a605f509f053  tcc_27/libtcc1.o
tcc_27/tcc.o: OK
tcc_27/libtcc1.o: OK

This confirms that tcc 0.9.27 builds, runs, and that the hashes of the object
code are as expected.

The compiler itself is in artifacts/tcc_27_boot.exe . Note that is a Linux
executable, but it has the extension "exe" to make it easier to gitignore. Also
note that it can build ".o" files but it will probably struggle to locate stuff
like crt*.o if you try to use it to build Linux executables (you'll likely end
up needing to use -nostdlib and then manually specifying everything). Further
note this is not a major issue for my purposes as mishmashvm directly loads and
runs ".o" files. Even further note that it will probably also struggle to find
include files (you may need to use -nostdinc and specify manually).

Bootstrapping mishmashvm:
=========================

Make sure you have checked out mishmashvm in the parent directory:

$ ls
jsshell  mishmashvm  mmvm_cfg.json  tcc_27  tcc_bootstrap_alt

jump into tcc_bootstrap_alt and run ./mk_mmvm

cd tcc_bootstrap_alt
./mk_mmvm

If this is successful you will see:

....

0e8e02c852a11ebec585650d3c6fa5917fb51d7b66dd1d458d3e51680c660964 my_libc.o
0e8e02c852a11ebec585650d3c6fa5917fb51d7b66dd1d458d3e51680c660964 my_libc.o.new
0cdb519efb42f22eba96bf90407e10b87868355e1d793d3a70271f09d47aa403 stubs.o
0cdb519efb42f22eba96bf90407e10b87868355e1d793d3a70271f09d47aa403 stubs.o.new
59483d03266a9eadb84ceafaf4ed8a37e5a5231aaf773f296a7ca097679307b3 tcc_bin/libtcc1.o
59483d03266a9eadb84ceafaf4ed8a37e5a5231aaf773f296a7ca097679307b3 tcc_bin/libtcc1.o.new
b64ff3010f2de6eb50762169fc5309b66ef704924cfc21648e7b75f088af3365 tcc_bin/tcc_boot3.o
b64ff3010f2de6eb50762169fc5309b66ef704924cfc21648e7b75f088af3365 tcc_bin/tcc_boot3.o.new
../mishmashvm/libc_portable_proto/my_libc.o: OK
../mishmashvm/libc_portable_proto/stubs.o: OK
../mishmashvm/libc_portable_proto/tcc_bin/libtcc1.o: OK
../mishmashvm/libc_portable_proto/tcc_bin/tcc_boot3.o: OK

tcc_boot3.o is the mishmashvm version of tcc. This bootstrap method produces a
version of tcc which is bit identical to the version created by the Emscripten
compiled JS based version of tcc in (mishmashvm/tcc_js_bootstrap/tcc_em.js).

Technical Details:
==================

There are a couple of details that make this possible:

* use of debian woody glibc headers. These are very old since older versions of
  tcc cannot parse newer glibc headers
* rely on tcc 0.9.27 to do the linking. Earlier version of tcc will struggle to
  correctly link things on a Bionic system. Generally the logic for locating
  things like crt files and shared libraries is also fragile. Very early
  versions of tcc cannot even create executables as they are pure C JIT
  compilers. To avoid these problems we simply use a copy of v27 built on the
  host system in order to do the linking.
* start with a cut down integer only version of v2. This is with the eventual
  aim of starting the bootstrap process from a simpler C compiler like M2-Planet
  from https://github.com/oriansj/m2-planet and using
  https://github.com/oriansj/stage0-posix-x86 to start the bootstrap. There is
  code in mishmashvm that is actually able to build M2-Planet under a JS vm:
  https://github.com/cosinusoidally/mishmashvm/tree/master/tcc_js_bootstrap/alt_bootstrap
  Alternatively it may be possible to further cut down v2m and then port it to a
  simple subset of JS, maybe even a subset of JS that is also valid C. eg:

  #define var int
  #define function int
  #include "foo.js"

  foo.js:

  function foo(){
    return 7;
  }

  The above program is valid JS and valid C. There are lots of caveats to this
  approach but it may be viable.

* note very early versions of tcc assume malloced memory is executable. If Linux
  processes are enforcing the No Execute bit on pages (NX) then v2 will crash.
  You'll notice this happens if you compile v2m using gcc. Not 100% sure why
  this happens with gcc but not tcc, but might be to do with the fact v27 of tcc
  does not use symbol versioning. The lack of symbol versioning may implicitly
  be making glibc more lax about enforcement of NX (though this is just a
  hypothesis). v3 and above will use mmap and correctly set executable
  permissions on the allocated memory, this could be backported to 2.

  Note this issue has been fixed by using mmap in the early versions of tcc

* Note the versions are approximate using checkouts from the tcc git repo on:
  https://repo.or.cz/tinycc.git/ beyond v18 I am using the git tags which should
  be closer to the actual releases.

Future plans:
=============

The ultimate end goal is to replace tcc_em.js in mishmashvm. In order to do this
the alt bootstrap path must have parity with tcc_em.js:

* Only depend on a stock Spidermonkey binary (DONE)
* works on win32 and Linux (DONE)
* bootstrap process must take a similar amount of time or less (DONE)
* works under the self hosted mishmashvm version of the duktape js vm (DONE)
* works with a stock node.js DONE
* works under a stock Firefox binary

Initially a hard dependency on a JS VM is fine. At this point the main aim is to
eliminate the need for tcc_em.js. At a later date I may extend the bootstrap to
also bootstrap a JS VM (maybe something like mujs, which can probably be cut
down to compile with a very simple C compiler).

How will I get v2m to run under a JS VM? There are a couple of potential
options:

* (DONE) Manually port v2m to JS. This is doable but may be time consuming. The
  general idea would be to cut down v2m as much as possible and then manually
  port probably 4k to 5k lines of code from C to JS. See README in tcc_js for
  details.
* Modify the v2m backend to also emit JS. Kind of like a poor mans Emscripten.
  I would then manually tidy up the output.
* Compile with Emscripten and extensively tidy up the output. This is doable,
  but may be very tedious.

I may also need to modify v2m in order to emit some kind of object code format
(DONE see README in tcc_js directory).

Reduce the number of versions of tcc. I've already removed v19, it should be
possible to get rid of more versions, eg I have a hacked up version that can go
from v3 to v23 (which eliminated v10). This reduces the number of versions of
tcc that I will need to maintain. Currently the bootstrap goes:

tcc_js->1_7->1_8->1_9->2m->2->3->10->23->24->26->27

As mentioned removing v10 should be fairly straight forward. v23 to v24 may be
more tricky. My guess is that v24 is probably the one to try and eliminate, but
there are many changes between the two versions. v27 probably can also be
eliminated as I should be able to use the v27 from mishmashvm.