-
Notifications
You must be signed in to change notification settings - Fork 1
cosinusoidally/tcc_bootstrap_alt
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
The aim here is to provide an alternative bootstrap path for https://github.com/cosinusoidally/mishmashvm/ to allow us to bootstrap tcc without relying on an Emscripten compiled JS version of tcc. This bootstrap method goes from an cut down tcc 0.9.2 (circa 2001) all the way up to tcc 0.9.27 (the most recent official release from 2017). The whole bootstrap process takes a couple of seconds on very modest hardware. *Update* this works now. You can bootstrap from nothing up to tcc 0.2.97 by using the following command: ./mk_from_bootstrap_seed &> /dev/shm/log (we redirect stdout/stderr to /dev/shm/log since the bootstrap process is quite verbose) For a full description of the bootstrap process see dev_notes/steps.txt Once the bootstrap is complete then you will have a statically linked version of tcc in ./artifacts/tcc_27_boot_static.exe . Note this version has a very cut down libc polyfill so you may struggle to use the binary to build other things. It is more general purpose if you link it to glibc (eg if you run ./mk_min and have an existing copy of tcc in your path). tcc_bootstrap_alt can also be used as an alternative to mescc in live-bootstrap. See fosslinux/live-bootstrap#407 (not yet merged). This cuts a significant amount of time off the early stages of live-bootstrap (since mescc is quite slow). See also: https://github.com/cosinusoidally/live-bootstrap/tree/tcc_bootstrap_alt-refactor If you have a copy of mishmashvm in a directory alongside this one then you can boostrap mishmashvm fully using: ./mk_mmvm_static &> /dev/shm/log For mishmashvm setup instructions see "Setup" section below. You can also bootstrap from mishmashvm by changing into mishmashvm/auxiliary (within your mishmashvm checkout) and running: ./bootstrap_alt4.sh There's currently some slowness in ./mk_from_bootstrap_seed (takes about 1 min on modest hardware when it should really take around 10s). Not a major problem but this should be sorted out. The main issue is unbuffered I/O. The version linked against glibc does not suffer from this issue. As you can see there are multiple scripts in this directory. Most of these are for various test/dev modes. Not all are documented but some are documented below in the original README below. Original README (left here since I still need to update it): Setup: ====== Follow the Linux setup instructions from https://github.com/cosinusoidally/mishmashvm/ in order to set up an Ubuntu bionic chroot, but debootstrap a buildd rather than a minbase: sudo debootstrap --arch=i386 --variant=buildd bionic bionic Make sure you bind mount /dev /dev/shm and /proc as per the mishmashvm instructions as these are needed in order to allow the Mozilla Spidermonkey shell to run. Note Spidermonkey is not a hard requirement to bootstrap tcc, it is only needed to test the full end to end bootstrap from tcc 0.9.2 to 0.9.27 to the mishmashvm variant of tcc 0.9.27. When entering the chroot make sure you do setarch i686 if you are on a amd64 kernel, otherwise tcc will misbehave and try to build 64 bit binaries. cd bionic sudo chroot . su foo (set up this user so we are not running as root) setarch i686 bash make yourself a directory to check out tcc_bootstrap_alt just to keep things separate (the build process will touch the parent diretory of tcc_bootstrap_alt ). from tcc_bootstrap_alt cp -r tcc_27 ../ cd ../tcc_27 ./configure --prefix=/tmp/tcc make make install cd ../tcc_bootstrap_alt/ pushd . cd /tmp/tcc/bin/ export PATH=$PWD:$PATH popd At this point you will be in tcc_bootstrap_alt . There are several alternative bootstrap scripts: ./mk_min (bootstraps from a cut down version of v2 (called tcc_1_7)) ./mk_2m ("minimal v2" this bootstraps from an integer only version of 0.9.2, this is the preferred starting point) ./mk_2 (bootstraps from a more complete version of v2) ./mk_3 (bootstraps from 0.9.3) ./mk_mmvm (bootstraps from v2m up to v27 and then bootstraps the mishmashvm version of tcc. It expects mishmashvm to be in ../mishmashvm . You must also have the Mozilla Spidermonkey JS VM in your PATH) ./mk_clean (cleans up build artifacts) Just a note on notation, you'll see many tcc_X directories, these directories correspond to tcc 0.9.X just for briefness (pretty redundant saying 0.9.X when tcc has been on 0.9.x for 20+ years). Note that the tcc_1_X directories are actually cut down versions of tcc_2 since the real tcc_1 cannot actually build tcc_2. There are also serveral partial bootstrap chains that bootstrap up to certain points and then verify that the hashes of the binaries that are generated match the hashes of the binaries generated by the full bootstrap path: ./mk_tcc_js uses a js port of tcc_1_7 to generate a tcc_boot.o binary that matches the binary generated by ./mk_dlsym ./mk_dlsym builds a version of tcc_1_7 without a dependency on dlsym (this is needed to simplify the bootstrap process when the bootstrap is driven by the bootstrap scripts in mishmashvm). Also instructs tcc_1_7 to output a binary (tcc_boot.o - which is in an adhoc executable format) ./mk_loader boostraps by running tcc_boot.o using a loader for the adhoc executable format. To bootstrap: ============= ./mk_2m or ./mk_min If this is successful you will see: tcc version 0.9.27 (i386 Linux) 5175a1b6e0c00caa211b1a9cb6df674b197c5a27beaa5e04ff456d9ceb0419b3 tcc_27/tcc.o 52d0516842257aed4d9a3e91046bbce9c9eb4edcc267d0f411a9a605f509f053 tcc_27/libtcc1.o tcc_27/tcc.o: OK tcc_27/libtcc1.o: OK This confirms that tcc 0.9.27 builds, runs, and that the hashes of the object code are as expected. The compiler itself is in artifacts/tcc_27_boot.exe . Note that is a Linux executable, but it has the extension "exe" to make it easier to gitignore. Also note that it can build ".o" files but it will probably struggle to locate stuff like crt*.o if you try to use it to build Linux executables (you'll likely end up needing to use -nostdlib and then manually specifying everything). Further note this is not a major issue for my purposes as mishmashvm directly loads and runs ".o" files. Even further note that it will probably also struggle to find include files (you may need to use -nostdinc and specify manually). Bootstrapping mishmashvm: ========================= Make sure you have checked out mishmashvm in the parent directory: $ ls jsshell mishmashvm mmvm_cfg.json tcc_27 tcc_bootstrap_alt jump into tcc_bootstrap_alt and run ./mk_mmvm cd tcc_bootstrap_alt ./mk_mmvm If this is successful you will see: .... 0e8e02c852a11ebec585650d3c6fa5917fb51d7b66dd1d458d3e51680c660964 my_libc.o 0e8e02c852a11ebec585650d3c6fa5917fb51d7b66dd1d458d3e51680c660964 my_libc.o.new 0cdb519efb42f22eba96bf90407e10b87868355e1d793d3a70271f09d47aa403 stubs.o 0cdb519efb42f22eba96bf90407e10b87868355e1d793d3a70271f09d47aa403 stubs.o.new 59483d03266a9eadb84ceafaf4ed8a37e5a5231aaf773f296a7ca097679307b3 tcc_bin/libtcc1.o 59483d03266a9eadb84ceafaf4ed8a37e5a5231aaf773f296a7ca097679307b3 tcc_bin/libtcc1.o.new b64ff3010f2de6eb50762169fc5309b66ef704924cfc21648e7b75f088af3365 tcc_bin/tcc_boot3.o b64ff3010f2de6eb50762169fc5309b66ef704924cfc21648e7b75f088af3365 tcc_bin/tcc_boot3.o.new ../mishmashvm/libc_portable_proto/my_libc.o: OK ../mishmashvm/libc_portable_proto/stubs.o: OK ../mishmashvm/libc_portable_proto/tcc_bin/libtcc1.o: OK ../mishmashvm/libc_portable_proto/tcc_bin/tcc_boot3.o: OK tcc_boot3.o is the mishmashvm version of tcc. This bootstrap method produces a version of tcc which is bit identical to the version created by the Emscripten compiled JS based version of tcc in (mishmashvm/tcc_js_bootstrap/tcc_em.js). Technical Details: ================== There are a couple of details that make this possible: * use of debian woody glibc headers. These are very old since older versions of tcc cannot parse newer glibc headers * rely on tcc 0.9.27 to do the linking. Earlier version of tcc will struggle to correctly link things on a Bionic system. Generally the logic for locating things like crt files and shared libraries is also fragile. Very early versions of tcc cannot even create executables as they are pure C JIT compilers. To avoid these problems we simply use a copy of v27 built on the host system in order to do the linking. * start with a cut down integer only version of v2. This is with the eventual aim of starting the bootstrap process from a simpler C compiler like M2-Planet from https://github.com/oriansj/m2-planet and using https://github.com/oriansj/stage0-posix-x86 to start the bootstrap. There is code in mishmashvm that is actually able to build M2-Planet under a JS vm: https://github.com/cosinusoidally/mishmashvm/tree/master/tcc_js_bootstrap/alt_bootstrap Alternatively it may be possible to further cut down v2m and then port it to a simple subset of JS, maybe even a subset of JS that is also valid C. eg: #define var int #define function int #include "foo.js" foo.js: function foo(){ return 7; } The above program is valid JS and valid C. There are lots of caveats to this approach but it may be viable. * note very early versions of tcc assume malloced memory is executable. If Linux processes are enforcing the No Execute bit on pages (NX) then v2 will crash. You'll notice this happens if you compile v2m using gcc. Not 100% sure why this happens with gcc but not tcc, but might be to do with the fact v27 of tcc does not use symbol versioning. The lack of symbol versioning may implicitly be making glibc more lax about enforcement of NX (though this is just a hypothesis). v3 and above will use mmap and correctly set executable permissions on the allocated memory, this could be backported to 2. Note this issue has been fixed by using mmap in the early versions of tcc * Note the versions are approximate using checkouts from the tcc git repo on: https://repo.or.cz/tinycc.git/ beyond v18 I am using the git tags which should be closer to the actual releases. Future plans: ============= The ultimate end goal is to replace tcc_em.js in mishmashvm. In order to do this the alt bootstrap path must have parity with tcc_em.js: * Only depend on a stock Spidermonkey binary (DONE) * works on win32 and Linux (DONE) * bootstrap process must take a similar amount of time or less (DONE) * works under the self hosted mishmashvm version of the duktape js vm (DONE) * works with a stock node.js DONE * works under a stock Firefox binary Initially a hard dependency on a JS VM is fine. At this point the main aim is to eliminate the need for tcc_em.js. At a later date I may extend the bootstrap to also bootstrap a JS VM (maybe something like mujs, which can probably be cut down to compile with a very simple C compiler). How will I get v2m to run under a JS VM? There are a couple of potential options: * (DONE) Manually port v2m to JS. This is doable but may be time consuming. The general idea would be to cut down v2m as much as possible and then manually port probably 4k to 5k lines of code from C to JS. See README in tcc_js for details. * Modify the v2m backend to also emit JS. Kind of like a poor mans Emscripten. I would then manually tidy up the output. * Compile with Emscripten and extensively tidy up the output. This is doable, but may be very tedious. I may also need to modify v2m in order to emit some kind of object code format (DONE see README in tcc_js directory). Reduce the number of versions of tcc. I've already removed v19, it should be possible to get rid of more versions, eg I have a hacked up version that can go from v3 to v23 (which eliminated v10). This reduces the number of versions of tcc that I will need to maintain. Currently the bootstrap goes: tcc_js->1_7->1_8->1_9->2m->2->3->10->23->24->26->27 As mentioned removing v10 should be fairly straight forward. v23 to v24 may be more tricky. My guess is that v24 is probably the one to try and eliminate, but there are many changes between the two versions. v27 probably can also be eliminated as I should be able to use the v27 from mishmashvm.
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Packages 0
No packages published