README
Radiance is a self-hosted compiler: the compiler is written in Radiance
and compiles itself. This creates a bootstrapping problem: you need a
working compiler to build the compiler. The solution is the "seed" -- a
trusted, checked-in binary that can compile the current source.
This document describes the workflows for developing the compiler,
updating the seed, and maintaining reproducibility.
CONCEPTS
Seed                A known-good compiler binary checked into the repository
                    (`seed/radiance.rv64`). It can compile the current source
                    code into a working compiler.

Stage               One round of self-compilation. Stage N uses the stage N-1
                    binary to compile the source. Stage 0 is the seed itself.

Fixed point         Reached when two consecutive stages produce bit-for-bit
                    identical binaries. This proves the compiler faithfully
                    reproduces itself.

"Dev" binary        `bin/radiance.rv64.dev` -- built by `make` from the seed.
                    This is the working compiler used during development.
                    It is not checked in.

Breaking change     A source change that the current seed cannot compile,
                    e.g. new syntax, changed calling conventions, or removed
                    features the compiler uses during self-compilation. This
                    requires generating a new seed.

Compatible change   A source change that the current seed can still compile,
                    e.g. bug fixes, new optimizations, or new library code
                    that the compiler itself doesn't use or isn't
                    meaningfully affected by.
FILES
seed/radiance.rv64            Seed binary (RISC-V machine code).
seed/radiance.rv64.ro.data    Read-only data section.
seed/radiance.rv64.rw.data    Read-write data section.
seed/radiance.rv64.git        SHA-256 of the git commit whose *source*
                              was compiled to produce this seed.
seed/update                   Tool that finds the fixed point and
                              updates the seed.
EVERYDAY DEVELOPMENT
Most compiler work -- bug fixes, optimizations, new standard library
features, new backends -- does not require a seed update. The workflow is
simply:
1. Edit source code
2. Build the dev binary (produces a new `bin/radiance.rv64.dev`)

       make

3. Run tests

       make test

4. Commit source changes only. The seed is untouched.
The `dev` binary is ephemeral and rebuilt from the seed on every `make`. As
long as the seed can compile the current source, no seed update is needed.
WHEN TO UPDATE THE SEED
Update the seed in the following situations:
* A breaking change is being introduced, i.e. a change that breaks the
seed's ability to compile the source. You must update the seed *before*
committing the breaking change. See "BREAKING CHANGES" below.
* You want the benefits of compiler improvements (better code generation,
faster compilation) to apply to the build itself. This is optional
but often a good idea.
* The fixed-point property needs re-verification after significant
changes. Even compatible changes can alter the output binary, and
reaching a fixed point confirms the compiler is self-consistent and
deterministic.
Do *not* update the seed casually. Each seed update adds a large binary
diff to the repository.
BREAKING CHANGES
A breaking change is one where the new source cannot be compiled by the
old seed. Examples: new syntax the compiler uses on itself, changed
data structures in the AST, removed intrinsics.
The fundamental constraint is:
The checked-in seed must always be able to compile the
checked-in source.
This means you cannot simply commit a breaking change and update the
seed afterward: there would be a commit where the seed cannot build
the source. Instead:
1. Add support for the new feature to the source, but don't use it
in the compiler's own source yet. Ensure old syntax/behavior
still works.
2. Run `seed/update` to produce a new seed that understands the
new feature.
3. Commit source + updated seed together.
The seed now understands the new feature. From here, switching the
compiler's own source to use it is just a compatible change: the
seed can already compile it. No further seed update is required.
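The ordering above can be checked with a small model. The following Python sketch is illustrative only: `Commit`, `seed_can_build`, `update_seed`, and the feature names are hypothetical and do not correspond to anything in the Radiance source. It treats a seed as the set of features it understands and shows why using a feature before the seed knows it violates the constraint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    seed_understands: frozenset   # features the checked-in seed can compile
    source_implements: frozenset  # features the source knows how to compile
    source_uses: frozenset        # features the compiler's own source relies on


def seed_can_build(commit: Commit) -> bool:
    """The constraint: the seed must accept everything the source uses."""
    return commit.source_uses <= commit.seed_understands


def update_seed(commit: Commit) -> Commit:
    """Model of a seed update: the new seed understands what the source implements."""
    if not seed_can_build(commit):
        raise ValueError("the current seed cannot compile this source")
    return Commit(commit.source_implements, commit.source_implements, commit.source_uses)


OLD = frozenset({"old_syntax"})   # hypothetical feature names
NEW = OLD | {"new_syntax"}

# Wrong order: the compiler's own source uses the feature immediately.
wrong = Commit(seed_understands=OLD, source_implements=NEW, source_uses=NEW)

# Steps 1-3: implement the feature without using it, then update the seed.
step1 = Commit(seed_understands=OLD, source_implements=NEW, source_uses=OLD)
step3 = update_seed(step1)

# Follow-up compatible change: the source now uses the feature.
follow_up = Commit(step3.seed_understands, NEW, NEW)
```

In this model `seed_can_build(wrong)` is false, while `step1`, `step3`, and `follow_up` all satisfy the constraint, so every commit in the recommended sequence remains buildable.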
HOW UPDATING THE SEED WORKS
    seed/update [--from-s0] [--seed <path>]

1. Stage 1: Runs the seed to compile the current source.
   Outputs `seed/radiance.rv64.s1` (S1).
2. Compares the seed with S1. If identical, done (fixed point reached).
3. Stage 2: Runs S1 to compile the source. Outputs S2.
4. Compares S1 and S2. If identical, done.
5. Continues in the same way up to a maximum number of stages.
   Fails if no fixed point is reached.

When a fixed point is found, it copies the converged binary to
`seed/radiance.rv64` and writes the current HEAD commit hash to
`seed/radiance.rv64.git`.
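
The loop described above can be modelled in a few lines of Python. This is a sketch, not the actual `seed/update` tool: `find_fixed_point`, `toy_compile`, and the `max_stages` default of 4 are hypothetical, and the real stage limit is not stated in this document. A toy "binary" here is `<compiled-from>/<built-by>`, which is enough to show why convergence can take more than one stage.

```python
from typing import Callable, Tuple


def find_fixed_point(
    seed: bytes,
    compile_with: Callable[[bytes], bytes],
    max_stages: int = 4,  # hypothetical limit; the real tool's limit may differ
) -> Tuple[bytes, int]:
    """Return (converged_binary, stage), or raise if no fixed point is reached.

    `compile_with(binary)` stands in for "run `binary` to compile the
    current source and return the resulting binary".
    """
    previous = seed  # stage 0 is the seed itself
    for stage in range(1, max_stages + 1):
        current = compile_with(previous)
        if current == previous:  # bit-for-bit identical
            return current, stage
        previous = current
    raise RuntimeError(f"No fixed point reached after {max_stages} stages")


SOURCE_VERSION = b"v2"  # the current source tree in this toy model


def toy_compile(compiler: bytes) -> bytes:
    # The output implements the current source, but its machine code
    # reflects the code generator of the compiler that produced it.
    compiled_from, _built_by = compiler.split(b"/")
    return SOURCE_VERSION + b"/" + compiled_from


# Old seed: compiled from v1 by a v1 compiler.
binary, stage = find_fixed_point(b"v1/v1", toy_compile)
```

In this model S1 is `v2/v1`, S2 is `v2/v2`, and S3 equals S2, so the loop stops at stage 3. Starting from a seed that is already `v2/v2` stops at stage 1.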
Why might it take multiple stages?
* Stage 1 differs from seed: The source changed, so the compiler
binary changed. Normal.
* Stage 2 differs from Stage 1: The source changes affected how the
compiler generates code for itself. S1 was generated by the old
seed's code generator, while S2 was generated by S1, which
incorporates the changes, so the two binaries differ even though
both implement the same compiler. Usually converges at Stage 2 or 3.
* No convergence after 3+ stages: Something is non-deterministic in
code generation (memory addresses leaking into output, hash map
iteration order, etc.). This is a bug that must be fixed.
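
To make the iteration-order cause concrete, here is a small Python sketch of how container order can leak into output, along with the usual fix. It is illustrative only: `emit_symbols_unstable` and `emit_symbols_stable` are hypothetical helpers, not part of the Radiance compiler, and Python's `set` merely stands in for whatever hash-based container a compiler might use.

```python
from typing import Set


def emit_symbols_unstable(symbols: Set[str]) -> bytes:
    # Iterating a hash-based set: the order depends on hash values, which
    # for Python strings can change from one process to the next.
    return b"".join(name.encode() + b"\n" for name in symbols)


def emit_symbols_stable(symbols: Set[str]) -> bytes:
    # Sorting first makes the output a function of the contents only.
    return b"".join(name.encode() + b"\n" for name in sorted(symbols))


table = {"main", "alloc", "free"}
stable = emit_symbols_stable(table)
```

The unstable version emits the same set of lines on every run, but not necessarily in the same order, which is enough to prevent two stages from being bit-for-bit identical.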
VERIFYING THE SEED
The seed is an opaque binary checked into the repository. Since binaries
can't be reviewed like source code, trust relies on reproducibility: anyone
can rebuild the seed from source and verify it matches.
Verify the fixed-point property
Run `seed/update`. If the seed is already at a fixed point, Stage 1
will report IDENTICAL immediately. This confirms that compiling the
current source with the seed produces the seed itself -- the compiler
faithfully reproduces its own binary.
Verify from an independent build
If you have a separately-obtained Radiance compiler (e.g. built from
a different trusted seed, or received from another party), use it as
the starting point:
    seed/update --seed /path/to/trusted/radiance.rv64
If this converges to the same fixed point as the checked-in seed,
you have strong evidence that the seed is a faithful product of the
source code and not a tampered binary: a backdoor that exists only in
the seed binary, and not in the source, does not carry over when the
source is compiled by an independent compiler.
The bootstrapping compiler can serve as this independent second compiler.
Its source can be audited, and any C99 compiler can be used to compile it.
To use it as seed, pass `--from-s0` like so:
    seed/update --from-s0 --seed ./radiance.s0
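
The value of an independent starting point can be seen in a toy model. The Python sketch below is a deliberately simplified illustration, not a description of how Radiance or `seed/update` works: `run_compiler`, `fixed_point`, and the byte strings are all hypothetical, and it assumes the independent compiler is itself honest. A tampered seed that re-inserts its own backdoor is self-consistent, so the fixed-point check alone does not expose it, but a build started from an independent compiler converges to a different binary.

```python
SOURCE = b"radiance-source"  # stand-in for the checked-in source tree


def run_compiler(compiler: bytes) -> bytes:
    """Toy model of running `compiler` on SOURCE and returning the output."""
    output = b"bin(" + SOURCE + b")"      # what the source says to produce
    if compiler.endswith(b"+backdoor"):   # a tampered binary re-inserts itself
        output += b"+backdoor"
    return output


def fixed_point(start: bytes, max_stages: int = 4) -> bytes:
    previous = start
    for _ in range(max_stages):
        current = run_compiler(previous)
        if current == previous:
            return current
        previous = current
    raise RuntimeError("no fixed point reached")


clean_seed = b"bin(radiance-source)"
tampered_seed = b"bin(radiance-source)+backdoor"
independent = b"some-other-trusted-compiler"

from_tampered = fixed_point(tampered_seed)   # reproduces itself
from_independent = fixed_point(independent)  # converges to the clean binary
```

Here `from_tampered` equals the tampered seed, while `from_independent` equals `clean_seed`, so comparing the two results reveals the mismatch.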
Verify the source commit
The file `seed/radiance.rv64.git` records which commit's source was
compiled to produce the seed.
TROUBLESHOOTING
"No fixed point reached after N stages"
The compiler output is non-deterministic. Diff the binaries from two
consecutive stages to find what is changing. Common causes:
* Pointer values or addresses leaking into generated code
* Hash table iteration order affecting output
* Uninitialized memory read during compilation
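
As a starting point for diffing two stage binaries, a minimal Python sketch follows. `diff_offsets` is a hypothetical helper, not part of the repository. It reports the first few byte offsets at which two binaries differ, which can then be mapped back to a section or function by other means.

```python
from typing import List


def diff_offsets(a: bytes, b: bytes, limit: int = 8) -> List[int]:
    """Return up to `limit` byte offsets at which `a` and `b` differ.

    If the inputs have different lengths, the first offset past the
    shorter input is reported as a difference as well.
    """
    offsets = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(a) != len(b):
        offsets.append(min(len(a), len(b)))
    return offsets[:limit]


example = diff_offsets(b"abcdef", b"abXdeY")
```

To apply it to real files, read both binaries with `pathlib.Path(...).read_bytes()`. Only `seed/radiance.rv64.s1` is named in this document, so the paths of later stage outputs would have to be taken from the tool itself.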