Radiance is a self-hosted compiler: the compiler is written in Radiance
and compiles itself. This creates a bootstrapping problem: you need a
working compiler to build the compiler. The solution is the "seed" -- a
trusted, checked-in binary that can compile the current source.

This document describes the workflows for developing the compiler,
updating the seed, and maintaining reproducibility.


CONCEPTS

  Seed            A known-good compiler binary checked into the repository
                  (`seed/radiance.rv64`). It can compile the current source
                  code into a working compiler.

  Stage           One round of self-compilation. Stage N uses the stage N-1
                  binary to compile the source. Stage 0 is the seed itself.

  Fixed point     Reached when two consecutive stages produce bit-for-bit
                  identical binaries. This proves the compiler faithfully
                  reproduces itself.

  "Dev" binary    `bin/radiance.rv64.dev` -- built by `make` from the seed.
                  This is the working compiler used during development.
                  It is not checked in.

  Breaking        A source change that the current seed cannot compile.
  change          E.g. new syntax, changed calling conventions, removed
                  features the compiler uses during self-compilation. This
                  requires generating a new seed.

  Compatible      A source change that the current seed can still compile.
  change          E.g. bug fixes, new optimizations, new library code that
                  the compiler itself doesn't use, or isn't meaningfully
                  affected by.


FILES

  seed/radiance.rv64           Seed binary (RISC-V machine code).
  seed/radiance.rv64.ro.data   Read-only data section.
  seed/radiance.rv64.rw.data   Read-write data section.
  seed/radiance.rv64.git       SHA-256 of the git commit whose *source*
                               was compiled to produce this seed.
  seed/update                  Tool that finds the fixed point and
                               updates the seed.
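As a minimal sketch, the layout above could be sanity-checked with a short
Python script. The file names come from the table. The assumption that
`seed/radiance.rv64.git` stores the hash as 64 hexadecimal digits of text
is a guess based on the "SHA-256" description, not something this document
specifies.

```python
from pathlib import Path
import string

# File names taken from the FILES table.
SEED_FILES = [
    "radiance.rv64",
    "radiance.rv64.ro.data",
    "radiance.rv64.rw.data",
    "radiance.rv64.git",
]

def check_seed_dir(seed_dir):
    """Return a list of problems found; an empty list means none were found."""
    seed_dir = Path(seed_dir)
    problems = [f"missing: {name}" for name in SEED_FILES
                if not (seed_dir / name).is_file()]
    commit_file = seed_dir / "radiance.rv64.git"
    if commit_file.is_file():
        # Assumption: the file holds the hash as hex text, optionally
        # followed by a newline. 64 hex digits corresponds to SHA-256.
        commit = commit_file.read_text().strip()
        if len(commit) != 64 or any(c not in string.hexdigits for c in commit):
            problems.append("radiance.rv64.git: not a 64-digit hex hash")
    return problems
```

Calling `check_seed_dir("seed")` from the repository root would return an
empty list if all four files are present and the hash looks well-formed.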


EVERYDAY DEVELOPMENT

Most compiler work -- bug fixes, optimizations, new standard library
features, new backends -- does not require a seed update. The workflow is
simply:

  1. Edit source code
  2. Build the dev binary (produces a new `bin/radiance.rv64.dev`)

      make

  3. Run tests

      make test

  4. Commit source changes only. The seed is untouched.

The `dev` binary is ephemeral and rebuilt from the seed on every `make`. As
long as the seed can compile the current source, no seed update is needed.
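Step 4 can be guarded against accidents with a small check. The sketch
below is not part of the repository. It assumes that every seed artifact
path starts with `seed/radiance.rv64`, as in the FILES table, and it leaves
the source of the path list to the caller (for example, the output of
`git diff --cached --name-only`).

```python
# Assumption: all seed artifacts share this path prefix (see FILES).
SEED_PREFIX = "seed/radiance.rv64"

def seed_artifacts(changed_paths):
    """Return the changed paths that are seed artifacts.

    For an everyday (compatible) change this list should be empty.
    """
    return [path for path in changed_paths if path.startswith(SEED_PREFIX)]
```

A pre-commit hook could call `seed_artifacts` on the staged paths and warn
when the result is non-empty, since that indicates a seed update rather
than an everyday change.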


WHEN TO UPDATE THE SEED

Some situations call for a seed update:

  * A breaking change is being introduced, i.e. a change that breaks the
    seed's ability to compile the source. You must update the seed *before*
    committing the breaking change. See "BREAKING CHANGES" below.

  * You want the benefits of compiler improvements (better code generation,
    faster compilation) to apply to the build itself. This is optional
    but often a good idea.

  * The fixed-point property needs re-verification after significant
    changes. Even compatible changes can alter the output binary, and
    reaching a fixed point confirms the compiler is self-consistent and
    deterministic.

Do *not* update the seed casually. Each seed update adds a large binary
diff to the repository.


BREAKING CHANGES

A breaking change is one where the new source cannot be compiled by the
old seed. Examples: new syntax the compiler uses on itself, changed
data structures in the AST, removed intrinsics.

The fundamental constraint is:

    The checked-in seed must always be able to compile the
    checked-in source.

This means you cannot simply commit a breaking change and update the
seed afterward: there would be a commit where the seed cannot build
the source. Instead:

    1. Add support for the new feature to the source, but don't use it
       in the compiler's own source yet. Ensure old syntax/behavior
       still works.

    2. Run `seed/update` to produce a new seed that understands the
       new feature.

    3. Commit source + updated seed together.

The seed now understands the new feature. From here, switching the
compiler's own source to use it is just a compatible change: the
seed can already compile it. No further seed update is required.
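The constraint and the three-step procedure can be illustrated with a toy
model. Everything in this sketch is hypothetical: a commit is reduced to
three sets of feature names, and `update_seed` only mimics the effect of
`seed/update`, not its implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Commit:
    seed_understands: frozenset   # features the checked-in seed accepts
    source_implements: frozenset  # features the compiler source implements
    source_uses: frozenset        # features the compiler source itself uses

def seed_can_build(commit):
    """The fundamental constraint: the seed accepts everything the source uses."""
    return commit.source_uses <= commit.seed_understands

def update_seed(commit):
    """Toy `seed/update`: the new seed accepts whatever the source implements.

    This is only possible when the current seed can build the source.
    """
    if not seed_can_build(commit):
        raise ValueError("seed cannot compile the source")
    return Commit(commit.source_implements, commit.source_implements,
                  commit.source_uses)

base = frozenset({"core"})
start = Commit(base, base, base)

# Wrong: implement and use the new feature in one commit, seed unchanged.
broken = Commit(base, base | {"new"}, base | {"new"})

# Right: implement without using (step 1), then update the seed (steps 2-3).
step1 = Commit(base, base | {"new"}, base)
step3 = update_seed(step1)

# Follow-up compatible change: the source starts using the feature.
follow_up = Commit(step3.seed_understands, step3.source_implements,
                   base | {"new"})
```

In this model `seed_can_build(broken)` is false, while `start`, `step1`,
`step3` and `follow_up` all satisfy the constraint, so every commit along
the recommended path stays buildable.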


HOW UPDATING THE SEED WORKS

  seed/update [--seed <path>]

  1. Stage 1: Runs the seed to compile the current source.
     Outputs `seed/radiance.rv64.s1`.
  2. Compares the seed with S1. If identical, done (fixed point reached).
  3. Stage 2: Runs S1 to compile the source. Outputs S2.
  4. Compares S1 and S2. If identical, done.
  5. Continues up to a maximum number of stages. Fails if no fixed
     point is reached.

When a fixed point is found, it copies the converged binary to
`seed/radiance.rv64` and writes the hash of the current HEAD commit to
`seed/radiance.rv64.git`.
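The loop above can be sketched in Python as follows. This illustrates the
algorithm only and is not the actual `seed/update` implementation:
`compile_with` is a hypothetical callback standing in for "run this binary
as a compiler over the current source", and the `max_stages` default of 4
is an arbitrary assumption.

```python
def find_fixed_point(seed, compile_with, max_stages=4):
    """Return (stage, binary) once two consecutive stages are identical.

    `seed` is the stage 0 binary as bytes. `compile_with(binary)` is a
    hypothetical callback that compiles the current source using `binary`
    and returns the resulting binary as bytes.
    """
    previous = seed
    for stage in range(1, max_stages + 1):
        current = compile_with(previous)
        if current == previous:  # bit-for-bit comparison
            return stage, current
        previous = current
    raise RuntimeError(f"no fixed point reached after {max_stages} stages")
```

If the seed is already a fixed point, the function returns at stage 1. A
deterministic compiler whose source changed returns at a later stage, and
a non-deterministic one exhausts `max_stages` and raises.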
135
136
Why might it take multiple stages?
137
138
  * Stage 1 differs from seed: The source changed, so the compiler
139
    binary changed. Normal.
140
141
  * Stage 2 differs from Stage 1: The source changes affected how the
142
    compiler generates code for itself. The S1 compiler (built by the
143
    old seed) generates slightly different code than the S2 compiler
144
    (built by S1, which incorporates the changes). Usually converges
145
    at Stage 2 or 3.
146
147
  * No convergence after 3+ stages: Something is non-deterministic in
148
    code generation (memory addresses leaking into output, hash map
149
    iteration order, etc.). This is a bug that must be fixed.
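A toy model makes the stage counts concrete. It is purely illustrative and
rests on a simplifying assumption: a binary is fully described by the
features and code generator its source implements, plus the code generator
that emitted it. All names below are hypothetical.

```python
from collections import namedtuple

# What a given version of the compiler source implements.
Source = namedtuple("Source", ["features", "codegen"])
# A compiled compiler: what it implements, and which codegen emitted it.
Binary = namedtuple("Binary", ["features", "codegen", "emitted_by"])

def compile_source(compiler, source):
    """The result behaves like `source` but is emitted by `compiler`'s codegen."""
    return Binary(source.features, source.codegen, emitted_by=compiler.codegen)

def converges_at(seed, source, max_stages=5):
    """Return the first stage whose output equals the previous stage, or None."""
    previous = seed
    for stage in range(1, max_stages + 1):
        current = compile_source(previous, source)
        if current == previous:
            return stage
        previous = current
    return None

seed = Binary("f1", "cg1", emitted_by="cg1")

print(converges_at(seed, Source("f1", "cg1")))  # 1: source unchanged
print(converges_at(seed, Source("f2", "cg1")))  # 2: new feature, same codegen
print(converges_at(seed, Source("f2", "cg2")))  # 3: codegen itself changed
```

In the last case S1 implements the new code generator but was emitted by
the old one, so S2 differs from S1, and S3 is the first stage that matches
its predecessor.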


VERIFYING THE SEED

The seed is an opaque binary checked into the repository. Since binaries
can't be reviewed like source code, trust relies on reproducibility: anyone
can rebuild the seed from source and verify it matches.

  Verify the fixed-point property

    Run `seed/update`. If the seed is already at a fixed point, Stage 1
    will report IDENTICAL immediately. This confirms that compiling the
    current source with the seed produces the seed itself -- the compiler
    faithfully reproduces its own binary.

  Verify from an independent build

    If you have a separately obtained Radiance compiler (e.g. built from
    a different trusted seed, or received from another party), use it as
    the starting point:

      seed/update --seed /path/to/trusted/radiance.rv64

    If this converges to the same fixed point as the checked-in seed,
    you have strong evidence that the seed is a faithful product of the
    source code and not a tampered binary. A backdoor that exists only in
    the seed binary, and not in the source, does not survive compilation
    by an independent compiler.

    The bootstrapping compiler can serve as this independent second compiler.
    Its source can be audited, and any C99 compiler can be used to compile it.
    To use it as the seed, pass `--from-s0` like so:

      seed/update --from-s0 --seed ./radiance.s0

  Verify the source commit

    The file `seed/radiance.rv64.git` records which commit's source was
    compiled to produce the seed. Checking out that commit gives you the
    exact source to audit or rebuild from.
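For the independent-build check, one way to compare the converged result
with the checked-in seed is to hash both copies. The sketch below assumes
you kept a copy of the checked-in seed files in one directory and have the
independently converged files in another. The directory arguments are
placeholders, and including the two data files in the comparison is an
assumption.

```python
import hashlib
from pathlib import Path

# Binary and data files from the FILES table.
SEED_FILES = ["radiance.rv64", "radiance.rv64.ro.data", "radiance.rv64.rw.data"]

def sha256_of(path):
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_seeds(dir_a, dir_b):
    """Map each seed file name to True if both copies hash identically."""
    return {name: sha256_of(Path(dir_a) / name) == sha256_of(Path(dir_b) / name)
            for name in SEED_FILES}
```

If every value in the returned mapping is `True`, the two starting points
converged to the same files.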


TROUBLESHOOTING

  "No fixed point reached after N stages"

    The compiler output is non-deterministic. Diff the binaries
    to find what's changing. Common causes:

    * Pointer values or addresses leaking into generated code
    * Hash table iteration order affecting output
    * Uninitialized memory read during compilation
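A first step when diffing two stage binaries is to locate the bytes that
differ. The helper below is a minimal sketch and not part of the
repository. It works on byte strings, so the caller decides which stage
outputs to load.

```python
def diff_offsets(a, b, limit=16):
    """Return (offsets, length_delta) for two byte strings.

    `offsets` holds up to `limit` positions within the common prefix
    length where the bytes differ. `length_delta` is len(a) - len(b).
    """
    offsets = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return offsets[:limit], len(a) - len(b)
```

For example, the stage 1 output could be loaded with
`Path("seed/radiance.rv64.s1").read_bytes()` and compared against a later
stage. Offsets that change from run to run tend to point at leaked
addresses or uninitialized memory.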