<slides>
<title>A Gcc Compiler Server</title>
<slide id="splash">
<caption>A Gcc Compiler Server</caption>
<h2>Per Bothner</h2>
<h2>Apple Computer</h2>
<h3><code>&lt;per@bothner.com&gt;</code><br/><code>&lt;pbothner@apple.com&gt;</code></h3>
<h5>May 2003</h5>
</slide>

<slide id="intro">
<caption>Classic compiler structure</caption>
<ul>
<li>The overall structure of the <code>gcc</code> program
is the same as that of the original K&amp;R C compiler:
<ul>
<li>
A user-mode program (<code>gcc</code>/<code>cc</code>)
processes arguments, and decides which other programs to run.</li>
<li>
The compiler proper (<code>cc1</code>/...) is invoked once for each
source file.</li>
<li>
The output of <code>cc1</code> is an assembly file,
which is assembled by the <code>as</code> program.</li>
</ul></li>
<li>
This tried-and-true approach is running into problems.</li>
</ul>
</slide>

<slide id="intro-slow">
<caption>Slow compilation</caption>
<ul><li>
The classic approach leads to lots of extra work:
<ul>
<li>
Forking a new <code>cc1</code> (and <code>as</code>)
for each source file.</li>
<li>
Initializing <code>cc1</code> internal state (such
as predefined declarations).</li>
<li>
Processing of external modules (header files) is
redone for every source file.</li>
</ul>
</li>
<li>
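The last is the most significant, as it leads to
<var>O(N<sup>2</sup>)</var> behavior: as a project grows, so do both the
number of source files and the number of headers each one includes.</li>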
</ul>
</slide>

<slide id="intro-header">
<caption>Re-reading header files</caption>
<ul>
<li>C++ inline functions and templates are typically in headers.</li>
<li>Compilation time is often dominated by header files.
<ul>
<li>Assume each top-level file includes on average <var>N</var> headers.</li>
<li>Then compiling <var>M</var> files has to process <var>M*N</var> headers.</li>
</ul></li>
<li>This motivates pre-compiled header (PCH) files.</li>
<li>A server can give us comparable benefits if it can <i>re-use</i>
the results of processing header files.</li>
<li>This might be easier and more flexible than PCH.</li>
</ul>
</slide>

<slide id="intro-optimization">
<caption>Inter-module optimization</caption>
<ul>
<li>The classic approach hurts run-time as well as compile-time.</li>
<li>The compiler only has the information in the current file,
plus the headers it includes.</li>
<li>The compiler cannot make use of information in other modules
until they are linked together.</li>
<li>Specifically, it cannot inline across modules.</li>
<li>This is why critical information (inline functions and templates)
migrates into headers.</li>
<li>This talk focuses on compile speed, but note that a compile server
also enables important optimizations.</li>
</ul>
</slide>

<slide id="caveats">
<caption>This is work-in-progress</caption>
<ul>
<li>This requires substantial changes to <code>gcc</code>.</li>
<li>It almost works for C.</li>
<li>Some progress on support for C++.</li>
<li>Most of what I say about <code>cc1</code> also applies to other Gcc compilers.</li>
<li>The focus is on languages that use <code>cpplib</code>.</li>
<li>Similar
issues arise in any language that imports other modules.</li>
</ul>
</slide>

<slide id="multi-input">
<caption>Multi-input mode</caption>
<ul>
<li>
Reads multiple top-level source
files and generates a single assembler file.</li>
<li>
Equivalent to compiling each file separately and linking the
resulting object files.</li>
<li>
Speeds up compilation and enables inter-module optimizations.</li>
<li>
Needs to re-initialize the front-end for each source file,
without re-initializing the back-end.</li>
<li>
Requires changes to both <code>gcc</code> and <code>cc1</code>.</li>
<li>
Mostly works, but some issues remain, such as renaming <code>static</code>
names that clash between input files.</li>
<li>
<code>gcj</code> has supported this mode for a while in an ad hoc manner.</li>
</ul>
</slide>

<slide id="server-mode">
<caption>Server mode</caption>
<ul>
<li><code>cc1</code> waits for compilation requests.</li>
<li>Listens on a Unix domain socket bound to <code>./.cc1-server</code>.</li>
<li>Each request names one or more source files to compile,
and an output assembler file.</li>
<li>Works serially: when done with one request, it waits for the next one.</li>
<li>
Speeds up the edit-compile-debug cycle.</li>
<li>Speeds up batch compiling of entire directories/projects.</li>
</ul>
</slide>
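
<slide id="server-mode-sketch">
<caption>Sketch: a serial request loop</caption>
<p>A minimal sketch of the serial server loop, using POSIX Unix domain sockets. The one-line request format (<code>a.c b.c -o out.s</code>) and the names <code>struct request</code>, <code>read_request</code> and <code>serve</code> are hypothetical assumptions for illustration, not the actual <code>cc1</code> protocol; the <code>compile</code> parameter stands in for the real compilation work.</p>
<pre><![CDATA[
```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define MAX_INPUTS 16
#define MAX_LINE 1024

/* Hypothetical request: one line such as "a.c b.c -o out.s\n". */
struct request {
  char line[MAX_LINE];               /* raw text; the fields point into it */
  const char *inputs[MAX_INPUTS];    /* top-level source files */
  int n_inputs;
  const char *output;                /* assembler file named after "-o" */
};

/* Read one newline-terminated request from FD and split it into words.
   Returns 0 on success, -1 if there is no input file or no "-o" output. */
int read_request(int fd, struct request *req)
{
  size_t len = 0;
  char c;
  while (len + 1 < sizeof req->line && read(fd, &c, 1) == 1 && c != '\n')
    req->line[len++] = c;
  req->line[len] = '\0';

  req->n_inputs = 0;
  req->output = NULL;
  for (char *w = strtok(req->line, " "); w != NULL; w = strtok(NULL, " ")) {
    if (strcmp(w, "-o") == 0) {
      req->output = strtok(NULL, " ");
      if (req->output == NULL)
        break;                       /* "-o" with no file name */
    } else if (req->n_inputs < MAX_INPUTS) {
      req->inputs[req->n_inputs++] = w;
    } else {
      return -1;                     /* too many input files */
    }
  }
  return (req->n_inputs > 0 && req->output != NULL) ? 0 : -1;
}

/* Serve requests on a Unix domain socket bound to PATH, one at a time.
   COMPILE stands in for the real work; its status is sent back. */
int serve(const char *path, int (*compile)(const struct request *))
{
  struct sockaddr_un addr;
  int listen_fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (listen_fd < 0)
    return -1;
  memset(&addr, 0, sizeof addr);
  addr.sun_family = AF_UNIX;
  strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
  unlink(path);                      /* remove a stale socket file */
  if (bind(listen_fd, (struct sockaddr *) &addr, sizeof addr) < 0
      || listen(listen_fd, 8) < 0) {
    close(listen_fd);
    return -1;
  }
  for (;;) {                         /* serial: one request at a time */
    int fd = accept(listen_fd, NULL, NULL);
    if (fd < 0)
      continue;
    struct request req;
    int status = read_request(fd, &req) == 0 ? compile(&req) : -1;
    dprintf(fd, "%d\n", status);
    close(fd);
  }
}
```
]]></pre>
<p>Because requests are handled strictly one at a time, the compiler's global state needs no locking; a client that connects while a request is in progress simply waits in the <code>listen</code> backlog until it is accepted.</p>
</slide>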

<slide id="initialization">
<caption>Initialization</caption>
<ul>
<li>Before doing any "real work", the compiler has to initialize builtins and data structures.</li>
<li>We now have 3 levels of initialization:
<ul>
<li>True one-time initialization.</li>
<li>Initializing RTL and assembly generation, for each output file.</li>
<li>Initializing pre-defined macros and identifiers, for each top-level input file.</li>
</ul>
</li>
<li>Existing code tangles these together with needless interdependencies.</li>
</ul>
</slide>
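
<slide id="initialization-sketch">
<caption>Sketch: three initialization levels</caption>
<p>A minimal sketch of how the three levels might be kept apart, assuming hypothetical functions <code>init_once</code>, <code>init_output_file</code>, <code>init_input_file</code> and <code>compile_request</code>; the counters exist only to make the call frequencies visible. One request produces one output file, and each input file re-initializes only the front-end, as in multi-input mode.</p>
<pre><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>

/* Counters that make the frequency of each level visible. */
static int once_count, output_count, input_count;

/* Level 1: true one-time setup (e.g. builtin types, global tables). */
static void init_once(void)
{
  static bool done = false;
  if (done)
    return;
  done = true;
  once_count++;
}

/* Level 2: per output file (e.g. RTL and assembly generation state). */
static void init_output_file(const char *asm_name)
{
  (void) asm_name;
  output_count++;
}

/* Level 3: per top-level input file (predefined macros, identifiers). */
static void init_input_file(const char *src_name)
{
  (void) src_name;
  input_count++;
}

/* One request: several top-level inputs, one assembler output.  The
   back-end is set up once per output; only the front-end is reset for
   each input file. */
void compile_request(const char *const *inputs, int n, const char *output)
{
  init_once();
  init_output_file(output);
  for (int i = 0; i < n; i++) {
    init_input_file(inputs[i]);
    /* ... parse inputs[i] and pass its trees to the back-end ... */
  }
  /* ... finish writing the assembler file OUTPUT ... */
}
```
]]></pre>
<p>After a server has handled a three-file request and a two-file request, the one-time level has run once, the per-output level twice, and the per-input level five times.</p>
</slide>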

<slide id="reuse-data">
<caption>Re-using text, tokens, or trees?</caption>
<ul>
<li>When processing a header file, we want to remember what
we read so it's faster the next time.</li>
<li>
We have a choice between:
<ul>
<li>Saving the text in the buffer.  Simple, low-overhead,
but doesn't buy much.</li>
<li>
Saving the tokens in the buffer, either before or after
preprocessing.  Requires a new, memory-intensive data structure.</li>
<li>
Saving the semantic data resulting from the header files - <i>i.e.</i> trees.
</li>
</ul>
</li>
<li>
The latter gives us the biggest potential pay-off, so that is what
we do.</li>
</ul>
</slide>

<slide id="dependencies">
<caption>Dependencies and invalidation vs re-use</caption>
<ul>
<li>
A header file provides (exports) various <dfn>declarations</dfn>,
including macros, types, external declarations, and inline functions.</li>
<li>
Goal: When including a file that has been processed before,
just <em>re-use</em> declaration nodes from last time.</li>
<li>Then we can just skip the header.</li>
<li>Complication: A definition may depend on other definitions.</li>
<li>Must check that these have not changed.</li>
</ul>
</slide>

<slide id="invalidation">
<caption>Inconsistent header file use</caption>
<ul><li><code>inc.h</code>:
<pre>
struct device { dev_t index; };
</pre>
</li></ul>
<ul><li><code>a.c</code>:
<pre>
#define dev_t int
#include "inc.h"
</pre>
</li></ul>
<ul><li><code>b.c</code>:
<pre>
typedef short dev_t;
#include "inc.h"
</pre>
</li></ul>
</slide>
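
<slide id="invalidation-sketch">
<caption>Sketch: detecting the inconsistency</caption>
<p>One way to catch the case above is to record, for a header fragment, what each identifier it used was bound to, and to compare those bindings before re-use. This is a minimal sketch assuming a toy scope of string-named bindings; <code>struct dependency</code>, <code>define_name</code> and <code>dependencies_valid</code> are hypothetical names, and a real front-end would compare declaration nodes by identity rather than comparing strings.</p>
<pre><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical kinds of binding an identifier can have. */
enum binding_kind { UNBOUND, MACRO, TYPEDEF };

struct binding {
  enum binding_kind kind;
  const char *meaning;      /* macro expansion or underlying type */
};

/* "This fragment was compiled with NAME bound to SEEN." */
struct dependency {
  const char *name;
  struct binding seen;
};

#define MAX_SYMS 8
struct scope {
  const char *names[MAX_SYMS];
  struct binding bindings[MAX_SYMS];
  int count;
};

static struct binding lookup(const struct scope *s, const char *name)
{
  for (int i = 0; i < s->count; i++)
    if (strcmp(s->names[i], name) == 0)
      return s->bindings[i];
  struct binding none = { UNBOUND, NULL };
  return none;
}

static void define_name(struct scope *s, const char *name,
                        enum binding_kind kind, const char *meaning)
{
  assert(s->count < MAX_SYMS);
  s->names[s->count] = name;
  s->bindings[s->count].kind = kind;
  s->bindings[s->count].meaning = meaning;
  s->count++;
}

static bool same_binding(struct binding a, struct binding b)
{
  if (a.kind != b.kind)
    return false;
  if (a.meaning == NULL || b.meaning == NULL)
    return a.meaning == b.meaning;
  return strcmp(a.meaning, b.meaning) == 0;
}

/* A fragment may be re-used only if every recorded dependency still holds. */
bool dependencies_valid(const struct dependency *deps, int n,
                        const struct scope *current)
{
  for (int i = 0; i < n; i++)
    if (!same_binding(deps[i].seen, lookup(current, deps[i].name)))
      return false;
  return true;
}
```
]]></pre>
<p>Compiled in <code>a.c</code>, <code>inc.h</code> records that <code>dev_t</code> was a macro expanding to <code>int</code>. In <code>b.c</code> the same name is a typedef for <code>short</code>, so the recorded dependency fails and <code>inc.h</code> has to be parsed again.</p>
</slide>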

<slide id="one-definition">
<caption>Extended one-definition rule</caption>
<ul>
<li>
Such inconsistencies are rare, but may happen.</li>
<li>
C++'s <dfn>one-definition rule</dfn> requires that if two
compilation units see definitions of the "same" name,
they must be token-by-token equivalent.</li>
<li>
"Extended one-definition rule":<br />
In a "well-behaved program" -
<ul>
<li>a shared definition is defined in a single
location in a header file;</li>
<li>
the "meaning" of that definition does not change between
compilation units.</li>
</ul></li>
<li>
We optimize for this assumption, but must tolerate violations.</li>
<li>Bonus: Can warn about violations, which are probably unintended.</li>
</ul>
</slide>
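
<slide id="one-definition-sketch">
<caption>Sketch: token-by-token comparison</caption>
<p>The C++ rule is stated in terms of token-by-token equivalence. A minimal sketch of such a check, plus a warning for violations, assuming tokens are plain strings; <code>first_token_mismatch</code> and <code>warn_if_inconsistent</code> are hypothetical names, and <code>cpplib</code>'s real token representation is not shown.</p>
<pre><![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compare two definitions token by token.  Returns -1 if they are
   equivalent, otherwise the index of the first differing token
   (or the length of the shorter one, if one is a prefix of the other). */
int first_token_mismatch(const char *const *a, size_t na,
                         const char *const *b, size_t nb)
{
  size_t n = na < nb ? na : nb;
  for (size_t i = 0; i < n; i++)
    if (strcmp(a[i], b[i]) != 0)
      return (int) i;
  return na == nb ? -1 : (int) n;
}

/* Tolerate the violation, but warn about it.  Returns 1 if warned. */
int warn_if_inconsistent(const char *name,
                         const char *const *a, size_t na,
                         const char *const *b, size_t nb)
{
  int at = first_token_mismatch(a, na, b, nb);
  if (at < 0)
    return 0;
  fprintf(stderr, "warning: definition of '%s' differs from an earlier "
          "one at token %d\n", name, at);
  return 1;
}
```
]]></pre>
<p>In the <code>dev_t</code> example, the tokens of <code>inc.h</code> as written are identical in <code>a.c</code> and <code>b.c</code>; the difference only appears after macro expansion in <code>a.c</code>, or in what <code>dev_t</code> means in <code>b.c</code>. That is why the extended rule is phrased in terms of meaning rather than tokens alone.</p>
</slide>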

<slide id="fragment">
<caption>Fragments</caption>
<ul>
<li>Checking if we can re-use a header file
is complicated by conditional compilation and nested includes.</li>
<li>Re-use and dependency checking is simplified by using smaller
units.</li>
<li>
The unit of re-use is a <dfn>fragment</dfn>: the text between cpp directives.</li>
<li>
This keeps the <code>cpplib</code> changes modest.</li>
</ul>
</slide>

<slide id="fragment-logic">
<caption>cpplib fragment handling</caption>
<ul>
<li>Use <code>cpplib</code>'s (existing but disabled) include file cache.</li>
<li>Also remember the chain of fragments.</li>
<li><code>cpplib</code> logic and directive handling mostly unchanged.</li>
<li>After a directive or at the start of a file, call the <code>enter_fragment</code> call-back.</li>
<li>Before the next directive or at the end of a file, call the <code>exit_fragment</code> call-back.</li>
<li>
If the fragment is seen for the first time, the call-back tells the front-end to remember its declarations.</li>
<li>
If the fragment was read previously, the call-back checks its dependencies for
validity. On success, the remembered declarations are "pushed" into the
top <code>binding_level</code>.</li>
<li>
If <code>enter_fragment</code> succeeds, <code>cpplib</code> skips
forward to end of fragment.</li>
</ul>
</slide>
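
<slide id="fragment-logic-sketch">
<caption>Sketch: fragment call-backs</caption>
<p>The call-back protocol above can be sketched as a toy model. This is not the actual <code>cpplib</code> or front-end interface: declarations are integer ids in a single top-level scope, and <code>use_name</code>, <code>declare</code>, <code>enter_fragment</code> and <code>exit_fragment</code> are hypothetical signatures. While a fragment is parsed, each outside name it looks up becomes a dependency and each name it declares is remembered; on a later visit the fragment is re-used only if every dependency still resolves to the same declaration.</p>
<pre><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ITEMS 16

/* Toy "top binding_level": name -> declaration id (0 = unbound). */
static const char *scope_names[MAX_ITEMS];
static int scope_decls[MAX_ITEMS];
static int scope_count;              /* reset to 0 for each top-level file */

/* Index of NAME in NAMES[0..N), or -1. */
static int find(const char *const *names, int n, const char *name)
{
  for (int i = 0; i < n; i++)
    if (strcmp(names[i], name) == 0)
      return i;
  return -1;
}

static int lookup(const char *name)
{
  int i = find(scope_names, scope_count, name);
  return i < 0 ? 0 : scope_decls[i];
}

static void push_decl(const char *name, int decl)
{
  int i = find(scope_names, scope_count, name);
  if (i < 0) {
    assert(scope_count < MAX_ITEMS);
    i = scope_count++;
    scope_names[i] = name;
  }
  scope_decls[i] = decl;
}

struct fragment {
  bool valid;                        /* remembered state is complete */
  const char *dep_names[MAX_ITEMS];  /* outside names it looked up ... */
  int dep_decls[MAX_ITEMS];          /* ... and what they resolved to */
  const char *decl_names[MAX_ITEMS]; /* declarations it provides */
  int decls[MAX_ITEMS];
  int n_deps, n_decls;
};

static struct fragment *recording;   /* fragment being parsed, if any */

/* Front-end lookup hook: while recording, note outside names used. */
int use_name(const char *name)
{
  int decl = lookup(name);
  struct fragment *f = recording;
  if (f != NULL && find(f->decl_names, f->n_decls, name) < 0
      && find(f->dep_names, f->n_deps, name) < 0) {
    assert(f->n_deps < MAX_ITEMS);
    f->dep_names[f->n_deps] = name;
    f->dep_decls[f->n_deps++] = decl;
  }
  return decl;
}

/* Front-end hook for a new top-level declaration. */
void declare(const char *name, int decl)
{
  struct fragment *f = recording;
  push_decl(name, decl);
  if (f != NULL) {
    assert(f->n_decls < MAX_ITEMS);
    f->decl_names[f->n_decls] = name;
    f->decls[f->n_decls++] = decl;
  }
}

/* Called after a directive or at file start.  Returns true if the
   fragment was re-used, in which case cpplib skips to its end. */
bool enter_fragment(struct fragment *f)
{
  bool ok = f->valid;
  for (int i = 0; ok && i < f->n_deps; i++)
    ok = lookup(f->dep_names[i]) == f->dep_decls[i];
  if (ok) {
    for (int i = 0; i < f->n_decls; i++)
      push_decl(f->decl_names[i], f->decls[i]);
    return true;
  }
  f->valid = false;                  /* first visit, or a dependency changed */
  f->n_deps = f->n_decls = 0;
  recording = f;
  return false;
}

/* Called before the next directive or at file end. */
void exit_fragment(struct fragment *f)
{
  if (recording == f) {
    f->valid = true;
    recording = NULL;
  }
}
```
]]></pre>
<p>Names a fragment declares itself are not recorded as dependencies, so a fragment that defines and then uses a type does not invalidate itself.</p>
</slide>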

<slide id="bindings">
<caption>Remembering and restoring declarations</caption>
<ul>
<li>
Each front-end is responsible for:
<ul>
<li>remembering top-level declarations;</li>
<li>remembering / checking dependencies;</li>
<li>restoring declarations if re-use is ok.</li>
</ul>
</li>
<li>Hooks/conventions to make this relatively easy.</li>
<li>Complication: old tree nodes get modified with new
information, e.g. by C++ function overloading.</li>
<li>Likely solution: remember modifications in undo buffers.
Before re-starting compilation, we must undo the modifications to old trees.</li>
</ul>
</slide>
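
<slide id="bindings-sketch">
<caption>Sketch: an undo buffer</caption>
<p>A minimal sketch of the undo-buffer idea, assuming the modified tree fields are pointer-sized slots; <code>set_with_undo</code> and <code>undo_all</code> are hypothetical names. Every modification of a re-used node goes through <code>set_with_undo</code>, which logs the slot and its previous value; <code>undo_all</code> restores them in reverse order.</p>
<pre><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

#define UNDO_MAX 64

/* One logged change: the address of a pointer-sized field and its old value. */
struct undo_entry {
  void **slot;
  void *old_value;
};

static struct undo_entry undo_log[UNDO_MAX];
static size_t undo_count;

/* Modify a field of an old (re-used) tree node, logging the old value. */
void set_with_undo(void **slot, void *new_value)
{
  assert(undo_count < UNDO_MAX);
  undo_log[undo_count].slot = slot;
  undo_log[undo_count].old_value = *slot;
  undo_count++;
  *slot = new_value;
}

/* Before re-starting compilation, restore every logged field, newest
   first, so a field modified several times gets its original value. */
void undo_all(void)
{
  while (undo_count > 0) {
    undo_count--;
    *undo_log[undo_count].slot = undo_log[undo_count].old_value;
  }
}
```
]]></pre>
<p>For C++ overloading, adding a new overload to a re-used function declaration would be logged this way, then rolled back before the next compilation starts.</p>
</slide>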

<slide id="non-nesting">
<caption>Fragment/nesting overlap</caption>
<ul>
<li>
Each declaration needs to belong to a single fragment.</li>
<li>
Problems if a declaration (such as a <code>struct</code>) is spread
out over multiple fragments.</li>
<li>
If one of the fragments gets re-used, but for some reason another
fragment gets invalidated, then the parser may be fed nonsense.</li>
</ul>
</slide>

<slide id="nonnesting-example">
<caption>Non-nesting example</caption>
<pre>
// <i> fragment 1</i>
extern void F1(T1);

struct St {
  int value;
#ifdef __cplusplus
// <i> fragment 2</i>
  int getValue() { return value; }
#endif
// <i> fragment 3</i>
};

extern void F2(T2);
</pre>
</slide>

<slide id="handling-non-nesting">
<caption>Handling non-nesting</caption>
<ul>
<li>Easy approach: a <code>nesting</code> counter that
is non-zero while inside a declaration.</li>
<li>
Pre-invalidate fragments if <code>nesting&gt;0</code>
on fragment enter or exit.</li>
<li>
Better: combine fragments.  If <code>nesting&gt;0</code>
on fragment boundary, combine neighboring fragments
into one.</li>
<li>
Difficulty: must test the conditionals before the first combined
fragment.</li>
<li>
Disallowed if a conditional would be moved across a <code>#define</code>,
<code>#undef</code>, or <code>#include</code>.</li>
</ul>
</slide>
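
<slide id="handling-non-nesting-sketch">
<caption>Sketch: combining fragments</caption>
<p>A minimal sketch of the "combine fragments" approach, assuming the nesting depth can be approximated by counting braces and parentheses in each fragment's text; a real implementation would use the parser's own state and ignore comments and string literals. <code>combine_fragments</code> is a hypothetical name: it starts a new group only at a boundary where the nesting counter is zero, and otherwise merges the fragment into the previous group.</p>
<pre><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Net change in nesting depth across TEXT: +1 for '{' or '(',
   -1 for '}' or ')'.  Ignores comments and string literals. */
static int nesting_delta(const char *text)
{
  int d = 0;
  for (; *text; text++) {
    if (*text == '{' || *text == '(')
      d++;
    else if (*text == '}' || *text == ')')
      d--;
  }
  return d;
}

/* GROUP[i] receives the index of the first fragment in fragment i's
   combined group.  A boundary with nesting > 0 is not a safe re-use
   point, so the fragments on either side share a group.
   Returns the number of groups. */
int combine_fragments(const char *const *frags, size_t n, size_t *group)
{
  int nesting = 0, groups = 0;
  size_t start = 0;
  for (size_t i = 0; i < n; i++) {
    if (nesting == 0) {              /* safe boundary: start a new group */
      start = i;
      groups++;
    }
    group[i] = start;
    nesting += nesting_delta(frags[i]);
  }
  return groups;
}
```
]]></pre>
<p>Applied to the three fragments of the <code>struct St</code> example, the counter is 1 at both internal boundaries, so all three end up in one group, and the conditionals guarding fragments 2 and 3 must be tested before entering it.</p>
</slide>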

<slide id="other-complications">
<caption>Other complications</caption>
<ul>
<li>Dependencies are of the form:
Fragment F was compiled with identifier I bound to D.</li>
<li>
We also have negative dependencies:
Fragment F was compiled with I undefined.</li>
<li>Need an efficient way to record these <dfn>negative dependencies</dfn>.</li>
<li>See paper for discussion of this and other complications.</li>
</ul>
</slide>
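
<slide id="negative-deps-sketch">
<caption>Sketch: recording negative dependencies</caption>
<p>One possible way to keep negative dependencies cheap, sketched under stated assumptions: each hypothetical identifier node carries a stamp naming the last fragment that recorded it as undefined, so repeated failed lookups of the same identifier within one fragment add only one entry. The structures and the names <code>note_undefined</code> and <code>negative_deps_hold</code> are illustrative rather than taken from the actual implementation, and fragment ids are assumed to be fresh each time a fragment is (re)compiled.</p>
<pre><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifier node, as in a hashed identifier table. */
struct ident {
  const char *name;
  void *binding;            /* NULL while the identifier is undefined */
  unsigned neg_dep_stamp;   /* id of the last fragment that recorded it */
};

#define MAX_NEG 32
struct fragment {
  unsigned id;                        /* nonzero, fresh per (re)compile */
  struct ident *undefined[MAX_NEG];   /* identifiers seen as undefined */
  size_t n_undefined;
};

/* Called when a lookup in fragment F finds ID unbound.  The stamp
   check turns repeated lookups of the same name into one entry. */
void note_undefined(struct fragment *f, struct ident *id)
{
  if (id->binding != NULL || id->neg_dep_stamp == f->id)
    return;
  assert(f->n_undefined < MAX_NEG);
  id->neg_dep_stamp = f->id;
  f->undefined[f->n_undefined++] = id;
}

/* On re-use, every negative dependency must still hold. */
bool negative_deps_hold(const struct fragment *f)
{
  for (size_t i = 0; i < f->n_undefined; i++)
    if (f->undefined[i]->binding != NULL)
      return false;
  return true;
}
```
]]></pre>
<p>The stamp makes the duplicate check constant-time, at the cost of one extra field per identifier.</p>
</slide>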

<slide id="Result">
<caption>Preliminary results</caption>
<ol>
<li>
Multiple trivial identical C programs.  Each just includes <code>Carbon.h</code>,
which includes many of Apple's GUI headers.<br/>

After the initial compile, subsequent files compile 3 times as fast
as without the server.</li>
<li>
Compiling a mix of medium-sized <code>gcc</code> <code>.c</code> files
is over 30% faster using client+server.</li>
<li>
Compiling 9 Tcl <code>.c</code> files yields similar speed-up.</li>
</ol>
<ul>
<li>
The overhead of using the server seems to be in the noise.</li>
<li>
Actual numbers will depend on the fraction of header code that is re-used.</li>
<li>
Numbers may get worse if we add more complete dependency checking.</li>
</ul>
</slide>

<slide id="conclusion">
<caption>Conclusion and status</caption>
<ul>
<li>Looks promising, but not ready for real use.</li>
<li>Closest to useful for C, but some work done for C++.</li>
<li>Little checked in so far, but hopefully more soon.</li>
<li>Full patch (relative to 3.4 mainline) available on request.</li>
</ul>
</slide>

</slides>
