A Review of Clang/LLVM
Conor Twomey | R00121583
Louise Walsh | R00128425
This project gives a comprehensive review of the Clang
and LLVM compiler technologies, with particular emphasis on the
Abstract Syntax Tree, in terms of its definition and use, as well as its different
variations. It also discusses the advantages of Clang over GCC. Clang is
designed mainly for the C family of programming languages, and LLVM provides
the compiler infrastructure beneath it. This introduction begins by defining
each term.
Clang is a C language
family frontend, which uses LLVM as its backend. Clang is built on top of LLVM and was
designed to replace the GNU Compiler Collection (GCC). Clang was originally developed
by Apple, as GCC didn’t offer sufficient support for Objective-C. Its
contributors include Apple, Microsoft and Google, among others, with Apple making
extensive use of LLVM in many of its systems, including the iPhone SDK and its
IDE. It is open source software.
The LLVM Project is a collection of modular and reusable
compiler and toolchain technologies. LLVM is structured as a set of libraries
used to construct, optimize and produce intermediate and binary machine code. There are
several sub-projects of LLVM, with Clang being one of them.
Clang and LLVM are both written in C++. Clang produces an
Abstract Syntax Tree (AST), which will also be a focus of this review. An
Abstract Syntax Tree, or just syntax
tree, is a tree-like representation of the abstract
syntactic structure of code in a given language. The tree has many
nodes, with each node denoting a construct occurring in the given source code.
What makes the tree ‘abstract’ is that not every detail that occurs within the
given source code is represented. Abstract syntax trees are used to aid in the
analysis of programs and in program transformation systems.
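As a small illustration of what ‘abstract’ means here, the sketch below uses
Python’s standard ast module as a stand-in for a compiler front end. This is an
assumption made for brevity; Clang’s own AST is a C++ data structure. The helper
name tree_shape and the sample statements are hypothetical. Two spellings of the
same statement, one with redundant parentheses and a trailing comment, should
produce the same tree.

```python
import ast


def tree_shape(source):
    """Return a textual dump of the AST for a piece of Python source."""
    # ast.dump omits line and column positions by default, so only the
    # structure of the tree is compared.
    return ast.dump(ast.parse(source))


plain = tree_shape("total = 1 + 2")
decorated = tree_shape("total = (1 + 2)   # add the values")

# Parentheses, spacing and comments are details of the written text,
# not of the syntactic structure, so they leave no trace in the tree.
same_tree = plain == decorated
print(same_tree)
```

The comparison only shows that these particular surface details are dropped;
which details a given compiler keeps in its tree is a design decision of that
compiler.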
This review will begin with a comprehensive overview of
Clang. In this overview, the origins of Clang will be discussed, including the
reason behind its development. The advantages and disadvantages of Clang over
GCC will also be reviewed and discussed.
We will now discuss Clang in terms of its origin.
Clang is a front end for LLVM. It
is a compiler for languages such as C, C++ and Objective-C, with support for OpenMP and CUDA. Clang was originally released as open source by Apple in
July 2007. It was mainly developed as a replacement for GCC. The developers at
Apple had originally tried to use GCC’s frontend, but discovered that the GCC
source code was large and cumbersome to work with, as Apple worked extensively
with Objective-C.
This led to the development of Clang, a new LLVM frontend
which supports more C-based languages. It was developed quickly, and
was able to compile a Linux kernel within 3 years of being open sourced. The
combination of LLVM and Clang led to a comprehensive toolchain that could
replace the full GCC stack. However, it also means that the Clang front end is
still relatively new compared with GCC.
The advantages of using Clang will be
discussed in terms of end-user features, utilities and applications, and
internal design and implementation. The Clang frontend compiles quickly, with
very low memory usage, under a series of different tests. As competition
between Clang and GCC became more heated, the gap between their compilation
times narrowed.
Clang has a modular, library-based architecture, which is
extremely flexible and easy to extend. A modular architecture lets developers
use only the parts they need. Also, as Clang was developed to
work better with the Apple IDEs, it allows for tighter integration with IDEs.
In terms of internal design and implementation, Clang has a
single unified parser for C, C++ and Objective-C, with conformance to
variations of C. Clang also has a code base that is straightforward to work with.
This makes Clang more intuitive to use than GCC. Clang also aims to be
compatible with GCC. Next,
this review will discuss the differences between GCC and Clang.
GCC vs Clang
In this section we will discuss the differences between Clang and
GCC, and how differences in goals can lead to strengths and weaknesses in
different compiler front ends.
In terms of language support, GCC supports more languages
than Clang, including languages such as Java and Fortran. GCC also supports
many more language extensions than Clang. However, Clang’s support of C++ is
more compliant than GCC’s.
One of the more practical improvements with Clang is that the
error messages and design are understandable for any developer with a basic
understanding of the languages being used and of
compilers. In contrast, the GCC codebase is very old, which can present a
steep learning curve for any new developers hoping to make use of it.
From its inception, Clang has been designed
as an API, which means that source analysis tools, IDEs and code generation
tools can all utilize it easily. GCC, in comparison, is built as a monolithic
static compiler and is extremely difficult to use as an API, which means that
integration with other tools can be difficult. This also makes it difficult to
decouple the front end from the rest of the compiler.
Due to the modular design and architecture of Clang, it is
easy to reuse. GCC, however, due to its basic design, is very difficult to
reuse and very difficult to modify. It also uses a custom garbage collector,
uses global variables extensively, and is not multi-threadable. This
leads to further issues, including memory issues, that Clang doesn’t have.
Clang, in the pursuit of creating a more developer-friendly
system, provides clearer and more concise error messages and diagnostics,
with more support for these diagnostics. Some newer versions of GCC have
incorporated some of these Clang features to try to become more usable, but
GCC still has progress to make in this area.
Clang was also much faster at compilation for a brief period, until GCC
worked to decrease its own compilation times and started a healthy competition.
Both compilers now compile much faster than they did when Clang was started,
and both support a wide range of languages. GCC still compiles many more
languages and is still considered more of a standard, but Clang is a very
close rival for the C family of languages.
Clang does feature faster compile times and a lower memory footprint;
however, it is not consistently the leader. GCC has become a closer rival over
time, and which compiler wins on compile time and memory footprint
depends on the program being compiled.
To summarise this section, when it
comes to deciding whether to use GCC or Clang, the developer must first look
at their own competencies, the language, and the requirements of the system.
Abstract Syntax Tree (AST)
“An abstract syntax tree is a tree representation of the
abstract syntactic structure of source code written in a program language.”
The front end, i.e. Clang, is responsible for parsing the
source code. It then checks for errors and turns the input code into an AST. An
abstract syntax tree is used to aid the comprehension of programs: what
happens in each line of code may not be exactly what is expected, and the AST
shows what happens behind the scenes. This helps when working through
code during reviews and transformations. To use this tree structure, the tree
needs to be traversed efficiently and effectively. The tree is inherently more
convenient to analyse and modify than the source text itself.
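The traversal described above can be sketched with a visitor. The example below
again assumes Python’s standard ast module as a stand-in front end, not Clang’s
own traversal classes. The class name CallCollector and the sample program are
hypothetical. The visitor walks every node in the tree and records the name of
each function that is called, which would be awkward to do reliably with text
matching alone.

```python
import ast


class CallCollector(ast.NodeVisitor):
    """Record the names of functions called anywhere in a program."""

    def __init__(self):
        self.called = []

    def visit_Call(self, node):
        # Only simple calls such as f(x) are recorded; method calls such as
        # obj.f(x) have an Attribute node here and are skipped in this sketch.
        if isinstance(node.func, ast.Name):
            self.called.append(node.func.id)
        # Keep descending so that nested calls are visited as well.
        self.generic_visit(node)


source = """
def area(w, h):
    return multiply(w, h)

print(area(2, 3))
"""

collector = CallCollector()
collector.visit(ast.parse(source))
print(sorted(collector.called))
```

The visitor never inspects the program text directly. It relies only on the
node types in the tree, which is what makes tree-based analysis convenient.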
An AST is a tree that captures the structure of source code,
abstracted from the syntax of its original programming language. This can allow
conversion from one language to another, by taking the AST built from a
program in one language and writing it back out as source code in another
language. An AST is built by parsing the given source code.
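The idea of reading a tree back out in another language can be sketched for a
very small subset of expressions. The example assumes Python’s standard ast
module for parsing, and a hypothetical helper, to_c_expression, that emits a
fully parenthesised C-style expression. Only addition, subtraction,
multiplication, names and integer literals are handled; real translation
between languages must also account for differences in semantics, which this
sketch ignores.

```python
import ast

# Operator nodes handled by this sketch, mapped to their C spelling.
C_OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def to_c_expression(node):
    """Emit a fully parenthesised C-style expression from an expression AST."""
    if isinstance(node, ast.Expression):
        return to_c_expression(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in C_OPERATORS:
        left = to_c_expression(node.left)
        right = to_c_expression(node.right)
        return f"({left} {C_OPERATORS[type(node.op)]} {right})"
    if isinstance(node, ast.Name):
        return node.id
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        return str(node.value)
    raise NotImplementedError(f"unsupported node: {type(node).__name__}")


tree = ast.parse("a + b * 2", mode="eval")
c_source = to_c_expression(tree)
print(c_source)
```

Operator precedence does not need to be handled by the emitter, because the
shape of the tree already records that the multiplication binds more tightly
than the addition.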
The AST also shows more detail about how the code is
compiled, with a closer representation of the actual program, whilst abstracting
away the surface details of the text. For example, it spells out the steps behind a
loop, and shows explicitly what happens at each stage of the loop.
An example of this applies to dynamically typed languages, where a variable
can change the type of value it holds (e.g. change from int to string
values), which would not be as obvious in the source code (e.g. initial_value =
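A sketch of the kind of reassignment described above is given here, assuming
Python as the dynamically typed language and its standard ast module as the
front end. The variable name initial_value comes from the text; the two values
assigned to it are hypothetical, chosen only for illustration. The tree records
each literal together with its type, so the change from int to string is
visible node by node.

```python
import ast

# Hypothetical program in which one variable holds an int, then a string.
source = """
initial_value = 1
initial_value = "one"
"""

# For each simple assignment of a literal, record the variable name and
# the type of the literal stored in the tree.
assigned_types = [
    (stmt.targets[0].id, type(stmt.value.value).__name__)
    for stmt in ast.parse(source).body
    if isinstance(stmt, ast.Assign)
    and isinstance(stmt.targets[0], ast.Name)
    and isinstance(stmt.value, ast.Constant)
]
print(assigned_types)
```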
Clang Abstract Syntax Tree (CAST)
The Clang Abstract Syntax Tree is the specific version of the
Abstract Syntax Tree used by Clang, which supports the C family of languages.
The CAST is different from traditional ASTs produced by other compilers, as it
closely resembles the written C++ code and the C++ standard. The nodes in a
CAST can closely resemble class hierarchies.
The CAST uses three
core groups of classes: statements, declarations and types. These three
groups form the base of a range of specializations. The core groups
do not inherit from one single base class, so each group requires a
different interface to visit. Therefore, each group has a dedicated traversal
method for navigating the tree.
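The consequence of having no common base class can be sketched with a toy
model. The classes below are not Clang’s API: Decl, Stmt and Type are named
after the three groups described above, but their fields, the ToyVisitor class
and the sample tree are all hypothetical. Because the three groups are
unrelated types, the visitor needs one traversal method per group.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Type:
    """Toy stand-in for a type node."""
    spelling: str


@dataclass
class Stmt:
    """Toy stand-in for a statement node."""
    kind: str
    children: List["Stmt"] = field(default_factory=list)


@dataclass
class Decl:
    """Toy stand-in for a declaration node."""
    name: str
    decl_type: Type
    body: Optional[Stmt] = None


class ToyVisitor:
    """Visit a toy tree using one traversal method per node group."""

    def __init__(self):
        self.visited = []

    def traverse_decl(self, decl):
        self.visited.append(f"Decl {decl.name}")
        self.traverse_type(decl.decl_type)
        if decl.body is not None:
            self.traverse_stmt(decl.body)

    def traverse_stmt(self, stmt):
        self.visited.append(f"Stmt {stmt.kind}")
        for child in stmt.children:
            self.traverse_stmt(child)

    def traverse_type(self, node_type):
        self.visited.append(f"Type {node_type.spelling}")


# Toy tree for a function such as: int main() { return 0; }
main_decl = Decl(
    name="main",
    decl_type=Type("int ()"),
    body=Stmt("CompoundStmt", [Stmt("ReturnStmt")]),
)

visitor = ToyVisitor()
visitor.traverse_decl(main_decl)
print(visitor.visited)
```

With a single shared base class, one generic traversal method would be enough.
Keeping the groups separate means each method can offer an interface suited to
its own kind of node.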
The CAST can be used with command line arguments that will
effectively reproduce the same source code that it was given. However, it will
print a more explicit version. This will include changes such as prepending
“this->” to references to class member variables, which may be implicit in the
original source code. This makes comprehension of the code easier, as it is
immediately understood that it is a reference to a member variable, but it is
usually an unnecessary addition for most programmers.