In this article, I give more technical information on how this is all done.
What/Why a code browser?
As a developer, I spend more time reading code than writing it, and I found reading code online to be a rather poor experience. So I made a way to publish code so that it can be read just like in a good IDE, with links, tooltips, and semantic highlighting. Read more about it in the original blog post.
The idea is that a generator produces all the .html pages ahead of time, a bit like a compilation. In the process, it also fills a database with all the symbols and where they are defined or used, as well as other information such as their documentation.
For example, the file for QQmlEngine::removeImageProvider looks like this:
<dec f='qtdeclarative/src/qml/qml/qqmlengine.h' l='132' type='void QQmlEngine::removeImageProvider(const QString & id)'/>
<def f='qtdeclarative/src/qml/qml/qqmlengine.cpp' l='1209' type='void QQmlEngine::removeImageProvider(const QString & providerId)'/>
<doc f='qtdeclarative/src/qml/qml/qqmlengine.cpp' l='1204'>/*! Removes the image provider for \a providerId. \sa addImageProvider(), QQuickImageProvider */</doc>
<use f='qtdeclarative/tests/auto/quick/qquickimageprovider/tst_qquickimageprovider.cpp' l='360' u='c' c='_ZN23tst_qquickimageprovider14removeProviderEv'/>
<use f='qtdeclarative/tests/auto/quick/qquickimageprovider/tst_qquickimageprovider.cpp' l='389' u='c' c='_ZN23tst_qquickimageprovider15imageProviderIdEv'/>
You can see that there is one entry for the declaration, the definition, the documentation, and one for each usage. This is the information displayed in the tooltip.
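To make the format concrete, here is a minimal C++ sketch of how a generator could format one such entry before appending it to a symbol's file. The `Ref` struct and the `escapeAttr` and `formatRef` helpers are hypothetical names, not the generator's actual code. The element and attribute names (`dec`, `def`, `use`, `f`, `l`, `type`, `u`, `c`) are taken from the excerpt above.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one occurrence of a symbol.
struct Ref {
    std::string tag;   // "dec", "def" or "use", as in the excerpt above
    std::string file;  // path relative to the root of the hosted sources
    unsigned line;     // 1-based line number
    std::vector<std::pair<std::string, std::string>> attrs; // e.g. {"type", ...} or {"u", "c"}
};

// Escape the characters that would otherwise break a single-quoted
// XML attribute value.
std::string escapeAttr(const std::string &s) {
    std::string out;
    for (char ch : s) {
        switch (ch) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '\'': out += "&apos;"; break;
        default: out += ch;
        }
    }
    return out;
}

// Format a self-closing element such as <use f='a.cpp' l='3' u='c' c='...'/>.
std::string formatRef(const Ref &r) {
    std::string out = "<" + r.tag + " f='" + escapeAttr(r.file) + "' l='" +
                      std::to_string(r.line) + "'";
    for (const auto &kv : r.attrs)
        out += " " + kv.first + "='" + escapeAttr(kv.second) + "'";
    return out + "/>";
}
```

Keeping one small, append-only file per symbol means a web server can serve the tooltip data as static files, with no query engine involved.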
We also store information about inherited classes or methods.
See the symbol page that shows information from the tooltip and more.
This way, the whole generated browsable code is just a set of files that can be served by any simple web server. The whole thing is roughly three times as big as the original source code, which still amounts to several GB when hosting this much source code. However, it is highly compressible. To save space and ease uploads, we use squashfs images. That way we even get atomic updates ☺
Using Clang to parse C/C++
Here is the interesting part: how does the generator work?
Clang is more than just a compiler: it is really a library to parse C and C++.
Clang provides all the required tools, so all I have to do is create a clang::ASTConsumer. Once the parser has finished its job, we can visit the full AST of the translation unit with a clang::RecursiveASTVisitor, as explained in this tutorial.
We then visit all the declaration and usage nodes. Since we know the source location of each node, we know whether it is in a file we should generate. In particular, we do not generate header files twice: if a header file has already been parsed, we ignore its nodes. Knowing the location of the node, we can register an HTML tag for it. We give a
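A header is parsed again by every translation unit that includes it, but it only needs to be generated once. The rule can be sketched with a global set of files already claimed by an earlier translation unit, plus a per-unit cache, so that all nodes of a header stay consistently "ours" for the unit that claimed it. `FileFilter` and its methods are hypothetical. Real code would key on Clang's `FileID`/`SourceLocation` rather than on plain path strings.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical helper deciding, per translation unit, which files' nodes to emit.
class FileFilter {
public:
    // Call at the start of each translation unit.
    void beginTranslationUnit() { decided_.clear(); }

    // True if nodes located in `file` should become HTML tags in the current unit.
    bool shouldGenerate(const std::string &file) {
        auto it = decided_.find(file);
        if (it != decided_.end())
            return it->second;  // same answer for the whole translation unit
        // First time this unit sees the file: claim it only if no earlier
        // unit has generated it yet.
        bool claim = generated_.insert(file).second;
        decided_[file] = claim;
        return claim;
    }

private:
    std::set<std::string> generated_;      // files owned by some unit so far
    std::map<std::string, bool> decided_;  // decisions cached for the current unit
};
```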
We also show the expansion of a macro in the macro's tooltip. Macros do not appear as nodes in the AST because they are expanded earlier, in the preprocessing phase. We use clang::PPCallbacks to be notified each time a macro is expanded. Unfortunately, getting the actual expansion is far from easy, since the expansion never appears as such in memory: the preprocessor just provides tokens to the parser. We have to preprocess the macro again and write the token strings into the tooltip.
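Clang's preprocessor does the real work here. The following is only a toy illustration of what "preprocessing the macro again and writing out the token strings" means. It handles object-like macros only, works on pre-split tokens, and applies the standard rule that a macro name appearing inside its own expansion is not expanded again. `MacroTable`, `expandTokens` and `expandToString` are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical table of object-like macros: name -> replacement tokens.
using MacroTable = std::map<std::string, std::vector<std::string>>;

// Recursively expand `tokens`, appending the resulting spellings to `out`.
// `active` holds the macros currently being expanded, so a self-referencing
// macro is left as-is instead of looping forever.
void expandTokens(const std::vector<std::string> &tokens, const MacroTable &macros,
                  std::set<std::string> &active, std::vector<std::string> &out) {
    for (const std::string &tok : tokens) {
        auto it = macros.find(tok);
        if (it == macros.end() || active.count(tok)) {
            out.push_back(tok);
            continue;
        }
        active.insert(tok);
        expandTokens(it->second, macros, active, out);
        active.erase(tok);
    }
}

// Expand, then join the token spellings with single spaces: the kind of
// text that can be shown in a tooltip.
std::string expandToString(const std::vector<std::string> &tokens, const MacroTable &macros) {
    std::set<std::string> active;
    std::vector<std::string> out;
    expandTokens(tokens, macros, active, out);
    std::string text;
    bool first = true;
    for (const std::string &tok : out) {
        if (!first) text += ' ';
        text += tok;
        first = false;
    }
    return text;
}
```

Function-like macros, stringification and token pasting are deliberately left out. They are exactly the parts that make doing this properly on top of Clang's token stream non-trivial.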
Comments are ignored by the parser, so they are not part of the AST. We therefore do another pass over each file to find comments and keywords, which the AST does not cover, and give them basic syntax coloring. We try to associate each comment with the closest declaration or definition so that it can go in the database (and therefore in the tooltip). We also recognize some Doxygen commands such as \fn, which associates the comment with a different declaration.
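Here is a rough sketch of the association step under simple assumptions. A comment is attached to the nearest declaration that starts within `maxGap` lines after the comment ends. The exception is a comment containing a Doxygen `\fn` command: the signature written after `\fn` (assumed to be alone on its line) names the target instead. The `Comment`/`Decl` structs, `fnTarget`, `attachComments` and the gap rule are hypothetical simplifications, not the generator's actual heuristics. Real code would also normalize signatures before comparing them.

```cpp
#include <map>
#include <string>
#include <vector>

struct Comment { unsigned startLine, endLine; std::string text; };
struct Decl { unsigned line; std::string signature; };

// Text following "\fn" on the same line, trimmed; "" if there is no \fn.
std::string fnTarget(const std::string &text) {
    auto pos = text.find("\\fn");
    if (pos == std::string::npos) return "";
    pos += 3;
    auto end = text.find('\n', pos);
    std::string sig = end == std::string::npos ? text.substr(pos) : text.substr(pos, end - pos);
    auto first = sig.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    auto last = sig.find_last_not_of(" \t");
    return sig.substr(first, last - first + 1);
}

// Map each declaration signature to the comments attached to it.
std::map<std::string, std::vector<std::string>>
attachComments(const std::vector<Comment> &comments, const std::vector<Decl> &decls,
               unsigned maxGap = 1) {
    std::map<std::string, std::vector<std::string>> docs;
    for (const Comment &c : comments) {
        std::string target = fnTarget(c.text);
        if (target.empty()) {
            // Nearest declaration starting shortly after the comment.
            const Decl *best = nullptr;
            for (const Decl &d : decls)
                if (d.line > c.endLine && d.line - c.endLine <= maxGap &&
                    (!best || d.line < best->line))
                    best = &d;
            if (!best) continue;  // comment not attached to anything
            target = best->signature;
        }
        docs[target].push_back(c.text);
    }
    return docs;
}
```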
Qt SIGNAL and SLOT
We detect a few Qt extensions. We recognize calls to QMetaObject::activate and similar calls like QTimer::singleShot (see the full list of recognized functions). Since SIGNAL and SLOT are macros that turn their argument into a string, we can easily extract the string literal and parse it to find which method it refers to. We know which class to look in because we know the type of the QObject subclass passed as the receiver parameter.
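In Qt, `SLOT(a)` and `SIGNAL(a)` stringify their argument and prefix it with a code digit: `1` for slots, `2` for signals (and `0` for `METHOD`). Debug builds may also append source-location information after a NUL byte. Below is a minimal sketch of parsing such a literal under those assumptions. `MemberRef` and `parseMemberString` are hypothetical names. The lookup in the receiver's class (including its base classes) is not shown.

```cpp
#include <optional>
#include <string>
#include <vector>

enum class MemberKind { Method, Slot, Signal };

struct MemberRef {
    MemberKind kind;
    std::string name;               // e.g. "valueChanged"
    std::vector<std::string> args;  // e.g. {"int", "QString"}
};

// Strip leading and trailing blanks.
std::string trim(const std::string &s) {
    auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Parse a string such as "2valueChanged(int, QString)". `literal` is assumed
// to stop at the first NUL, i.e. without any debug location suffix.
std::optional<MemberRef> parseMemberString(const std::string &literal) {
    if (literal.size() < 2) return std::nullopt;
    MemberRef ref;
    switch (literal[0]) {
    case '0': ref.kind = MemberKind::Method; break;
    case '1': ref.kind = MemberKind::Slot; break;
    case '2': ref.kind = MemberKind::Signal; break;
    default: return std::nullopt;
    }
    auto open = literal.find('(');
    auto close = literal.rfind(')');
    if (open == std::string::npos || close == std::string::npos || close < open)
        return std::nullopt;
    ref.name = literal.substr(1, open - 1);
    // Split the argument list on top-level commas only: template arguments
    // and function types may contain commas of their own.
    std::string current;
    int depth = 0;
    for (auto i = open + 1; i < close; ++i) {
        char ch = literal[i];
        if (ch == '<' || ch == '(') ++depth;
        if (ch == '>' || ch == ')') --depth;
        if (ch == ',' && depth == 0) {
            ref.args.push_back(trim(current));
            current.clear();
        } else {
            current += ch;
        }
    }
    if (!trim(current).empty()) ref.args.push_back(trim(current));
    return ref;
}
```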
When looking at the AST, we see how variables are used, so we can classify whether a variable was simply read or modified. We add a little letter next to each use in the tooltip. If you click the little 🔗 icon in the top-right corner of the tooltip, you can filter by type of usage.
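The classification can be pictured as a function from the syntactic context of a use to a one-letter code, plus a filter over the recorded uses. This sketch assumes a handful of contexts and letters: `r` read, `w` written, `m` modified in place, `a` address taken, `c` for the callee of a call. The `c` letter presumably matches the `u='c'` on the call sites in the database excerpt above. The enum, the other letters and the helpers are hypothetical. In the generator this information would come from the parent nodes in the Clang AST.

```cpp
#include <string>
#include <vector>

// Hypothetical syntactic contexts in which a name can appear.
enum class UseContext {
    Operand,           // x + 1, or passed by value: only read
    AssignmentLhs,     // x = ...
    CompoundOrIncDec,  // x += 1, ++x: read and written
    AddressOf,         // &x, or bound to a non-const reference
    Callee             // f(...) where f is the symbol
};

char useLetter(UseContext ctx) {
    switch (ctx) {
    case UseContext::Operand: return 'r';
    case UseContext::AssignmentLhs: return 'w';
    case UseContext::CompoundOrIncDec: return 'm';
    case UseContext::AddressOf: return 'a';
    case UseContext::Callee: return 'c';
    }
    return '?';
}

struct Use { std::string file; unsigned line; char kind; };

// What the tooltip filter does: keep only the uses of one kind.
std::vector<Use> filterUses(const std::vector<Use> &uses, char kind) {
    std::vector<Use> out;
    for (const Use &u : uses)
        if (u.kind == kind) out.push_back(u);
    return out;
}
```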
Similarly, we see when arguments are passed by reference, and we annotate the source code with a little &. We also annotate the code with a little ⎀ when there is an implicit call to a constructor or conversion operator.
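Annotations like these are extra text inserted at given columns of a source line. A simple way to apply several of them, sketched here with hypothetical names, is to sort the insertion points and apply them from right to left. That way, an insertion never shifts the columns of the ones still to come. Markers are inserted as plain text here. A real generator would more likely wrap them in HTML elements.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Marker {
    std::size_t column;  // 0-based byte offset in the line
    std::string text;    // e.g. "&" or "\xE2\x8E\x80" (U+2380, ⎀)
};

// Insert every marker into `line`, applying them from the highest column to
// the lowest so earlier offsets stay valid. Markers sharing a column keep
// their input order in the output.
std::string applyMarkers(std::string line, std::vector<Marker> markers) {
    std::stable_sort(markers.begin(), markers.end(),
                     [](const Marker &a, const Marker &b) { return a.column < b.column; });
    for (auto it = markers.rbegin(); it != markers.rend(); ++it)
        if (it->column <= line.size())
            line.insert(it->column, it->text);
    return line;
}
```

For example, `applyMarkers("f(x, y);", {{2, "&"}, {5, "\xE2\x8E\x80"}})` yields `f(&x, ⎀y);`.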
Clang's tooling needs to interface with the build system to get the list of compilation commands (the flags passed to the compiler, such as include paths, defines, or other language options).
All the commands are stored in a compilation database, held in a compile_commands.json file. If you use CMake as the build system, generating this list of commands is trivial: pass -DCMAKE_EXPORT_COMPILE_COMMANDS=ON. Ninja can also export the compilation commands. With other build systems, it is a bit more complicated. With some of them, such as qmake, we parse the output of make -n. When that is not supported, a script acting as a proxy compiler records the compilation commands.
Find out more about this in the README.
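With the proxy-compiler approach, each recorded invocation ends up as one entry of the JSON compilation database. Each entry holds `directory`, `file`, and either `command` or `arguments`. Below is a minimal sketch of producing one entry in the `arguments` form. `jsonEscape` and `compileCommandEntry` are hypothetical names. A real proxy would also run the actual compiler and merge the entries into a single JSON array.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Escape a string for use inside a JSON string literal.
std::string jsonEscape(const std::string &s) {
    static const char *hex = "0123456789abcdef";
    std::string out;
    for (char ch : s) {
        switch (ch) {
        case '"': out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\n': out += "\\n"; break;
        case '\t': out += "\\t"; break;
        default:
            if (static_cast<unsigned char>(ch) < 0x20) {  // other control characters
                out += "\\u00";
                out += hex[(ch >> 4) & 0xF];
                out += hex[ch & 0xF];
            } else {
                out += ch;
            }
        }
    }
    return out;
}

// Build one compilation-database entry from a recorded compiler invocation.
std::string compileCommandEntry(const std::string &directory, const std::string &file,
                                const std::vector<std::string> &arguments) {
    std::string out = "{\"directory\": \"" + jsonEscape(directory) + "\", \"file\": \"" +
                      jsonEscape(file) + "\", \"arguments\": [";
    for (std::size_t i = 0; i < arguments.size(); ++i) {
        if (i) out += ", ";
        out += "\"" + jsonEscape(arguments[i]) + "\"";
    }
    return out + "]}";
}
```

The `arguments` form avoids having to re-quote the command line for a shell, which matters for flags like `-DNAME="x"`.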
Search for a file or function
In order to find a file, the browser first downloads, via AJAX, a file containing the list of all the files in the project. For functions, a single file containing all of them would be too big, so we split the functions into several files according to the first two letters of the function name. So when someone starts typing "Parse", the file for pa (the first two letters) gets downloaded and searched for matches.
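The bucketing can be sketched as follows. In the browser this lookup would run in the page's JavaScript; it is written in C++ here only to keep one language for all the examples. The lowercase two-letter key, the case-insensitive prefix match, and all the function names are assumptions for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Key of the bucket (i.e. of the file to download): first two characters, lowercased.
std::string bucketKey(const std::string &s) { return lower(s.substr(0, 2)); }

// Generator side: split all function names into one list per key.
std::map<std::string, std::vector<std::string>>
buildBuckets(const std::vector<std::string> &functions) {
    std::map<std::string, std::vector<std::string>> buckets;
    for (const std::string &f : functions)
        buckets[bucketKey(f)].push_back(f);
    return buckets;
}

// Browser side: only the bucket for the query's first two letters is needed;
// matches are the names that start with the query, ignoring case.
std::vector<std::string> findFunctions(
    const std::map<std::string, std::vector<std::string>> &buckets, const std::string &query) {
    std::vector<std::string> matches;
    if (query.size() < 2) return matches;  // wait until two letters are typed
    auto it = buckets.find(bucketKey(query));
    if (it == buckets.end()) return matches;
    std::string q = lower(query);
    for (const std::string &name : it->second)
        if (lower(name).compare(0, q.size(), q) == 0)
            matches.push_back(name);
    return matches;
}
```

Two letters is a trade-off: one letter would give too few, too-large files, while longer prefixes would multiply the number of small files and downloads.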