Qt 6.5.3 Released

We have released Qt 6.5.3 today. As a patch release, Qt 6.5.3 does not introduce any new features but contains more than 500 bug fixes, security updates, and other improvements on top of the Qt 6.5.2 release. See more information about the most important changes and bug fixes in the Qt 6.5.3 release note.

Pitfalls of lambda capture initialization

Recently, I’ve stumbled across some behavior of C++ lambda captures that has, initially, made absolutely no sense to me. Apparently, I wasn’t alone with this, because it has resulted in a memory leak in QtFuture::whenAll() and QtFuture::whenAny() (now fixed; more on that further down).

I find the corner cases of C++ quite interesting, so I wanted to share this. Luckily, we can discuss this without getting knee-deep into the internals of QtFuture. So, without further ado:

Time for an example

Consider this (godbolt):

#include <iostream>
#include <functional>
#include <memory>
#include <cassert>
#include <vector>

struct Job
{
    template<class T>
    Job(T &&func) : func(std::forward<T>(func)) {}

    void run() { func(); hasRun = true; }

    std::function<void()> func;
    bool hasRun = false;
};

std::vector<Job> jobs;

template<class T>
void enqueueJob(T &&func)
{
    jobs.emplace_back([func=std::forward<T>(func)]() mutable {
        std::cout << "Starting job..." << std::endl;
        {
            // Move func to ensure that it is destroyed after running
            auto fn = std::move(func);
            fn();
        }
        std::cout << "Job finished." << std::endl;
    });
}

int main()
{
    struct Data {};
    std::weak_ptr<Data> observer;
    {
        auto context = std::make_shared<Data>();
        observer = context;
        enqueueJob([context] {
            std::cout << "Running..." << std::endl;
        });
    }

    for (auto &job : jobs) {
        job.run();
    }

    assert((observer.use_count() == 0)
           && "There's still shared data left!");
}


Starting job...
Running...
Job finished.

The code is fairly straightforward. There’s a list of jobs to which we can append with enqueueJob(). enqueueJob() wraps the passed callable with some debug output and ensures that it is destroyed after being called. The Job objects themselves are kept around a little longer; we can imagine doing something with them, even though the jobs have already been run.
In main(), we enqueue a job that captures some shared state Data, run all jobs, and finally assert that the shared Data has been destroyed. So far, so good.

Now you might have some issues with the code. Apart from the structure, which, arguably, is a little forced, you might think “context is never modified, so it should be const!”. And you’re right, that would be better. So let’s change it (godbolt):

--- old
+++ new
@@ -34,7 +34,7 @@
     struct Data {};
     std::weak_ptr<Data> observer;
-        auto context = std::make_shared<Data>();
+        const auto context = std::make_shared<Data>();
         observer = context;
         enqueueJob([context] {
             std::cout << "Running..." << std::endl;

Looks like a trivial change, right? But when we run it, the assertion fails now!

int main(): Assertion `(observer.use_count() == 0) && "There's still shared data left!"' failed.

How can this be? We’ve just declared a variable const that isn’t even used once! This does not seem to make any sense.
But it gets better: we can fix this by adding what looks like a no-op (godbolt):

--- old
+++ new
@@ -34,9 +34,9 @@
     struct Data {};
     std::weak_ptr<Data> observer;
-        auto context = std::make_shared<Data>();
+        const auto context = std::make_shared<Data>();
         observer = context;
-        enqueueJob([context] {
+        enqueueJob([context=context] {
             std::cout << "Running..." << std::endl;

Wait, what? We just have to tell the compiler that we really want to capture context by the name context – and then it will correctly destroy the shared data? Would this be an application for the really keyword? Whatever it is, it works; you can check it on godbolt yourself.

When I first stumbled across this behavior, I just couldn’t wrap my head around it. I was about to think “compiler bug”, as unlikely as that may be. But GCC and Clang both behave like this, so it’s pretty much guaranteed not to be a compiler bug.

So, after combing through the interwebs, I’ve found this StackOverflow answer that gives the right hint: [context] is not the same as [context=context]! The latter drops cv qualifiers while the former does not! Quoting cppreference.com:

Those data members that correspond to captures without initializers are direct-initialized when the lambda-expression is evaluated. Those that correspond to captures with initializers are initialized as the initializer requires (could be copy- or direct-initialization). If an array is captured, array elements are direct-initialized in increasing index order. The order in which the data members are initialized is the order in which they are declared (which is unspecified).


So [context] will direct-initialize the corresponding data member, whereas [context=context] (in this case) does copy-initialization! In terms of code this means:

  • [context] is equivalent to decltype(context) captured_context{context};, i.e. const std::shared_ptr<Data> captured_context{context};
  • [context=context] is equivalent to auto captured_context = context;, i.e. std::shared_ptr<Data> captured_context = context;

Good, so writing [context=context] actually drops the const qualifier on the captured variable! For the lambda, it is as if context had never been declared const in the first place, whereas direct-initialization via [context] preserves the const.

But why does this even matter? Why do we leak references to the shared_ptr<Data> if the captured variable is const? We only ever std::move() or std::forward() the lambda, right up to the place where we invoke it. After that, it goes out of scope, and all captures should be destroyed as well. Right?

Nearly. Let’s think about what the compiler generates for us when we write a lambda. For the direct-initialization capture (i.e. [context]() {}), the compiler roughly generates something like this:

struct lambda
{
    const std::shared_ptr<Data> context;
    // ...
};

This is what we want to std::move() around. But it contains a const data member, and that cannot be moved from (it’s const after all)! So even with std::move(), there’s still a part of the lambda that lingers, keeping a reference to context. In the example above, the lingering part is in func, the capture of the wrapper lambda created in enqueueJob(). We move from func to ensure that all captures are destroyed when it goes out of scope. But for the const std::shared_ptr<Data> context, which is hidden inside func, this does not work. It keeps holding the reference. The wrapper lambda itself would have to be destroyed for the reference count to drop to zero.
However, we keep the already-finished jobs around, so this never happens. The assertion fails.

How does this matter for Qt?

QtFuture::whenAll() and whenAny() create a shared_ptr to a Context struct and capture that in two lambdas used as continuations on a QFuture. Upon completion, the Context stores a reference to the QFuture. Similar to what we have seen above, continuations attached to QFuture are also wrapped by another lambda before being stored. When invoked, the “inner” lambda is supposed to be destroyed, while the outer (wrapper) one is kept alive.

In contrast to our example, though, the QFuture situation created an actual memory leak (QTBUG-116731): The “inner” continuation references the Context, which references the QFuture, which again references the continuation lambda, referencing the Context. The “inner” continuation could not be std::move()d and destroyed after invocation, because the std::shared_ptr data member was const. This created a reference cycle, leaking memory. I’ve also cooked this more complex case down to a small example (godbolt).

The patch for all of this is very small. As in the example, it simply consists of making the capture [context=context]. It’s included in the upcoming Qt 6.6.0.

Bottom line

I seriously didn’t expect there to be these differences in initialization of by-value lambda captures. Why doesn’t [context] alone also do direct- or copy-initialization, i.e. be exactly the same as [context=context]? That would be the sane thing to do, I think. I guess there is some reasoning for this; but I couldn’t find it (yet). It probably also doesn’t make a difference in the vast majority of cases.

In any case, I liked hunting this one down and getting to know another one of those dark corners of the C++ spec. So it’s not all bad 😉.

Unlock the Power of Qt with Felgo's Qt Explained Series on YouTube

With powerful SDK components and unique tools tailored to the daily needs of developers, Felgo’s mission is to enable developers to work efficiently and create better Qt applications. As a Qt Technology and Service Partner, Felgo supports developers and businesses in bringing ideas to life and reaching their goals.  

To make Qt development more accessible and strengthen the community, we are now thrilled to announce a brand-new video series: Qt Explained. It is designed to empower developers of all levels with the magic of Qt:

Qt Journey - Revolutionizing networking security

by Emilia Valkonen-Damjanovic (Qt Blog)

In this series, we'll bring you various career stories from people working with Qt. Today, I'm interviewing Jeremy Tubongbanua, a software engineering student and software engineer at Atsign, based in Ontario, Canada.

See you in Berlin in November 2023?

In just a couple of months, there’s going to be not one,

not two,

not three,

but four fantastic developer events in Berlin! We are not going to miss any of them, and neither should you.

November 12-14: Meeting C++

If you are a C++ developer, do not miss the 2023 edition of Meeting C++. This year the conference will be again in a hybrid format, partially online and partially live in Berlin. The list of talks is, as usual, of the utmost quality.

The closing keynote will be held by Dr. Ivan Čukić, Senior Software Engineer here at KDAB.

You can find more information, including how to register, here.

November 27: KDAB Training Day

KDAB organizes a Training Day in Berlin! You can choose from five different 1-day courses, featuring content from our regular 3- to 4-day courses:

  • What’s new in C++23 with Giuseppe D’Angelo
  • QML Application Architecture with André Somers
  • Profiling on Linux with Milian Wolff
  • Porting to Qt 6 with Nicolas Fella
  • A taste of Rust (with a drop of Qt) with Florian Gilcher

You can get your tickets from here!

November 28-29: Qt World Summit 2023

The Qt World Summit is back in person! Once again, it’s going to be hosted at the fantastic Berlin Congress Center in Alexanderplatz. This year’s program features several keynotes and over 25 talks in 4 tracks. KDAB will also be present:

  • Till Adam, Chief Commercial Officer of the KDAB Group, will talk about The Future of Interventional Cardiac Imaging;
  • Nicolas Arnaud-Cormos, senior software engineer and trainer at KDAB, will talk about script automation in Qt applications.

Tickets are available here.

November 30-December 1: Qt Contributors’ Summit 2023

Last but not least: join us in shaping the future of Qt!

Did you know that Qt is a free software project, with an open governance model? At the Qt Contributors’ Summit the developers of Qt will meet for two days and will discuss Qt’s roadmap: what features are missing? What is being worked upon? In which direction should Qt evolve?

You can see the sessions that have been scheduled so far on the Qt Project’s wiki. Anyone who’s interested in contributing something to Qt (not just development! Community work, documentation, etc.) is very welcome. You can find all the relevant practical information here.

See you in Berlin!

The post See you in Berlin in November 2023? appeared first on KDAB.

Generic Struct Handling is Coming to Qt OPC UA


OPC UA servers often use structured data types, for example when they are implementing a companion specification or exposing custom structured data types from a PLC program. Up to now, Qt OPC UA was just returning a binary blob when reading such a value and the decoding was left entirely to the user. Since OPC UA 1.04, there is a standardized way for a server to expose the data type description for custom data types. We have extended Qt OPC UA to use this information to make it much easier to encode and decode custom data types. The following article introduces the new API.

Continue reading Generic Struct Handling is Coming to Qt OPC UA at basysKom.

Introducing the RiveQtQuickPlugin – Powerful Animations For Your QtQuick Applications

Rive is a popular tool for vector animations. While the editor itself is a closed source commercial product, there are FOSS implementations for the player runtime. basysKom has developed a QtQuick integration based on the rive-cpp library. This article introduces the project and its current state.

Continue reading Introducing the RiveQtQuickPlugin – Powerful Animations For Your QtQuick Applications at basysKom.

How Trademarks Affect Open Source Software — How do trademarks relate to copyrights and what are the implications for open source software

As previously discussed, a central concept of open source licenses is that the author of a given program grants anyone permission to use, modify, or redistribute their code without seeking specific permission first. But these licenses generally cover the copyrights to the underlying code and not any trademarks that may be used by the original author in connection with their work.

A trademark is basically a word, phrase, symbol, or design--or any combination of those elements--that identifies a unique product or service in commerce. For example, "Python" is a registered trademark of the Python Software Foundation (PSF). The famous blue-and-yellow "intertwined snakes" Python logo is also a PSF trademark.

So how exactly do trademarks work? And as a developer, what can you do--and not do--with someone else's trademarks in your own work?

Trademarks Are Not Copyrights

To the layperson, it might seem like trademarks are just a method by which someone can copyright a particular word or phrase. In fact, copyright and trademark are distinct forms of intellectual property. A copyright applies to a created work, such as a book or a software program, that is original and exists in some fixed form. The purpose of copyright is to enable the author to exercise exclusive control over the right to reproduce or distribute their work for a specified period of time.

A trademark, in contrast, is primarily about protecting consumers from harm. A trademark identifies the source of a good or service. This helps guard against potential counterfeiting or fraud while affording the trademark holder legal protection for their brand. But the trademark holder does not "own" the trademarked word or phrase. Rather, they exercise a limited right to use the trademark in connection with the sale of their product or service in a given area.

Trademarks can be used in connection with a variety of products within the same brand. The PSF uses its Python mark, for instance, not just to refer to the programming language itself but also merchandise such as T-shirts and hats. But the PSF could not enforce its Python mark in an area of commerce where it does not actually produce any products.

Like copyright, a trademark holder can register their mark with the government, although it is not required. In the United States, the U.S. Patent and Trademark Office (USPTO) manages trademark registrations at the federal level. A registered trademark is indicated by an "R" with a circle around it (®), while an unregistered trademark is indicated with the letters "TM" (™). As with copyright, registering a trademark provides certain legal benefits, such as a presumption that the mark is valid and belongs to the registrant.

When Do You Need a Software Trademark Holder's Approval to Use Their Name?

Another key difference between copyright and trademark is enforcement. With copyright, an author can be fairly lax in their assertion of rights without losing them. Indeed, open source licensing is essentially built around the notion that most developers do not want to spend their time policing how other people use their code. Yet the author still retains copyright in the event they feel compelled to take legal action against infringement.

With a trademark, however, the holder needs to be more proactive in asserting their rights. The reason for this is that unlike copyright, a trademark can last indefinitely, provided it continues to be actively used and defended by the holder. The more passive the holder is in defending their rights, the more likely that the USPTO or a judge may find the trademark is no longer in active use.

For this reason, many organizations that rely on trademarks will publish a written policy governing their acceptable use. The PSF has a very detailed trademark usage policy, which provides a good model for how enforcement works in practice. Fortunately for developers, the PSF policy makes it clear that the organization wants the "Python" mark and logo "to be used with minimal restriction to refer to the Python programming language."

To be clear, even without this policy, there are many uses of the "Python" trademark allowed by U.S. law (and the laws of other nations) with respect to "fair use." There are basically two kinds of fair use: nominative and descriptive. Nominative fair use simply means you are referring to the trademarked good or service. So if you write a program in Python, you are allowed to refer to the "Python" trademark without asking the PSF's permission first.

Descriptive fair use means you are using the trademark to describe some other product or service. This tends to come up when you are comparing one product with another. For instance, if you write an article comparing the benefits and drawbacks of Python with Rust, that would involve descriptive fair use. Again, you don't need permission to do this.

So when do you need a trademark holder's approval? The PSF's own policy states approval is necessary for any "commercial use" of the "Python" trademark to describe another product or company. In other words, you don't need the PSF's permission to describe your own program as being written in Python, or even to describe yourself as a Python developer. But you would need the PSF's permission to name your company something like "Python Software, Inc."

But what if you wanted to make or sell merchandise with the Python logo? Again, the PSF's policy permits anyone to use the logo without permission for non-commercial purposes. It's okay to make a Python-logo shirt for yourself and wear it at PyCon. But if you wanted to sell those shirts at PyCon, you would need a license from the PSF and potentially pay royalties on any sales.

Four Things to Know About Software and Trademarks

If you are developing software, you need to be aware of potential trademark issues. A qualified intellectual property attorney can answer specific questions for your situation. But speaking in general terms, here are four things to keep in mind:

  1. If you plan to use the name of any existing company or project in describing your own software, make sure to check and see if the trademark holders have a published policy regarding usage like the PSF.
  2. When selecting a name for your own project, you can check the U.S. Patent and Trademark Office's trademark database to see if the same or similar name is already in use with respect to software.
  3. Even if a trademark is not formally registered with the USPTO, that does not mean the trademark is not still legally protected.
  4. Open source licenses like the GPL, MIT, or BSD licenses do not convey any rights with respect to trademarks. Never assume that you can use the name of another software project in your own commercial application just because the underlying code is freely licensed.

Customizing Your Tkinter App's Windows — Make Your Tkinter App's Windows Have Different Looks

Different desktop applications have different design requirements. If you look at the applications on your computer, you will see that most of them have different window designs. Some applications, like games, run in full-screen mode. Utility applications, like calculators, run in fixed-size mode with the maximize or minimize button disabled.

Forms or windows have different appearances based on their app's requirements. As you create your own Tkinter applications, you might also want to have windows without a title bar, windows that can't be resized, windows that are zoomed, and even windows that show some level of transparency.

In this tutorial, you will learn some Tkinter tricks and techniques that you can use to customize your applications' windows.

Getting to Know Window Configurations

In Tkinter, a window configuration is either a setting or an attribute that you can use to specify a property of that window. These properties may include the window's width, height, position, transparency, title bar, background color, and more.

These configurations allow you to tweak and customize the look and feel of your application's windows and forms so that they look modern and nice in the eyes of your app's users.

For example, let's say you want to create a game, and you need to remove the main window's title bar. Keep reading to learn how to do this in Tkinter.

Creating a Simple Window in Tkinter

To kick things off, let's create a minimal Tkinter app that will work as our starting point for learning how to remove a window's title bar. Here's the required code:

from tkinter import Tk

# Create the app's main window
root = Tk()
root.title("Window With a Title Bar")
root.geometry("400x300")

# Run the app's main loop
root.mainloop()

Here, we import Tk from tkinter. Then we create the app's main window, root, by instantiating Tk. Next, we give our window a title and geometry using the title() and geometry() methods, respectively.

Go ahead and save this code to a file called app.py. Then run the file from your command line. The output will look something like this:

A Tkinter app showing a window with the default title bar

On your screen, you'll get a regular Tkinter window with the title bar and the decoration provided by your current operating system.

Removing the Window's Title Bar in Tkinter

Tkinter makes it possible for you to remove the system-provided title bar of your app's main window. This tweak is handy when building a custom GUI that doesn't use the default window decorations.

The default title bar highlighted on our example window

In the image above, the red border highlights the window's title bar. That's what we want to remove. To do that, we can use a method called overrideredirect(). If we pass True as an argument to this method, then we'll get a frameless window.

Go ahead and update your code. Make it look something like this:

from tkinter import Tk

# Create the app's main window
root = Tk()

# Removes the window's title bar
root.overrideredirect(True)

# Run the app's main loop
root.mainloop()

By calling root.overrideredirect(True), we tell the window manager (which manages windows on your desktop) not to wrap the window in the usual window decorations. If you run the app again, then you will get the following output:

A Tkinter app showing a window without a title bar

You have successfully created a Tkinter app with a window that doesn't have the standard window decorations from your desktop window manager.

Because the app's window has no close button, you must press Alt+F4 to close the window and terminate the app.

Disabling the Window's Maximize/Minimize Button

There are some situations where we would want to have a window with a title bar but with a fixed size. That would be the case with a calculator application, for example. To do that, we can use the resizable() method, which takes two boolean arguments:

  1. width specifies whether the window can be horizontally resized.
  2. height specifies whether the window can be vertically resized.

If you pass False for both arguments, you will disable resizing in both directions. Below we've modified the code for our simple Tkinter app, preventing users from resizing the main window:

from tkinter import Tk

# Create the app's main window
root = Tk()
root.title("Fixed Size Window")

# Disable the window's resizing capability
root.resizable(False, False)

# Run the app's main loop
root.mainloop()

In this example, the code calls resizable() with its width and height arguments set to False. This call makes the window unresizable. If you run this app, then you'll get the output shown below:

A Tkinter app showing a fixed size window

Try to resize this window by dragging any of its borders, and you'll find that you can't resize it in either direction. You will also discover that the maximize/minimize buttons are now also disabled, preventing you from resizing the window in this way.

Displaying the App's Window in Zoomed Mode

Tkinter also allows you to display an app's window in zoomed mode. In zoomed mode, your application's window will display in fullscreen. A common scenario where this mode comes in handy is when you want to provide an immersive experience to your users.

On Windows and macOS, the method for displaying the app's window in zoomed mode is state(). You can pass the "zoomed" string as an argument to this method to get the desired result. The code for that will look like below:

from tkinter import Tk

# Create the app's main window
root = Tk()
root.title("Zoomed Window")

# Set the window to a zoomed mode
root.state("zoomed")

# Run the app's main loop
root.mainloop()

The line root.state("zoomed") makes the window display already zoomed on both Windows and macOS. If you are on Linux, then use root.attributes("-zoomed", True) instead. The app's window looks something like this:

A Tkinter app showing a zoomed window

In this screenshot, you can see that the application's main window occupies the entire screen, which gives you a larger working area.

Changing the Window's Transparency Level

What if you wanted to change the transparency of your app's main window? You can do this using the attributes() method. To set the transparency, you provide two arguments: first the string "-alpha", then a floating-point number that ranges from 0.0 to 1.0. A value of 0.0 represents the highest transparency level (full transparency: your window will become invisible), while a value of 1.0 represents the lowest level (no transparency).

Let's create a window with a 0.6 transparency level:

from tkinter import Tk

# Create the app's main window
root = Tk()
root.title("0.6 Transparency Window")

# Set the -alpha value to 0.6
root.attributes("-alpha", 0.6)

# Run the app's main loop
root.mainloop()

In this example, we set the "-alpha" attribute to 0.6. This tweak generates a window that looks something like this:

A Tkinter app showing a transparent window

Your app's main window is now 60% transparent. Isn't that cool? Do you have any creative ideas for your next application?


In this tutorial, you've gone through the process of customizing the root window of a Tkinter application using several different methods, attributes, and properties. You've learned how to remove the title bar of a window, make a window have a fixed size, display a window in zoomed mode, and more.

Reverse Engineering Android Apps

Reverse engineering in general is a tricky business and sometimes not very orthodox. So, why bother to write this article?

Well, sometimes reverse engineering is also for something good. It started when my wife dusted off her watch. We had a huge unpleasant surprise when we found that the companion app is not available anymore on Google Play! The watch is completely useless without the companion app, as you can’t even set the time on it… Because I hate to throw away a perfectly working watch I decided to create an app for it myself.

My first instinct was to find an older phone with the app still alive and to use a BLE sniffer to reverse engineer the BLE protocol. But I didn’t find the application installed on any old phones. I did find the APK online, but it cannot be used anymore, as it relied on some online services which are offline now…

The next obvious step was to decompile the application to get the communication protocol and also the algorithms behind the sleep & activities. This is how our story begins ;-).

Long story short

Decompiling Android apps is not that complicated. It takes time (a LOT of time), but it’s fun and rewarding.

Just don’t believe all those movies where someone decompiles an app and understands all the logic behind it in seconds :).

Tools I used to decompile:

  • tintinweb.vscode-decompiler, a VSCode extension (https://marketplace.visualstudio.com/items?itemName=tintinweb.vscode-decompiler). With this extension it is quite easy to decompile an APK: just right-click on it and the magic will happen in a few seconds. The only problem I found is that it doesn’t do any de-obfuscation (or at least I didn’t set it up correctly).
  • Dex to Java decompiler (https://github.com/skylot/jadx). I found it better than vscode-decompiler, as it has semi de-obfuscation. You’ll never get the original names, but you get unique names instead of a, b, c, etc.
  • NSA’s ghidra (https://github.com/NationalSecurityAgency/ghidra). Apart from a lot of Java code, this application has all its logic in native C++. I used ghidra for decompiling the native (C++) parts. It has a Java decompiler as well, but it is not as good as jadx.

Short story long

I chose an older version, as the last one had support for too many watches which I didn’t care about (at least not now). Android APKs support (partial) obfuscation, which makes decompilation not that straightforward; in some cases it’s actually pretty complicated. What does obfuscation do? It renames all packages to: a, b, c, etc., then all classes in each package to: a, b, c, etc., then all methods of each class to: a, b, c, etc., then all fields to (yes, you guessed right): a, b, c, etc. This means you’ll end up with loads of classes, member functions, and fields all named a.

Sometimes it is easy to guess what the fields are, e.g.:

jSONObject.put("result", (int) this.c.a);
jSONObject.put("fileHandle", this.c.c);
jSONObject.put("newSizeWritten", this.c.d);

But in some cases you need to do a lot of detective work.

As I mentioned in the beginning, I used 3 tools: the vscode-decompiler extension, jadx and ghidra:

I started with vscode-decompiler, hoping that GitHub’s Copilot would help me in the process. It turned out to be completely useless for such tasks. When I imported the decompiled code into Android Studio, due to obfuscation, 90% of the classes had problems. Because there are dozens of classes with the same name (i.e. “a“, “b“), imagine how many conflicts you get.

Next, I used jadx to decompile the application, which supports semi de-obfuscation. I could import the project into Android Studio. Now all the obfuscated classes have unique names (e.g. C1189f), which makes Android Studio happier.

Just to be crystal clear, you cannot recompile the application and run it, unless the application is simple enough! After a few hours of guessing the name of the classes and their fields, I finally found what I was looking for: the BLE protocol! To my surprise, it has so many commands. I quickly cleaned out a few BLE commands that I was interested in:

  • start/stop the sync animation on the watch
  • set/get the time

I used bluetoothctl to quickly try the start/stopAnimation BLE commands; they worked perfectly.

The application has all the sleep & activity logic written in C++, so I also had to decompile the native part. For this job, I used ghidra with the https://github.com/extremecoders-re/ghidra-jni extension. Ghidra is a fantastic tool. I tried a few more tools: radare2/rizin and binary ninja (the free online version), but personally I found ghidra the most feature-rich. Compiled C++ code is obfuscated “by design” due to the various optimizations done by C/C++ compilers, and it’s far, FAR harder to decompile than Java. A long time ago I did a lot of binary decompilation, and most of the time, trying to generate any C/C++ code from a binary resulted in pure garbage. I was amazed at how good ghidra’s C/C++ decompilation is.

Just to be clear, it requires a *LOT* of time to clean up the code, define all the structures, make all the connections between them, and un-flatten all the STL stuff (some knowledge of STL internals is needed here), but the experience was better than I ever dreamt. Even if we can guess what it does from the function name, let’s take a very simple example to see what the C++ decompilation looks like and how verbose STL can be:

undefined Java_package_class_name_ActivityVect_1add
                    (JNIEnv *env,jclass thiz,jlong a0,jobject a1,jlong a2,jobject a3)
{
  undefined uVar1;
  undefined8 *puVar2;
  undefined8 uVar3;
  undefined8 *puVar4;

  if (a2 == 0) {
    uVar1 = FUN_0015d3ac((long *)env,7,
                         "std::vector< Activity >::value_type const & reference is null");
    return uVar1;
  }
  puVar4 = *(undefined8 **)(a0 + 8);
  puVar2 = *(undefined8 **)(a0 + 0x10);
  if (puVar4 != puVar2) {
    if (puVar4 != (undefined8 *)0x0) {
      uVar3 = *(undefined8 *)(a2 + 8);
      *puVar4 = *(undefined8 *)a2;
      puVar4[1] = uVar3;
      uVar3 = *(undefined8 *)(a2 + 0x18);
      puVar4[2] = *(undefined8 *)(a2 + 0x10);
      puVar4[3] = uVar3;
      uVar3 = *(undefined8 *)(a2 + 0x28);
      puVar4[4] = *(undefined8 *)(a2 + 0x20);
      puVar4[5] = uVar3;
      uVar3 = *(undefined8 *)(a2 + 0x38);
      puVar4[6] = *(undefined8 *)(a2 + 0x30);
      puVar4[7] = uVar3;
      puVar2 = *(undefined8 **)(a2 + 0x40);
      puVar4[8] = puVar2;
    }
    *(undefined8 **)(a0 + 8) = puVar4 + 9;
    return (char)puVar2;
  }
  std::vector<>::_M_emplace_back_aux<>((vector<> *)a0,(Activity *)a2);
  return (char)a0;
}
All right, the decompiled code is pretty cryptic and it doesn’t tell us too much. Now, let’s see if we can make it better:

  • first we need to define the Activity structure. I was lucky, as I knew all the structure fields because they were set by the Java code via JNI in other places ;-).
  • next, we define the std::vector structure; every single std::vector has just 3 fields:
    T *__begin_;
    T *__end_;
    T *__end_cap_;

    Yes, that’s all a std::vector needs to do all the magic: iterate, insert, push, pop, erase, etc.

  • last but not least use them in our function:
undefined Java_package_class_name_ActivityVect_1add
                    (JNIEnv *env,jclass thiz,vector<> *vec_ptr,jobject a1,Activity *value,jobject a3)
{
  undefined uVar1;
  Activity *end_cap;
  uint64_t uVar2;
  Activity *end;

  if (value == (Activity *)0x0) {
    uVar1 = FUN_0015d3ac((long *)env,7,
                         "std::vector< Activity >::value_type const & reference is null");
    return uVar1;
  }
  end = vec_ptr->__end_;
  end_cap = vec_ptr->__end_cap_;
  if (end != end_cap) {
    if (end != (Activity *)0x0) {
      uVar2 = value->endTime;
      end->startTime = value->startTime;
      end->endTime = uVar2;
      uVar2 = value->bipedalCount;
      end->point = value->point;
      end->bipedalCount = uVar2;
      uVar2 = value->maxVariance;
      end->variance = value->variance;
      end->maxVariance = uVar2;
      uVar2 = value->doubleTapCount;
      end->trippleTapCount = value->trippleTapCount;
      end->doubleTapCount = uVar2;
      end_cap = *(Activity **)&value->tag;
      *(Activity **)&end->tag = end_cap;
    }
    vec_ptr->__end_ = end + 1;
    return (char)end_cap;
  }
  return (char)vec_ptr;
}

Okay, so now the code is much cleaner and we can figure out exactly what this function does: it is simply a push_back. This means we can write it in a single line of code:

vec_ptr->push_back(*value);
Of course, you’ll find more, MUCH more complicated cases where you’ll spend a lot of time to figure out what’s going on.

I really hope that some day AI will be intelligent enough to do this job for us. Yes, I’m one of those people who are not afraid to embrace new technologies :).

As a side note, even though ghidra has excellent C/C++ decompilation, good ASM knowledge will help a lot where ghidra fails to decompile to C/C++.

After I had enough info about the BLE protocol, I began to write a Qt application to use it. I found the BLE support in Qt 6.5.1 quite good (at least on Android & Linux desktop), as I could use quite a few BLE commands painlessly.

The application is still at the beginning, and it will require more time, pain, and sorrow to get it to the same level as the original application, but it’s a start ;-).

Thank you for your time.

About KDAB

If you like this article and want to read similar material, consider subscribing via our RSS feed.

Subscribe to KDAB TV for similar informative short video content.

KDAB provides market leading software consulting and development services and training in Qt, C++ and 3D/OpenGL. Contact us.

The post Reverse Engineering Android Apps appeared first on KDAB.

Object Ownership

Last time we touched upon object lifetime, and today we wrap up the basics with the somewhat spicy topic of object ownership. We covered the lifetime quirks and found out that manual memory management can be a nightmare, even if we new and delete in the correct order. There must be something better than that. Well, there is, but it comes with its own can of worms.

Structured cleanup

Since we know the rules of new and delete, namely that new allocates and delete destroys, we never really cared about who is responsible for the object. This caused a lot of confusion in the past. For instance, some Win32 APIs return strings that should be LocalFree()d, like FormatMessage or GetEnvironmentStrings. POSIX, on the other hand, has strdup as a common example of “you should free it yourself”. This model is confusing because you may have a lot of return statements, before each of which you must call free or delete, depending on how the memory was obtained. However, C++ has had RAII since the very beginning, built on constructors and destructors. So, in 1998, resourceful people decided to add auto_ptr to the standard.

The premise was simple:

  • a simple explicit constructor that took a raw pointer from new
  • destruction via an explicit release/reset or the destructor at the end of the block

This was the first attempt at structured cleanup. As time passed, the initial design began to crumble. More issues arose, and the question came up:

Who owns the data?

Of course, the data is owned by the pointer class. But what if the data is needed elsewhere? In C++98, the only options were references and raw pointers. If you copied an auto_ptr to another place, it moved itself to that location, essentially transferring ownership. That made it impossible to put auto_ptrs into standard containers, since copying did not adhere to copy semantics at all: the copy and the original were not equal.

Simple example:

auto_ptr<CPClass> a(new CPClass);
auto_ptr<CPClass> b(a); // "copy" a: ownership silently moves to b

printf("%p", a.get()); // prints a null pointer
printf("%p", b.get()); // prints the valid pointer

Another set of problems included the inability to express custom deletion for allocated arrays, malloc allocations, or any specific destructor at all. Quite problematic, one might say. C++11 introduced move semantics, and along with them new kinds of smart pointers; unique_ptr was one of them. It disabled copying altogether, which forced containers like vector to move the contents instead. Furthermore, the template gained a custom deleter parameter for special pointers, an array overload, and the fancy make_unique<>. The latter is a replacement for the explicit new call; it lacks a custom deleter, though. For that you still have to use the explicit constructor. In the same standard, auto_ptr was deprecated, and it was removed completely in C++17.

Now for some precise wording: object 1 owns another object 2 if 2’s lifetime does not exceed 1’s and object 1 is responsible for destroying object 2. For fellow mathematicians, it is a weak ordering of lifetimes.

The object may be destroyed earlier than its holder, but not the other way around. auto_ptr uniquely owns memory; hence, you could say that it is a unique pointer. Well, kind of. But the implicit ownership transfer did not allow it to be stable enough. Then unique_ptr came out and said: “I own the memory! And if you want it, you will have to take it.”

Although two more pointers came along with unique_ptr, this is where we should dive a bit deeper.

Shared ownership

Let’s imagine a case where several objects communicate with one particular object. The easiest real-world example is communication with a printer from several devices.

If we project the same logic onto code, we would expect an object which represents the printer inside each of the objects which represent the devices. Pretty simple, isn’t it? Now we impose a restriction: if the printer goes out of scope, it shuts down.

Suddenly, the task becomes complex since we can’t say explicitly who owns the printer, and we need to keep it alive, while every device keeps working with it.

Component Object Model

The solution requires sharing the ownership between the consumers. How can we solve that? Microsoft pondered this question and invented COM. Of course, I am oversimplifying, because COM actually solves a lot more than sharing: hiding implementation details, uniform representation, ABI control, cross-process communication, etc. But the one thing it does for our problem is so-called reference counting. It counts how many consumers own the COM interface and does the cleanup when all of them are gone and the refcount is 0.

Here the object itself is responsible for deleting itself, not the consumer, and the function that releases a reference, called Release, is the same one that cleans up the underlying memory. Such a model is a bit confusing at first, but bearing in mind that it was invented even before C++98, in 1993 to be exact, everything falls into place. Does your complex object have some very involved destruction process aside from just delete? Maybe it is part of a memory pool and the memory should just go back to it? Fear not, interface->Release() has got you covered. (The class should still implement the Release function; so, no magic here.) The model is quite robust and is still in use by Microsoft to this day. Later iterations of the Windows API included RAII wrappers: ATL’s CComPtr and CComQIPtr, WRL’s Microsoft::WRL::ComPtr, and finally WinRT’s winrt::com_ptr. Each for its own needs, but all with one purpose.

Explicit sharing

Well, of course, COM looked like a miracle back in the day. While it was a bit clumsy to implement, it did its job. But what if we are not on Windows? You can still emulate COM, and it will in fact work just as well, but implementing it is a nightmare for a regular programmer. C++11 added shared_ptr, which does the same job of sharing data using atomic reference counting. It does not put the responsibility for destruction on the object, but performs the destruction itself. It also comes packed with custom deleters and array overloads (which were mostly missing from COM), along with a function that feels just like make_unique: make_shared provides a way to construct a shared_ptr while storing the reference counter together with the object in a single allocation, whereas the explicit constructor allocates two blocks of memory: one for the ref counter and one for the object itself.

Curse of sharing

So far, we have discussed only the strong points, but shared_ptr came with a big problem. The COM model enforced strict rules for marshalling, modification of the internal state, and concurrent access. Also, COM only ever exported interfaces, so state stability was the responsibility of the implementation’s developer. The shared pointer does nothing in that regard, leaving room for the many bugs and exploits that came along.

What is the problem? As we already know from Value Semantics, sharing a reference is bad and often leads to entanglement and fragility in the code. There is one particular case that I have seen in practice:

struct B;
struct A {
    std::shared_ptr<B> b;
};
struct B {
    std::shared_ptr<A> a;
};

I can construct either of those, but let’s choose A:

int main(){
    std::shared_ptr<A> a = std::make_shared<A>();
    std::shared_ptr<B> b = std::make_shared<B>();
    a->b = b;
    b->a = a;
}

Now, tell me, who owns whom? What will happen after main()?

The memory will obviously leak, since we have created a quantum entanglement. Here it is obvious, but a real example may not be: what if A and B are connected through several other classes? Debugging such a leak is borderline impossible! Of course, there is weak_ptr, which solves this, but it is still hard to twist your mind around. To fix the problem we need to:

struct B;
struct A {
    std::shared_ptr<B> b;
};
struct B {
    std::weak_ptr<A> a;
};

int main(){
    std::shared_ptr<A> a = std::make_shared<A>();
    std::shared_ptr<B> b = std::make_shared<B>();
    a->b = b;
    b->a = a;
}

Now everything is going to be deallocated after the main ends.


unique_ptr is a solid tool to ensure an object is only used in one place at a time. It is great for value semantics, it does not break anything, and it is robust. shared_ptr, on the other hand, is bad… Just kidding! Although the menace is lurking around, if you are sure the state of the underlying object is immutable, e.g. a pool of shared textures for a game, it is fine. It is bad design to share state; it must be done sparingly and only in cases where it is absolutely necessary! weak_ptr does not share state and is a great helper to break strong bonds, although I should mention the cost of checking whether the allocated object is still alive. This may hurt performance – so, still no magic solution.

Now that we have made our way as close to the coroutine world as possible, we are ready to use this knowledge to our advantage and build predictable code. And remember: unique – good; shared + immutable – also good.

Enjoy advanced object lifetime!



Ilya “Agrael” Doroshenko


The post Object Ownership appeared first on KDAB.

KDAB Training Day before Qt World Summit 2023

The KDAB Training Day will be back in Berlin on November 27th this year, right before the annual Qt World Summit, which takes place November 28-29th. KDAB Training Day will be held at the H4 Hotel Berlin Alexanderplatz, just down the road from the bcc Berlin Congress Centre, where the Qt World Summit takes place starting the following day. Ideal for taking part in both events!

KDAB is well-known for its quality training courses around Qt/QML, Modern C++, Debugging and Profiling, OpenGL, and other topics relevant to Qt developers. All courses provided by KDAB at the Training Day include central parts of our regular 3- to 4-day courses that are available as scheduled training or customized on-site training.

Choosing a compact, learning-rich one-day course lets you experience the quality and effectiveness of KDAB’s usual training offerings.

At this year’s KDAB Training Day, you get to choose from the following topics:

  • What’s new in C++23, trained by Giuseppe D’Angelo
  • QML Application Architecture, trained by André Somers
  • Profiling on Linux, trained by Milian Wolff
  • Porting to Qt 6, trained by Nicolas Fella

And for the first time in collaboration with the Rust experts from Ferrous Systems:

  • A taste of Rust (with a drop of Qt), trained by Florian Gilcher

You can find more information about each training by clicking here.

Get your ticket today! We have limited seats available for each training. So you should not wait too long if you want to take part in a specific course. Tickets include access to one training course, training material, lunch buffet, beverages, and coffee break.

Note: there is no combo ticket for the KDAB Training Day and Qt World Summit like in previous years. Please, make sure to book your ticket for Qt World Summit separately here, if you want to attend.

We are looking forward to seeing you in Berlin!

The post KDAB Training Day before Qt World Summit 2023 appeared first on KDAB.

Release 4.1.0: Build Qt 6 Apps for WebAssembly, Integrate Firebase Analytics, Gaming Components and SpeechToText

The Felgo 4.1.0 update allows you to build Qt 6 and Felgo 4 apps for WebAssembly. As a replacement for Google Analytics, which will shut down at the end of June, you can now use the new Firebase Analytics Plugin.

In addition, the latest release includes support for Felgo Gaming Components, adds SpeechToText for iOS and Android, and improves the Felgo Live connection process.

Felgo 4.1.0 comes as a free update for all Felgo developers.

Synchronization in Vulkan

An important part of working with Vulkan and other modern explicit rendering APIs is the synchronization of GPU/GPU and CPU/GPU workloads. In this article we will learn about what Vulkan needs us to synchronize and how to achieve it. We will talk about two high-level parts of the synchronization domain that we, as application and library developers, are responsible for:

  1. GPU↔GPU synchronization to ensure that certain GPU operations do not occur out of order,
  2. CPU↔GPU synchronization to ensure that we maintain a certain level of latency and resource usage in our applications.

GPU↔GPU Synchronization

Whereas in OpenGL we could simply render to the GL_BACK buffer of the default framebuffer and then tell the system to swap the back and front buffers, with Vulkan we have to get more involved. Vulkan exposes the concept of a swapchain of images. This is essentially a collection of textures (VkImages) that are owned and managed by the swapchain and the window system integration (WSI). A typical frame in Vulkan looks something like this:

  1. Acquire the index of the swapchain image to which we should render.
  2. Record one or more command buffers that ultimately output to the swapchain image from step 1.
  3. Submit the command buffers from step 2 to a GPU queue for processing.
  4. Instruct the GPU presentation engine to display the final rendered swapchain image from step 3.
  5. Go back to step 1 and start over for the next frame.

This may look innocuous at first glance but let’s delve deeper.

A day at the races

In step 1 we are asking the WSI to tell us the index of the next available swapchain image that we may render into. Now, just because this function tells us (and the CPU) that, for example, image index 1 is the image we should use as our render target, it does not mean that the GPU is actually ready to write to this image right now.

It is important to note that we are operating on two distinct timelines. There is the CPU timeline that we are familiar with when writing applications. Then there is also the GPU timeline on which the GPU processes the work that we give to it (from the CPU timeline).

In the case of acquiring a swapchain image index, we are actually asking the GPU to look into the future a little bit and tell us which image index will become the next image to become ready for writing. However, when we call the function to acquire this image index, the GPU presentation engine may well still be reading from the image in question in order to display its contents from an earlier frame.

Many people coming new to Vulkan (myself included) make the mistake of thinking that acquiring the swapchain image index means the image is ready to go right now. It’s not!

In step 2, we are entirely operating on the CPU timeline and we can safely record command buffers without fear of trampling over anything happening on the GPU.

The same is true in step 3. We can happily submit the command buffers which will render to our swapchain image. However, this is where the problem is triggered. If the GPU presentation engine is still busy reading from the swapchain image when along comes a bundle of work telling the GPU to render into that same image, we have a potential problem. GPUs are thirsty beasts, massively parallel machines that like to do as much as possible concurrently. Without some form of synchronization, it is clear that, if the GPU begins processing the command buffers, we could easily end up in a situation where the presentation engine is reading data at the same time as it is being written by another GPU thread. Say hello to our old friend undefined behaviour!

Timeline diagram showing overlapping GPU workloads

It is now clear that we need some mechanism to instruct the GPU to not process these command buffers until the GPU presentation engine is done reading from the swapchain image we are rendering to.

The solution for synchronising blocks of GPU work in Vulkan is a semaphore (VkSemaphore).

The way it works is that in our application’s initialisation code, we create a semaphore for the purposes of forcing the command buffer processing to begin only once the GPU presentation engine tells us it is done reading from the swapchain image it told us to use.

With this semaphore in hand, we can tell the GPU to switch it to a “signalled” state when the presentation engine is done reading from the image. The other half of the problem is solved when we submit the render command buffers to the GPU by handing the same semaphore to the call to vkQueueSubmit().

We now have this kind of setup:

  • At initialisation, create a semaphore (vkCreateSemaphore) in the unsignalled state.
  • Pass the above semaphore to vkAcquireNextImageKHR as the semaphore argument so that it is signalled when the image is ready for writing.
  • Pass the above semaphore to vkQueueSubmit (as one of the pWaitSemaphore arguments of the VkSubmitInfo struct) so that this set of command buffers is deferred until the semaphore is signalled.

GPU timeline showing the use of a semaphore to strictly order rendering after presentation.

Phew, we’re all done right? Nope, sadly not. Read on to see what else can go wrong and how to solve it.

I’m not ready to show you my painting

We have solved the race condition on the GPU of preventing the start of the rendering from clobbering the swapchain image whilst the presentation engine may still be reading from it. However, there is currently nothing to prevent the request to begin the presentation of the swapchain image whilst the rendering is still going on!

That is, we have solved the potential race between steps 1 and 3, but there is another race between steps 3 and 4. Luckily the problem is at heart exactly the same. We need to stop some incoming GPU work (the present request in step 4) from stepping on the toes of the already ongoing rendering work from step 3. That is, we need another application of GPU↔GPU synchronization which we know we can do with a semaphore.

To solve this race condition we use the following approach:

  • At initialisation, create another unsignalled semaphore.
  • In step 3 when we submit the command buffers for rendering, we pass in the semaphore to vkQueueSubmit as one of the pSignalSemaphores arguments.
  • In step 4 we then pass this same semaphore to the call to vkQueuePresentKHR as one of the pWaitSemaphores arguments.

This works in a completely analogous way to the first problem that we solved. When we submit the render command buffers for processing, this second semaphore is unsignalled. When the command buffers finish execution, the GPU will transition the semaphore to the signalled state. The call to vkQueuePresentKHR has been configured to ensure the presentation engine waits for this condition to be true before beginning whatever work it needs to do to get that image on to our screen.

GPU timeline showing the use of a semaphore to strictly order presentation after rendering.

With the above two race conditions brought under control, we can now safely loop around the sequence of steps 1-4 as many times as we like.

Well, almost. There is a slight subtlety in that the swapchain has N images (typically 3 or so), but so far we have only created a single semaphore for the presentation→render ordering and a second single semaphore for the render→presentation ordering. Usually, however, we do not want to render and present a single image and then wait around for the presentation to be done before starting over, as that is a big waste of cycles on both the CPU and GPU sides.

As a side note, many Vulkan examples in tutorials do this by introducing a call to vkDeviceWaitIdle or vkQueueWaitIdle somewhere in their main loop. This is fine for learning Vulkan and its concepts but to get full performance we want to go further into allowing the CPU and the GPU to work concurrently.

One thing that we can do is to create enough semaphores that we have one for every frame we wish to have “in flight” at any time, for each of the 2 required synchronization points. We then use the i’th pair of semaphores for the i’th in-flight frame, and when we get to the N’th in-flight frame we loop back to the 0’th pair of semaphores in round-robin fashion.

This then allows us to get potentially N frames ahead of the GPU on the CPU timeline. This, unfortunately, opens up our next can of worms.

CPU↔GPU Synchronization

So far we have shown that using semaphores when enqueuing work for the GPU allows us to correctly order the work done on the GPU timeline. We have briefly mentioned that this does nothing to keep the CPU in sync with the GPU. As it stands right now the CPU is free to schedule as much work in advance as we like (assuming sufficient available resources). This has a couple of issues though:

  • The more frames of work in advance the CPU schedules work for the GPU, the more resources we need to hold command buffers, semaphores etc. – not to mention the GPU resources to which the command buffers refer, such as buffers and textures. These GPU resources all have to be kept alive as long as any command buffers are referencing them.
  • The second issue is that the further the CPU gets ahead of the GPU the further our simulation state gets ahead of what we see. That means, the more frames ahead we allow the CPU to get, the higher is our latency. Some latency can be good in that if we have a frame or two queued up already, a frame that then takes a bit longer to prepare can be absorbed unnoticed. However, too much latency and our application feels sluggish and unnatural to use as it takes too long for our input to be responded to and for us to see the results of that on screen.

It is therefore essential to have a good handle on our system’s latency which in this case means the number of frames we allow to be “in flight” at any one time. That is, the number of frames worth of command buffers that have been submitted to the GPU queues and are being recorded at the current time. A common choice here is to allow 2-3 frames to be in flight at once. Bear in mind that this also depends upon other factors such as your display’s refresh rate. If you are running on a high refresh rate display at say 240Hz, then each frame is only around for 1/4 of the time of a “standard” 60Hz display. If this is the case, you may wish to increase the number of frames in flight to compensate.

Let’s parameterise the max number of frames that the CPU can get ahead as MAX_FRAMES_IN_FLIGHT. From our discussions in the previous sections we know that if we can keep the CPU from getting ahead by only MAX_FRAMES_IN_FLIGHT frames, then we will only need MAX_FRAMES_IN_FLIGHT semaphores for each use of a semaphore within a frame.

So now the question is how do we stop the CPU from racing ahead of the GPU? Specifically we need a way to make the CPU timeline wait until the GPU timeline indicates that it is done with processing a frame. In Vulkan, the answer to this is a fence (VkFence). Conceptually this is how we can structure a frame with fences to get the desired result (ignoring the use of semaphores for GPU↔GPU synchronization):

  1. In the application initialisation, create MAX_FRAMES_IN_FLIGHT fence objects in the signalled state.
  2. Force the CPU timeline to wait until the fence for this frame becomes signalled or continue immediately if it is the first frame and the fence is already signalled (vkWaitForFences).
  3. Reset the fence to the unsignalled state so that we can wait for it again in the future (vkResetFences).
  4. Acquire the swapchain image index (as before).
  5. Record and submit the command buffers to perform the rendering for this frame. When it is time to submit the command buffers to the GPU queue, we can pass in the fence for this frame as the final argument to vkQueueSubmit. Just as with a semaphore, when the GPU queue finishes processing this command buffer submission, it will transition the fence to the signalled state.
  6. Issue a GPU command to present the completed swapchain image (as before).
  7. Go to step 2 and use the next fence and (set of semaphores).

A timeline diagram showing the use of semaphores to order GPU workloads and a fence to synchronize the CPU and GPU.

With this approach, the CPU timeline can only get at most MAX_FRAMES_IN_FLIGHT frames ahead of the GPU before the call to vkWaitForFences in step 2 forces it to wait for the corresponding fence to become signalled by the GPU, which happens when the GPU completes the command buffer submission that went along with this fence.

Making use of both fences and semaphores allows us to nicely keep both the CPU and the GPU timelines making progress without races (between rendering and presentation) and without the CPU running away from us. These two synchronization primitives, fences and semaphores, solve similar but different problems:

  • A VkFence is a synchronization primitive for keeping the GPU and CPU timelines in step with each other.
  • A VkSemaphore is a synchronization primitive for ordering GPU tasks.

It is also worth noting that a VkFence can also be queried as to its state from the CPU timeline, rather than having to block until it becomes signalled (vkGetFenceStatus). This allows your application to peek and see whether a fence is signalled or not. If it is not yet signalled, your application may be able to use the available time to do something more productive than just blocking in vkWaitForFences. It all depends upon the design of your application.

Other Considerations

Presentation Mode

We have seen above how we can utilise fences and semaphores to make our Vulkan applications well-behaved. As an application author, you should also consider your choice of swapchain presentation mode, because it can heavily impact how your application behaves and how many CPU/GPU cycles it uses. With OpenGL we would typically set up either:

VSync-enabled rendering for tear-free display, OR VSync-disabled rendering that goes as fast as it can but probably shows some image tearing.

With Vulkan we can still get these configurations but there are also others that offer variations. As an example, VK_PRESENT_MODE_MAILBOX_KHR allows us to have tear-free display of the currently presented image (it is vsync enabled), but we can have our application also render as fast as possible. Very briefly, the way this works is that when the presentation engine is displaying swapchain image 0, our calls to vkAcquireNextImageKHR will only return the other swapchain image indices. When we subsequently tell the GPU to present those images it will happily take the image and overwrite your previous presentation submission. When the next vertical blank occurs, the presentation engine will actually show the most up to date submitted swapchain image.

In this manner we can render to e.g. images 1 and 2 as many times as we like so that when the presentation engine moves along, it has the most up to date representation of our application’s state possible.

Depending upon which swapchain presentation mode you request, your application could be locked to the VSync frequency or not, which in turn can lead to large differences in how much of your available CPU and GPU resources are consumed. Are they out for a leisurely stroll (VSync enabled) or sprinting (VSync disabled or mailbox mode)?

Multiple Windows and Swapchains

All of the above examples have assumed we are working with a single window surface and a single swapchain. This is the common case for games, but desktop and embedded applications may well have multiple windows, multiple screens, or even multiple adapters. Vulkan, unlike OpenGL, is pretty flexible when it comes to threading: with some care, we can happily record the command buffers for different windows (swapchains) on different CPU threads. For swapchains sharing a common Vulkan device, we can even present them all in a single vkQueuePresentKHR call rather than having to call the equivalent of swapbuffers on each of them sequentially. Once again, Vulkan and the WSI give you the tools; it is up to you how you utilise them.
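Presenting two swapchains in one call might look like the fragment below. This is a sketch, not a complete program: it assumes the semaphores, swapchains, acquired image indices, and the queue handle have already been created elsewhere.

```cpp
// Assumed to exist: renderFinishedA/B, swapchainA/B, imageIndexA/B, queue.
VkSemaphore    waitSemaphores[] = { renderFinishedA, renderFinishedB };
VkSwapchainKHR swapchains[]     = { swapchainA, swapchainB };
uint32_t       imageIndices[]   = { imageIndexA, imageIndexB };

VkPresentInfoKHR presentInfo = {};
presentInfo.sType              = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR;
presentInfo.waitSemaphoreCount = 2;
presentInfo.pWaitSemaphores    = waitSemaphores;
presentInfo.swapchainCount     = 2;   // both swapchains in one call
presentInfo.pSwapchains        = swapchains;
presentInfo.pImageIndices      = imageIndices;

// One present call covers both windows' swapchains.
vkQueuePresentKHR(queue, &presentInfo);
```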

Timeline Semaphores

A more recent addition to Vulkan, the timeline semaphore, combines aspects of a traditional semaphore and a fence. Instead of a binary signalled/unsignalled state, a timeline semaphore carries a monotonically increasing 64-bit value. It can be used just like a traditional (binary) semaphore to order packets of GPU work correctly, but it may also be waited upon from the CPU timeline (vkWaitSemaphores), and the CPU may signal it via a call to vkSignalSemaphore. If your Vulkan version and driver support them (they are core in Vulkan 1.2, and available earlier via the VK_KHR_timeline_semaphore extension), timeline semaphores can simplify your synchronization mechanisms.

Pipeline and Memory Barriers

This article has only concerned itself with the high-level, coarse synchronization requirements. Depending upon what you are doing inside your command buffers, you will likely also need to synchronise access to other resources, such as textures and buffers, to ensure that different phases of your rendering do not trample over each other. This is a large topic in itself and is covered in extreme detail by this article and the accompanying examples.

It’s up to you!

A lot of what the OpenGL driver used to manage for us is now firmly on the plate of application and library developers who wish to make use of Vulkan or other explicit, modern graphics APIs. Vulkan provides us with a plethora of tools, but it is up to us to decide how to make the best use of them and how to map them onto the requirements of our applications. I hope that this article has helped explain some of the synchronization considerations you need to keep in mind when you take the next step beyond the tutorial examples and remove that magic call to vkDeviceWaitIdle.

About KDAB

If you like this article and want to read similar material, consider subscribing via our RSS feed.

Subscribe to KDAB TV for similar informative short video content.

KDAB provides market-leading software consulting and development services and training in Qt, C++ and 3D/OpenGL. Contact us.

The post Synchronization in Vulkan appeared first on KDAB.

How to become friends with qmllint

As Qt developers, we produce many lines of QML code every day. Of course, we are all aware of the importance of maintainable, well-organised code. We know the best practices, and we try to follow them every time we add new code to the existing base. However, as our projects become more complex, it can be difficult to keep track of all the best practices and ensure that the code meets the required standards. That's one of the reasons why you should consider qmllint as a candidate for your new friend.

In this article, I want to show you how I work with qmllint and how you can integrate qmllint checks into your daily tasks.

The post How to become friends with qmllint appeared first on Spyrosoft.