Sunday, March 17, 2019

Using QNanoPainter without QtQuick (pure C++)

Originally, QNanoPainter library was implemented to fulfill the needs of easy to use but performant custom OpenGL QQuickItems. For the needs, Qt Quick Scene graph QSG* classes felt a bit too low-level to be productive, while QQuickPaintedItem was slightly lacking in performance and rendering quality on mobile hardware. For more details, please read this QNanoPainter introduction blog post.

I like (OK, love) Qt Quick & QML and have used them successfully in many different projects on desktop, mobile and embedded software. A lot. But Qt Quick doesn't suit all situations, and it is not always required. As explained in this Qt blog post, Qt 5.12 improves Qt Quick performance and memory usage. But naturally the Qt Quick engine still adds some memory usage and startup time.

Fear not, QNanoPainter can be used also without Qt Quick. Available entry points are:
  1. QNanoQuickItem & QNanoQuickItemPainter - This is where it all started, use these to implement your QQuickItems.
  2. QNanoWidget - Based on QOpenGLWidget so can be used for widget based applications. Used similarly to QWidget and just contains QNanoPainter API for painting instead of QPainter. As QNanoPainter is OpenGL (ES) powered, in some cases this can substitute also QGLWidget based components.
  3. QNanoWindow - Based on QOpenGLWindow / QWindow so very lightweight. Optimal for embedded software which would only need a single QNanoWindow for the whole UI.
There are separate helloworld examples for all of these classes in QNanoPainter sources, so to educate ourselves let's see what the memory consumption differences of them are. First I unified all examples to look the same, like this:

I also made the applications exit automatically with a timer after running for 2 seconds, to let the memory consumption stabilize. Measuring was done using the MTuner memory profiler for Windows. Using the freshly released Qt 5.12.2, the MTuner memory usage graphs look like this:

So QNanoQuickItem based version is using most memory, peaking at 28.9MB. QNanoWidget comes next with 23.3MB peak usage. And slimmest, as expected, is QNanoWindow app with 19.0MB.
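To put the peak figures above into proportion, here is a small sketch that computes how much memory each alternative saves relative to the QNanoQuickItem version. The helper name `savedPercent` is made up for this illustration and is not part of QNanoPainter; only the three MB values come from the measurements.

```cpp
#include <cassert>
#include <cmath>

// Peak memory figures (MB) as reported above for Qt 5.12.2.
constexpr double kQuickItemPeakMb = 28.9;  // QNanoQuickItem
constexpr double kWidgetPeakMb    = 23.3;  // QNanoWidget
constexpr double kWindowPeakMb    = 19.0;  // QNanoWindow

// Percentage of memory saved relative to a baseline peak.
// (Hypothetical helper for illustration only.)
double savedPercent(double baselineMb, double candidateMb)
{
    assert(baselineMb > 0.0);
    return (baselineMb - candidateMb) / baselineMb * 100.0;
}
```

With these numbers, QNanoWidget peaks roughly 19% lower and QNanoWindow roughly 34% lower than the QNanoQuickItem version.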

Note that all of these use normal MSVC2015 Qt 5.12.2 from the installer and release builds of the applications, without extra compiler options or anything. With those and Qt Lite it would be possible to build more streamlined versions, especially for QNanoWindow which doesn't depend on other Qt modules than Qt Core & GUI. Further optimizations and testing are left as an exercise for readers and the ones needing it :)

In conclusion: If you are working on an embedded device with an OpenGL ES 2 / 3 capable GPU, are concerned about flash & RAM usage and require a relatively simple user interface, I would encourage you to check out QNanoWindow. You get hardware accelerated, nicely antialiased graphics, in pure C++.

Monday, January 1, 2018

Qt 5.10 Windows Rendering Benchmarks

At the end of the previous blog post I promised to do a follow-up about Windows side rendering performance. So here we go.

But before that, let's revisit one earlier case. Previously I decided to provide results with and without the "Bezier lines" test because it seemed to perform particularly poorly when rendered using the QML Shape backend. Instead of just letting this one go, I decided to dig a bit deeper to try to improve QML Shape performance. After all, just disabling slow tests doesn't sound like a preferred long-term plan... ;-)

Improving QML Shape paths performance

After some trial and error, I found out that QQuickPath::createPath() uses a considerable amount of time every time the path changes, and the reason is QPainterPath::length(), which e.g. for all curved paths calls QBezier::length(). This usually isn't a problem for PathView, as its normal use case is creating the path once and then just moving elements along it. But the new ShapePath, on the other hand, might be animated and change (re-create) its path multiple times.

Luckily ShapePath wouldn't actually need to count its length there, as it doesn't support PathAttribute or PathPercent properties. So I went ahead and implemented a fast-path for creating path for ShapePath in patch linked into QTBUG-64951. This patch improves performance of all QML Shape paths, most notably for bigger animated paths. For QNanoPainter demo bezier line test, it provided up to ~20x performance boost on Windows PC. See FPS values at top-left corners of without (back windows) and with the patch (forward windows):
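To give a feeling for why measuring a curved path on every change adds up, here is a minimal standalone sketch that approximates the length of one cubic Bezier segment by summing straight chords. This is not how QBezier::length() is implemented in Qt; the `Point` type and both function names are invented for this illustration. The point is only that a length calculation needs many curve evaluations per segment, and an animated path repeats that work on every frame.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x; double y; };

// Evaluate a cubic Bezier curve at parameter t in [0, 1].
Point cubicBezierAt(Point p0, Point p1, Point p2, Point p3, double t)
{
    const double u = 1.0 - t;
    const double b0 = u * u * u;
    const double b1 = 3.0 * u * u * t;
    const double b2 = 3.0 * u * t * t;
    const double b3 = t * t * t;
    return { b0 * p0.x + b1 * p1.x + b2 * p2.x + b3 * p3.x,
             b0 * p0.y + b1 * p1.y + b2 * p2.y + b3 * p3.y };
}

// Approximate the arc length by summing `segments` straight chords.
// Cost grows linearly with `segments`, and this runs per curve segment.
double approximateLength(Point p0, Point p1, Point p2, Point p3, int segments)
{
    assert(segments > 0);
    double length = 0.0;
    Point prev = p0;
    for (int i = 1; i <= segments; ++i) {
        const Point cur = cubicBezierAt(p0, p1, p2, p3,
                                        static_cast<double>(i) / segments);
        length += std::hypot(cur.x - prev.x, cur.y - prev.y);
        prev = cur;
    }
    return length;
}
```

A path that only needs to be drawn, not measured, can skip this step entirely, which is the idea behind the fast-path described above.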

Hopefully that patch ends up into Qt in one form or another, but for now to get better results I used Qt 5.10 branch + patch for all the testing here.

Desktop OpenGL vs. Angle

As you probably know, Qt Quick on Windows PC can run either on desktop OpenGL or on OpenGL ES through Angle. Angle translates OpenGL ES API calls to Direct3D, which is great for compatibility as OpenGL support varies on Windows. Direct3D drivers of different GPUs might be more optimized than their OpenGL drivers, but on the other hand the translation has some overhead. So one might ponder which one is faster, normal OpenGL or Angle? How much does it matter?

To get some view into this, I ran the test application first on trusty old Windows PC with the following related specs:
  • Intel i5-2500K @ 3.3GHz
  • Integrated Intel HD Graphics 3000 + NVIDIA GTX 1060
  • HD monitor
  • Windows 10
  • Building with MS Visual Studio 2015 - 64bit
  • All QNanoPainter perf demo default tests enabled

Test1: Windows PC with HD Graphics 3000, fullscreen:

Test1 conclusions: No huge differences in performances. Likely old integrated Intel GPU is so non-performant that none of the rendering methods can perform well. With QQuickPaintedItem (QImage and FBO), Angle is faster than desktop OpenGL. With all the other backends, OpenGL is faster. But as said, differences aren't very big.

Full HD resolution is probably too much for that integrated GPU to handle, so let's repeat the test with default window size (375x667).

Test2: Windows PC with HD Graphics 3000, default window size:

Test2 conclusions: Interesting part here is naturally comparison with the Test1 results. Immediately we can see QNanoPainter (OpenGL) outperforming other options. Decreasing the item size improved its fps ~200%, while QNanoPainter (Angle) only gained ~40%. So decreasing the pixel amount allows OpenGL to perform better than Angle. In many tests QQuickPaintedItem (QImage) has been the slowest option, but here it's actually second fastest. Combination of fast Intel CPU + poor GPU + small item size suits it well. Not really surprising, but good to prove it.

It is also interesting that item size didn't have a big effect on QML Shape or QQuickPaintedItem (FBO); both of those are limited by something other than the item pixel amount. But some results are a bit shady, leading me to think that maybe this older integrated GPU is doing strange things.

Next we can enable the external NVIDIA GTX 1060 graphics card and see how huge GPU performance increase affects these different rendering methods. As QML Shape supports NVIDIA-specific GL_NV_path_rendering we need to add into our test matrix one more option, with and without vendorExtensionsEnabled property enabled. To keep things clearer, let's do Angle and OpenGL as separate graphs from now on.

Test3: Windows PC with NVIDIA GTX 1060, Angle:

Test4: Windows PC with NVIDIA GTX 1060, OpenGL:

Tests 3 and 4 conclusions:
  • No difference for QQuickPaintedItem (QImage) between Test 3 & 4. This is as expected, Qt Raster CPU backend performs equally well (or bad) with OpenGL and Angle.
  • With QML Shape (no GL_NV_path_rendering), OpenGL and Angle perform quite close to each other, with OpenGL being ~20% faster.
  • Angle doesn't have the GL_NV_path_rendering extension available, so on Angle the results with and without vendorExtensionsEnabled are exactly the same. On OpenGL, GL_NV_path_rendering gives about a 20% performance improvement over the default GeometryRenderer.
  • QQuickPaintedItem (FBO) is clearly faster with OpenGL, about double the speed compared to Angle. QNanoPainter is also much faster with OpenGL, about 4x the speed compared to Angle.
As we are in a good benchmarking flow now let's not stop yet. Next we will switch to fresh laptop hardware, Dell XPS 15 (9560) with the following related specs:
  • Core i7-7700HQ
  • Integrated Intel HD 630 + NVIDIA GTX 1050
  • HD screen
  • Windows 10
  • Building with MS Visual Studio 2015 - 64bit
  • All QNanoPainter perf demo default tests enabled
So how does a laptop with the latest Intel CPU + GPU perform? What's the difference between the integrated and the additional GPU here? Let's find out, running all tests in fullscreen HD resolution.

Test 5: Dell XPS 15 with Intel HD 630, Angle:

Test 6: Dell XPS 15 with Intel HD 630, OpenGL:

Test 7: Dell XPS 15 with NVIDIA GTX 1050, Angle:

Test 8: Dell XPS 15 with NVIDIA GTX 1050, OpenGL:

Tests 5-8 conclusions:
  • With HD 630, OpenGL is also overall faster than Angle. With QML Shape and QQuickPaintedItem (FBO) OpenGL reaches ~100% higher fps, while with QNanoPainter OpenGL has ~50% higher fps than Angle.
  • Same thing with GTX 1050, OpenGL is faster than Angle. With QML Shape ~30%, QQuickPaintedItem (FBO) ~100% and QNanoPainter ~200% higher fps. It's clear that especially QNanoPainter enjoys taking all juices out of powerful GPUs.
  • Comparing integrated vs. external GPU here, enabling GTX 1050 increases QNanoPainter performance by ~200% (12fps vs. 34fps). Also interestingly, QML Shape gains almost nothing from the external GPU, so its bottleneck is somewhere else. But enabling GL_NV_path_rendering for QML Shape with GTX 1050 gives ~20% higher fps, which matches the results with the other PC. So when running on an NVIDIA GPU it's usually preferred to keep vendorExtensionsEnabled on.
  • As expected, Intel integrated GPUs have improved a lot in ~6 years. Looking at QNanoPainter numbers, this laptop with HD 630 performs ~4x faster than HD 3000 of previous setup (43fps vs. 11fps). Yes, other parts have changed too, but GPU is a big factor here.
  • Comparison between old system + NVIDIA GTX 1060 vs. new system + NVIDIA GTX 1050 is also interesting. We can see that although GPU is beefier, CPU, RAM etc. turn the table for the newer system. On new system, rendering is overall ~30% faster (QNanoPainter 34fps vs. 26fps).
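The rounded percentages and multipliers in the list above can be checked against the quoted fps pairs with a small sketch like the following. The helper names `speedup` and `gainPercent` are just for this illustration.

```cpp
#include <cassert>
#include <cmath>

// How many times faster newFps is compared to oldFps.
double speedup(double oldFps, double newFps)
{
    assert(oldFps > 0.0);
    return newFps / oldFps;
}

// Relative gain in percent, e.g. 10fps -> 15fps is a 50% gain.
double gainPercent(double oldFps, double newFps)
{
    return (speedup(oldFps, newFps) - 1.0) * 100.0;
}
```

With the numbers from the list, 12fps vs. 34fps is a gain of about 183% (the "~200%" above), 11fps vs. 43fps is a speedup of about 3.9x (the "~4x"), and 26fps vs. 34fps is a gain of about 31% (the "~30%").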

Now it's probably good time to call this blog post done. As always, thoughts about these results, own testing results or any other comments are warmly welcome. And happy 2018!

Monday, December 4, 2017

Qt 5.10 Rendering Benchmarks

Qt 5.10.0 RC packages are available now and actual release is happening pretty soon. So this seems to be a good time to run some rendering benchmarks with 5.10, including new QML Shape element, QQuickPaintedItem and QNanoPainter.

After my previous blog post, some initial comments mentioned how QML Shape didn't reach their performance expectations. But I think that might be more of a "use the right tool for the job" kind of thing. This demo application is very much designed to test the limits of how much heavily animated graphics can be drawn while keeping performance high. While it has its own strengths, QML Shape likely isn't the tool for that.

To prove this point, there is a new 'flower' test case in QNanoPainter demo app which renders a nice flower path, animating gradient color & rotation (but not path). Combining it with new setting to render multiple items (not just multiple renders per item) and the outcome looks like this with 1 and 16 items:

Now when we know what the desired outcome looks like let's start testing with the first run. 

Test1: Nexus 6, 'Render flower' test:

Test1 conclusions: In this test QQuickPaintedItem (QImage backend) has clearly worst performance, CPU Raster paint engine and uploading into GPU is very non-optimal on Nexus 6. QML Shape performs the best, maintaining fluid 60fps still with 16 individual items. QNanoPainter manages quite well also and switching for QSGRenderNode backend instead of QQuickFramebufferObject to avoid rendering going through FBO gives a nice boost. When the amount of items increases this FBO overhead naturally also increases. QQuickPaintedItem with FBO backend is somewhat slower than QNanoPainter.

This test is kind of best-case-scenario for QML Shape. If path would animate that would be costly for QML Shape backend. Also for example enabling antialiasing turns tables, making QML Shape only render 2 items at 35fps while QNanoPainter manages fluid antialiased 60fps. But that's the thing, select the proper tool for your use case.

Next we can test more complex rendering where also paths animate and see how antialiasing affects the performance. In rest of the tests, instead of increasing item count we increase rendering count, meaning how many times stuff is rendered into a single QQuickItem. The default tests set contains ruler, circles, bezier lines, bars, and icons+text tests. With 1, 2 and 16 rendering counts it looks like this:

So let's continue to Test2: Nexus 6, all default tests enabled:

Test2 conclusions: The slowest performer is again QQuickPaintedItem (QImage). QML Shape comes right after it, dropping quite a bit from its lead position in Test1. Digging a bit deeper into QML Shape performance and enabling different tests individually, one can see that the Bezier lines test causes the biggest fps hit. Disabling some code there revealed that the biggest slowdown came from the graph dots, which were drawn with two PathArc elements, so I improved fps by switching the implementation to use QML Rectangle instead. QNanoPainter is the fastest, but even it only reaches 60fps with non-antialiased single rendering. Note that QNanoPainter with QSGRenderNode is missing here and in all the rest of the tests, because when rendering only a single item its performance is almost the same as QNanoPainter with FBO.

Then we could switch to a bit more powerful hardware and repeat above test with that. 

Test3: Macbook Pro (Mid 2015, AMD R9 M370X), all default tests enabled:

Test3 conclusions: The Macbook can clearly handle much more rendering than Nexus 6. As MSAA is fully supported here, we are able to test both antialiased and non-antialiased rendering for every method. On the Macbook MSAA antialiasing is quite cheap, which can be seen from QML Shape and QQuickPaintedItem reaching pretty similar frame rates with and without antialiasing. The slowest performer is antialiased QQuickPaintedItem (QImage), while QNanoPainter is leading again, reaching a solid 60fps with 16 render counts.

As we saw already earlier that Bezier lines test seemed particularly unsuitable for QML Shape, let's next repeat the above test except disabling that single test. After all we try to be fair here and avoid misinterpretations. 

Test4: Macbook Pro, all default tests except Bezier lines enabled:

Test4 conclusions: The most interesting data here comes from the comparison to the Test3 results. QQuickPaintedItem (QImage) results go up only a few percent, so the Bezier lines test doesn't seem to influence much there. QQuickPaintedItem (FBO) results are now identical for antialiased and non-antialiased, so the light blue line can't be seen under the orange one. But not much changes there either. QNanoPainter improves 30-50%, now reaching a solid 60fps with 32 render counts when antialiasing is disabled. And finally, QML Shape improves frame rates by a whopping ~100%, so we were right about this particular test being its Achilles' heel.

We are just scratching surface here. There would be plenty of things to test still and get deeper into individual tests. But for this blog post let's stop here.

General tips about Qt 5.10 QML Shape usage could be:
  • Use QML Shape for simple shape items as part of QML UIs. Consider other options for more complex shapes which animate also the path. 
  • Also don't use non-trivial Shape elements in places where creation time matters e.g. ListView delegates or making multiple shapes inside Repeater, as parsing the QML into renderable nodes tree has some overhead.
  • When the need is to render rectangles, straight lines or circles, QML Rectangle element gives generally better performance than QML Shape counterpart. You can experiment with this enabling alternative code paths for RulerComponent and LinesComponent of the demo. 
  • If you target mostly hardware with NVIDIA GPU, GL_NV_path_rendering backend of QML Shape should be more performant. I didn't have suitable NVIDIA hardware available currently for testing so these results will have to wait, anyone else want to provide comparisons?

Follow up post is planned for comparing Windows side OpenGL vs. OpenGL ES + Angle rendering performances so stay tuned!

Thursday, November 9, 2017

Qt 5.10 QML Shape testing

When implementing a component for a QtQuick UI that needs something more than rectangles, images and texts, pure declarative QML hasn't been enough. Popular choices for items with some sort of vector drawing are QML Canvas, QQuickPaintedItem or QNanoPainter.

But with Qt 5.10 there will be support for a new Shape element with paths that contain lines, quads, arcs etc., so I decided to install Qt 5.10 beta3 and implement all tests of "qnanopainter_vs_qpainter_demo" also with QML + Shape elements. (This kinda makes it "qnanopainter_vs_qpainter_vs_qmlshape_demo" but I'm not renaming it now.) So here is, in all its glory, the same UI implemented with QNanoPainter (left), QQuickPaintedItem (center), and QML+Shape (right):

Hard to spot the differences right? If only there would be a way to prove this, some way to x-ray into these UIs... like QSG_VISUALIZE=overdraw to visualize what Qt Quick Scene Graph Renderer sees?

Here you can see that the scene graph sees QNanoPainter and QQuickPaintedItem as just big unknown rectangles, while it can see into QML+Shape, as that is composed of native scene graph nodes. But the proof is in the pudding, as they say: what looks the same doesn't perform the same. Here's a video showing all 3 running on two different Android devices:

As different rendering components can be enabled/disabled and settings changed, this demo is quite nice for doing performance comparisons of both exact drawing methods or combining all methods. But those will have to wait for another blog post and for non-beta Qt 5.10 to get fair results. In the mean time, feel free to pull latest sources from github, test yourself and provide patches or comments!

Wednesday, October 25, 2017

FitGraph NG UI prototype

About a month ago I started exercising more, mostly jogging, weights and soccer (with kids). Target is to be in superb shape when 2018 starts, and I'm already feeling stronger & more energetic during the day so looking good!

Anyway, this blog post is somewhat related to that. There's plenty of health-related apps and gadgets available these days and in the past I used some time pondering what would be a perfect activity tracking app for my needs. Now I decided to revive this earlier concept as 'FitGraph NG' while porting it to use QNanoPainter and polishing some parts.

As usual, let's start with a video demonstrating the actual application:

There would of course be more views available, this being just the 'activity timeline' part, but it would already cover many of my initial wishes:
  • Showing the whole day as a graph, data or textually depending on needs.
  • Automatic annotation of activities, type, duration and related activity data. And importantly, being able to select each activity to cover only data during that.
  • See how well you have reached your 'moves' goal which would come from all your activities.
  • Also collect other notes, goals, concerns etc. during the day.
I could write quite a long presentation about this, explain why things are where they are, how interactions are thought out, pinpoint small (but important!) details etc. But don't want to, you can watch the video few times and ponder about those yourself if you wish.

Some more information about the implementation side:
  • Implemented with Qt, so cross-platform on Android, iOS etc. Application logic C++ and UI naturally QML with few shaders.
  • Graphs are painted with single QNanoPainter item for efficiency. Graph animations are driven from QML side for easy tuning.
  • Data is managed with SQLite and fetched into QAbstractListModel. There's configurable QCache to reduce SQL queries. Data in this prototype is generated dummy, but basically allows "unlimited" scrolling of days.
  • Performance was important target, some tricks and optimizations were required to get application working fluidly at 60fps also on lower end Android devices. 
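The caching idea from the list above can be sketched in plain C++ roughly like this. It is a simplified least-recently-used cache standing in for the QCache-based setup, not the actual FitGraph NG code. The class name `DayCache`, the loader callback and the use of a string as "day data" are all assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Least-recently-used cache in front of a slow loader (e.g. an SQL query).
// Hypothetical stand-in for illustration; all names are made up.
class DayCache
{
public:
    using Loader = std::function<std::string(int)>;

    DayCache(std::size_t capacity, Loader loader)
        : m_capacity(capacity), m_loader(std::move(loader))
    {
        assert(capacity > 0);
    }

    // Returns the data for `day`, calling the loader only on a cache miss.
    // The reference stays valid until that entry is evicted.
    const std::string &get(int day)
    {
        auto it = m_index.find(day);
        if (it != m_index.end()) {
            // Hit: move the entry to the front (most recently used).
            m_entries.splice(m_entries.begin(), m_entries, it->second);
            return it->second->second;
        }
        // Miss: evict the least recently used entry if the cache is full.
        if (m_entries.size() == m_capacity) {
            m_index.erase(m_entries.back().first);
            m_entries.pop_back();
        }
        m_entries.emplace_front(day, m_loader(day));
        m_index[day] = m_entries.begin();
        return m_entries.front().second;
    }

private:
    using Entry = std::pair<int, std::string>;

    std::size_t m_capacity;
    Loader m_loader;
    std::list<Entry> m_entries;  // front = most recently used
    std::unordered_map<int, std::list<Entry>::iterator> m_index;
};
```

A list model would call `get(day)` from its data access path, so scrolling back and forth over recently viewed days doesn't trigger new queries, and the capacity is the knob for trading memory against query count.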
Thoughts welcome and thanks for reading!

Monday, October 23, 2017

Unity testing

Couple weeks ago I decided to study a bit about Unity as I haven't worked with it before. So implemented a simple "Rock Rolling in Terrain" game prototype as a case study, looking like this:

By coincidence Marko also just made a nice blog post related to Unity, and thanks to standard assets his terrain even looks quite similar to mine, so I'm not going to repeat similar notes. But all in all, my initial feeling is that Unity seems like quite a productive environment, and I wouldn't mind implementing some bigger project with it to get deeper.

Sunday, March 12, 2017

QNanoPainter with Qt 5.8 (and QSGRenderNode)

QNanoPainter recently gained initial support for QSGRenderNode, which is a new public class available starting from Qt 5.8. What this means is that instead of rendering through FBO using QQuickFramebufferObject, OpenGL drawing is done directly into Qt Quick Scene Graph. And this works as a QQuickItem somewhere in the middle of the scene, not just as an underlay/overlay for the scene, which was already possible before Qt 5.8 using beforeRendering & afterRendering.

Below is a video running QNanoPainter tester app on MacBook Pro with 16 unique QQuickItems using QSGRenderNode mode:

So should you enable QNANO_USE_RENDERNODE with your custom QNanoPainter items?

  • There is a potential performance gain for not rendering through FBO. Especially if your UI contains many custom items and/or you resize items, QSGRenderNode may give more gains as FBO resizing can be costly.
  • However, based on my testing with a few different Android devices the performance difference seems pretty small, just a few percent. Maybe with some less performant embedded platforms which are bad with FBOs there is a bigger difference.
  • With QQuickFramebufferObject, rendering is always clipped to FBO size, so item clip true/false property doesn’t have any effect. With QSGRenderNode such clipping doesn’t automatically happen, instead each item can freely paint anywhere outside its rect. Whether this is pros or cons is up to your use case, but good to note anyway.
  • With QSGRenderNode, standard QQuickItem features (position, rotation, clipping, scaling etc.) need custom implementations. QNanoPainter doesn’t (yet) fully support clipping of the QML item tree, so if you have several clip regions and/or rotate these items, clipping doesn’t necessarily behave as expected when using QNANO_USE_RENDERNODE.
  • When your item doesn’t need to be repainted, just re-rendered, it’s more performant to render the FBO. So if your items don’t change often it might be better not to enable QNANO_USE_RENDERNODE.

So with all above said, QNANO_USE_RENDERNODE is not currently enabled by default even when building with Qt 5.8. But that might change somewhere in future if gains seem worth it. For now please upgrade your QNanoPainter library and test how it works for you.