Monday, January 1, 2018

Qt 5.10 Windows Rendering Benchmarks

At the end of my previous blog post I promised a follow-up on rendering performance on Windows. So here we go.

But before that, let's revisit one earlier case. Previously I decided to provide results with and without the "Bezier lines" test, because it performed particularly poorly when rendered with the QML Shape backend. Instead of just letting this one go, I decided to dig a bit deeper and try to improve QML Shape performance. After all, just disabling slow tests doesn't sound like a good long-term plan... ;-)

Improving QML Shape path performance

After some trial and error, I found out that QQuickPath::createPath() spends a considerable amount of time every time the path changes. The reason is QPainterPath::length(), which, for example, calls QBezier::length() for all curved paths. This usually isn't a problem for PathView, as its normal use case is to create the path once and then just move elements along it. The new ShapePath, on the other hand, may be animated and change (re-create) its path many times.
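
To give a feel for why this is costly: the arc length of a cubic Bezier generally has no simple closed form, so it has to be approximated numerically. The sketch below shows one generic way of doing that (recursive de Casteljau subdivision, comparing the chord with the control-polygon length). It is not Qt's actual QBezier::length() code; the Pt type and cubicLength() function are hypothetical names for illustration. The point is that every length query does recursive work per curve segment, and a path that is re-created on every animation step repeats it each time.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal 2D point type for this sketch (not a Qt type).
struct Pt {
    double x, y;
};

static double dist(Pt a, Pt b) { return std::hypot(b.x - a.x, b.y - a.y); }
static Pt mid(Pt a, Pt b) { return {(a.x + b.x) / 2.0, (a.y + b.y) / 2.0}; }

// Approximates the arc length of the cubic Bezier p0..p3 by recursive
// de Casteljau subdivision at t = 0.5. The true length lies between the
// chord length and the control-polygon length; once the two agree within
// `tol` (or the depth limit is reached), their average is returned.
double cubicLength(Pt p0, Pt p1, Pt p2, Pt p3, double tol = 1e-4, int depth = 0) {
    assert(tol > 0.0);
    const double chord = dist(p0, p3);
    const double polygon = dist(p0, p1) + dist(p1, p2) + dist(p2, p3);
    if (polygon - chord <= tol || depth >= 16)
        return (chord + polygon) / 2.0;

    // de Casteljau split: m is the point on the curve at t = 0.5.
    const Pt a = mid(p0, p1), b = mid(p1, p2), c = mid(p2, p3);
    const Pt d = mid(a, b), e = mid(b, c), m = mid(d, e);
    // Each half gets half of the remaining error budget.
    return cubicLength(p0, a, d, m, tol / 2.0, depth + 1) +
           cubicLength(m, e, c, p3, tol / 2.0, depth + 1);
}
```

For a degenerate "curve" along a straight line, such as cubicLength({0, 0}, {1, 0}, {2, 0}, {3, 0}), chord and polygon are equal and the result is 3 without any recursion; a real curve, like the usual quarter-circle approximation, needs several levels of subdivision before the bounds meet.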

Luckily, ShapePath doesn't actually need to calculate its length there, as it doesn't support the PathAttribute or PathPercent elements. So I went ahead and implemented a fast path for creating ShapePath paths in the patch linked to QTBUG-64951. This patch improves the performance of all QML Shape paths, most notably for bigger animated paths. For the QNanoPainter demo "Bezier lines" test, it gave up to a ~20x performance boost on a Windows PC. See the FPS values in the top-left corners of the windows without the patch (back) and with the patch (front).

Hopefully the patch ends up in Qt in one form or another, but for now, to get better results, I used the Qt 5.10 branch + patch for all the testing here.
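
The idea behind the fast path can be sketched as "only do the length bookkeeping when the path actually needs it". This is a hypothetical, simplified model: the Segment, BuiltPath and buildPath() names are made up, segments are straight lines to keep it short, and it is not the code in the QTBUG-64951 patch. The needsLengths flag stands for "the path uses PathAttribute or PathPercent", which is never the case for ShapePath.

```cpp
#include <cmath>
#include <vector>

// Hypothetical straight-line segment; real paths also contain curves.
struct Segment {
    double x0, y0, x1, y1;
};

struct BuiltPath {
    std::vector<Segment> segments;
    std::vector<double> cumulativeLength; // stays empty on the fast path
};

// Stand-in for an expensive length query (e.g. QBezier::length() for curves).
static double segmentLength(const Segment &s) {
    return std::hypot(s.x1 - s.x0, s.y1 - s.y0);
}

// Builds the path; lengths are only computed when needsLengths is true.
// lengthCalls counts length queries so the saving is easy to observe.
BuiltPath buildPath(const std::vector<Segment> &segs, bool needsLengths, int &lengthCalls) {
    BuiltPath out{segs, {}};
    if (!needsLengths)
        return out; // fast path: skip all length bookkeeping

    double total = 0.0;
    for (const Segment &s : segs) {
        total += segmentLength(s);
        ++lengthCalls;
        out.cumulativeLength.push_back(total);
    }
    return out;
}
```

A PathView-style caller would pass needsLengths = true, while a ShapePath-style caller passes false and never pays for the length queries, which is where the saving comes from when the path is re-created on every frame.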

Desktop OpenGL vs. Angle

As you probably know, Qt Quick on Windows can run either on desktop OpenGL or on OpenGL ES through Angle. Angle translates OpenGL ES API calls into Direct3D, which is great for compatibility, as OpenGL support varies on Windows. The Direct3D drivers of different GPUs may also be better optimized than their OpenGL drivers, but on the other hand the translation adds some overhead. So which one is faster, desktop OpenGL or Angle? And how much does it matter?

To get some insight into this, I first ran the test application on a trusty old Windows PC with the following relevant specs:
  • Intel i5-2500K @ 3.3GHz
  • Integrated Intel HD Graphics 3000 + NVIDIA GTX 1060
  • HD monitor
  • Windows 10
  • Building with MS Visual Studio 2015 - 64bit
  • All QNanoPainter perf demo default tests enabled

Test 1: Windows PC with HD Graphics 3000, fullscreen:

Test 1 conclusions: No huge differences in performance. Likely the old integrated Intel GPU is so weak that none of the rendering methods can perform well. With QQuickPaintedItem (QImage and FBO), Angle is faster than desktop OpenGL; with all the other backends, OpenGL is faster. But as said, the differences aren't very big.

Full HD resolution is probably too much for that integrated GPU to handle, so let's repeat the test with the default window size (375x667).

Test 2: Windows PC with HD Graphics 3000, default window size:

Test 2 conclusions: The interesting part here is naturally the comparison with the Test 1 results. Immediately we can see QNanoPainter (OpenGL) outperforming the other options. Decreasing the item size improved its fps by ~200%, while QNanoPainter (Angle) only gained ~40%. So reducing the pixel count lets OpenGL pull ahead of Angle. In many tests QQuickPaintedItem (QImage) has been the slowest option, but here it's actually the second fastest: the combination of a fast Intel CPU, a weak GPU and a small item size suits it well. Not really surprising, but good to confirm.

It is also interesting that the item size didn't have a big effect on QML Shape or QQuickPaintedItem (FBO); both of those are limited by something other than the item's pixel count. But some results are a bit suspicious, which makes me think this older integrated GPU may be doing strange things.
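
A bit of arithmetic puts the Test 2 gains in perspective. Assuming the fullscreen runs were at 1920x1080 (Full HD, as mentioned above), the default window has roughly 8.3x fewer pixels. If rendering were purely fill-rate bound, fps would be expected to scale by a similar factor; the observed ~3x (OpenGL) and ~1.4x (Angle) suggest other per-frame costs remain significant, more so with Angle. The helper names below are hypothetical.

```cpp
#include <cassert>

// Number of pixels in a width x height render target.
long long pixelCount(long long width, long long height) {
    assert(width > 0 && height > 0);
    return width * height;
}

// Converts a reported "+N%" fps gain into a multiplier (+200% -> 3.0x).
double gainToMultiplier(double percent) {
    return 1.0 + percent / 100.0;
}
```

With these, pixelCount(1920, 1080) is 2073600 and pixelCount(375, 667) is 250125, a ratio of about 8.29, compared with fps multipliers of gainToMultiplier(200) = 3.0 and gainToMultiplier(40) = 1.4.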

Next we can enable the discrete NVIDIA GTX 1060 graphics card and see how a huge increase in GPU performance affects these rendering methods. As QML Shape supports the NVIDIA-specific GL_NV_path_rendering extension, we need to add one more option to our test matrix: with and without the vendorExtensionsEnabled property enabled. To keep things clearer, let's show Angle and OpenGL as separate graphs from now on.

Test3: Windows PC with NVIDIA GTX 1060, Angle:

Test4: Windows PC with NVIDIA GTX 1060, OpenGL:

Tests 3 and 4 conclusions:
  • No difference for QQuickPaintedItem (QImage) between Tests 3 & 4. This is as expected: the Qt raster CPU backend performs equally well (or badly) with OpenGL and Angle.
  • With QML Shape (no GL_NV_path_rendering), OpenGL and Angle perform quite close to each other, with OpenGL being ~20% faster.
  • Angle doesn't have the GL_NV_path_rendering extension available, so on Angle the results with and without vendorExtensionsEnabled are exactly the same. On OpenGL, GL_NV_path_rendering gives about a 20% performance improvement over the default GeometryRenderer.
  • QQuickPaintedItem (FBO) is clearly faster with OpenGL, about double the speed of Angle. QNanoPainter is also much faster with OpenGL, about 4x the speed of Angle.
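
Why vendorExtensionsEnabled makes no difference on Angle can be captured in a tiny decision model. This is a hypothetical illustration, not Qt's actual selection code: the enum and function names are made up, and it assumes the NVIDIA path-rendering renderer is only picked when the property is on and GL_NV_path_rendering is available, which in this model means desktop OpenGL on an NVIDIA GPU. (In Qt 5.10, a Shape also reports the renderer in use through its read-only rendererType property.)

```cpp
// Hypothetical model of the QML Shape backend choice; not Qt code.
enum class GraphicsApi { DesktopOpenGL, Angle };
enum class ShapeRenderer { Geometry, Nvpr };

// In this model, GL_NV_path_rendering is only exposed by NVIDIA's desktop
// OpenGL driver; Angle (OpenGL ES on top of Direct3D) never provides it.
bool hasNvPathRendering(GraphicsApi api, bool nvidiaGpu) {
    return api == GraphicsApi::DesktopOpenGL && nvidiaGpu;
}

// The vendor-extension renderer needs both the opt-in property and the
// extension; otherwise the default geometry-based renderer is used.
ShapeRenderer selectRenderer(GraphicsApi api, bool nvidiaGpu, bool vendorExtensionsEnabled) {
    if (vendorExtensionsEnabled && hasNvPathRendering(api, nvidiaGpu))
        return ShapeRenderer::Nvpr;
    return ShapeRenderer::Geometry;
}
```

Under Angle both settings of vendorExtensionsEnabled map to the same renderer, which matches the identical Test 3 results, while on desktop OpenGL with an NVIDIA GPU the flag switches renderers.
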
As we are in a good benchmarking flow now, let's not stop yet. Next we will switch to newer laptop hardware, a Dell XPS 15 (9560) with the following relevant specs:
  • Core i7-7700HQ
  • Integrated Intel HD 630 + NVIDIA GTX 1050
  • HD screen
  • Windows 10
  • Building with MS Visual Studio 2015 - 64bit
  • All QNanoPainter perf demo default tests enabled
So how does a laptop with the latest Intel CPU + GPU perform? What's the difference between the integrated and the discrete GPU here? Let's find out, running all tests in fullscreen HD resolution.

Test 5: Dell XPS 15 with Intel HD 630, Angle:

Test 6: Dell XPS 15 with Intel HD 630, OpenGL:

Test 7: Dell XPS 15 with NVIDIA GTX 1050, Angle:

Test 8: Dell XPS 15 with NVIDIA GTX 1050, OpenGL:

Tests 5-8 conclusions:
  • With HD 630, OpenGL is also overall faster than Angle. With QML Shape and QQuickPaintedItem (FBO), OpenGL reaches ~100% higher fps, while with QNanoPainter OpenGL has ~50% higher fps than Angle.
  • The same holds with GTX 1050: OpenGL is faster than Angle, with ~30% higher fps for QML Shape, ~100% for QQuickPaintedItem (FBO) and ~200% for QNanoPainter. It's clear that QNanoPainter especially enjoys squeezing all the juice out of powerful GPUs.
  • Comparing the integrated vs. discrete GPU here, enabling the GTX 1050 increases QNanoPainter performance by ~200% (12fps vs. 34fps). Interestingly, QML Shape gains almost nothing from the discrete GPU, so its bottleneck is somewhere else. But enabling GL_NV_path_rendering for QML Shape with the GTX 1050 gives ~20% higher fps, which matches the results on the other PC. So when running on an NVIDIA GPU it's usually preferable to keep vendorExtensionsEnabled on.
  • As expected, Intel integrated GPUs have improved a lot in ~6 years. Looking at the QNanoPainter numbers, this laptop with HD 630 performs ~4x faster than the HD 3000 of the previous setup (43fps vs. 11fps). Yes, other parts have changed too, but the GPU is a big factor here.
  • The comparison between the old system + NVIDIA GTX 1060 and the new system + NVIDIA GTX 1050 is also interesting. Although the old system's GPU is beefier, the newer CPU, RAM etc. turn the tables in favor of the new system: on it, rendering is overall ~30% faster (QNanoPainter 34fps vs. 26fps).
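
The relative numbers above can be checked from the quoted fps pairs with two one-line helpers. The function names are hypothetical; the formulas are plain arithmetic: percentage gain is (after / before - 1) * 100, and speed-up is after / before.

```cpp
#include <cassert>

// Relative fps increase in percent: (after / before - 1) * 100.
double percentGain(double beforeFps, double afterFps) {
    assert(beforeFps > 0.0);
    return (afterFps / beforeFps - 1.0) * 100.0;
}

// Speed-up factor between two fps measurements: after / before.
double speedup(double beforeFps, double afterFps) {
    assert(beforeFps > 0.0);
    return afterFps / beforeFps;
}
```

For example, percentGain(12, 34) is about 183%, rounded above to ~200%; speedup(11, 43) is about 3.9x (~4x); and percentGain(26, 34) is about 31% (~30%).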

Now it's probably a good time to call this blog post done. As always, thoughts about these results, your own test results or any other comments are warmly welcome. And happy 2018!
