Tuesday, March 26, 2019

Qt painting performance with 4 different embedded GPUs (Mali, Adreno, PowerVR)


Summer is approaching and Qt 5.12.2 is out, so we wanted to once again get concrete painting performance numbers, this time on some lower-end embedded SoC GPUs: the kind of chipsets people are using in embedded devices in many places. If a user interface contains dynamically painted elements and the target is 60fps (or maybe 30fps), how much can you actually paint, and which Qt technologies should you reach for?

To keep the testing easily approachable, we grabbed 3 inexpensive Android™ tablets with different GPUs. The other option would have been development boards, but those need a bit more setup time, and time-to-market (or time-to-blog ;-) ) is important to all of us. We also wanted to give Qt 5.12.2 a go while it's hot.

So let's first introduce the tablets & chipsets used for this testing:

1) Lenovo Tab 7 Essential, MediaTek MT8167D, GPU: IMG PowerVR GE8300. This is exactly the same GPU that e.g. the Renesas D3 uses, so if you are into automotive, this GPU may be interesting.

2) Lenovo Tab E8, MediaTek MT8163B, GPU: ARM Mali-T720 MP2. This ARM GPU is quite common in lower-end MediaTek chipsets, but it is also used in the Allwinner H6, which is found e.g. in the Zidoo H6 and various Orange Pi boards.

3) Huawei MediaPad T3 10, Qualcomm Snapdragon 425, GPU: Adreno 308. This Adreno is a lower-end GPU from the Qualcomm Snapdragon family, very commonly found in more affordable tablets and phones (like the Samsung Galaxy J2, Motorola Moto E5, Nokia 2.1, etc.).

For comparison, we'll throw one more device into the set:

4) Nexus 6 (2014), Qualcomm Snapdragon 805, GPU: Adreno 420. This is the wild-card contender here: a higher-end phone from ~4.5 years ago. So how does slightly dated high-end hardware match up against the current low end? Well, the GPUs in these low-end tablets are also mostly dated, but let's see.

Setup

These tablets have different screen resolutions, so to get more comparable results we first configure them all to use the same resolution. A suitable resolution for our imaginary IoT touchscreen device could be 400x640 px, so we change the resolution of each device over adb with:

adb shell wm size 400x640

Now we want to know how these chipsets perform compared to each other. But we also want to know the difference between CPU-side QPainter drawing (Image render target), GPU-side QPainter (FramebufferObject render target) and GPU-side QNanoPainter. So the rendering backends we will test are:
  • QPainter - CPU - antialiased
  • QPainter - CPU - non-antialiased
  • QPainter - GPU - non-antialiased
  • QNanoPainter - GPU - antialiased
  • QNanoPainter - GPU - non-antialiased
As you can see, the list above is missing antialiased GPU QPainter. The reason is that these devices don't support the OpenGL extensions Qt requires for MSAA antialiasing, so that combination is not available.
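To make the test matrix explicit, here is a small plain-C++ sketch that enumerates the painter / render target / antialiasing combinations and drops the unavailable ones. In QQuickPaintedItem terms, the CPU path corresponds to the Image render target and the GPU path to the FramebufferObject render target. The types and functions (Painter, Target, RenderMode, isAvailable, testMatrix, describe) are hypothetical names for illustration only, not part of Qt or QNanoPainter.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical model of the rendering modes compared in this post.
enum class Painter { QPainter, QNanoPainter };
enum class Target { Cpu, Gpu };  // Cpu ~ QQuickPaintedItem::Image, Gpu ~ FBO rendering

struct RenderMode {
    Painter painter;
    Target target;
    bool antialiased;
};

// QNanoPainter always renders on the GPU. Antialiased GPU QPainter is excluded
// because these devices lack the OpenGL extensions Qt needs for MSAA.
bool isAvailable(const RenderMode& m) {
    if (m.painter == Painter::QNanoPainter) return m.target == Target::Gpu;
    return !(m.target == Target::Gpu && m.antialiased);
}

// Walks every combination and keeps only the available ones, in the same
// order as the list above.
std::vector<RenderMode> testMatrix() {
    std::vector<RenderMode> modes;
    for (Painter p : {Painter::QPainter, Painter::QNanoPainter})
        for (Target t : {Target::Cpu, Target::Gpu})
            for (bool aa : {true, false}) {
                RenderMode m{p, t, aa};
                if (isAvailable(m)) modes.push_back(m);
            }
    return modes;
}

// Human-readable label matching the naming used in this post.
std::string describe(const RenderMode& m) {
    std::string s = m.painter == Painter::QPainter ? "QPainter" : "QNanoPainter";
    s += m.target == Target::Cpu ? " - CPU - " : " - GPU - ";
    s += m.antialiased ? "antialiased" : "non-antialiased";
    return s;
}
```

Enumerating all 8 combinations and filtering keeps the reason for each missing mode in one place (isAvailable) instead of hard-coding the 5 remaining entries.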

Testing

There are plenty of different testing possibilities and combinations we could try here, but we want to stay fairly general instead of going into specific, detailed operations. Our first test is "How much stuff can you draw on a fullscreen item?" and our second is "How many smaller and less demanding items can you manage?". So let's start.

TEST 1: QNanoPainter vs. QPainter demo, all default tests enabled (ruler, circles, lines, bars, icons), running fullscreen (remember, all tablets are set to use 640x400 resolution). That is already quite heavy and varied painting by default. Then we increase the number of times all tests are rendered and watch the framerate drop towards the floor... Here's a video of all devices running this test with QNanoPainter and a render count of 1:



The results are as follows:






TEST1 Conclusions:
  • The performance of the MediaTek MT8167D (PowerVR GE8300) and the MediaTek MT8163B (Mali-T720 MP2) is very similar. It seems the first one has a slightly faster GPU, while the second has a slightly faster CPU.
  • Adreno 308 takes a smaller hit from QNanoPainter antialiasing than the other two. While on the others antialiased performance is ~50% of non-antialiased performance, on Adreno 308 it is ~65%.
  • On the MediaTek chipsets, QPainter with the FBO render target achieves ~50% higher fps than QPainter with the Image render target.
  • If your UI requires repainting items covering the whole (640x400px) screen and you target 60fps, with these chipsets you should look towards QNanoPainter.
  • The comparison device's (Nexus 6) Adreno 420 GPU is notably beefier: with QNanoPainter you can render all tests 4 times while keeping a steady 60fps. Interestingly, though, QQuickPaintedItem with the FBO render target doesn't get much out of this GPU. The cause of this could be analyzed further.
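The "how many times can all tests be rendered while holding 60fps" reading of these results boils down to a frame budget check, which can be sketched in plain C++ as below. This is a minimal sketch, not part of the QNanoPainter demo: Sample, frameBudgetMs and maxSustainableCount are hypothetical names, and the fps values you feed in would come from your own measurements.

```cpp
#include <cassert>
#include <vector>

// Per-frame time budget in milliseconds for a given target framerate:
// 60fps leaves ~16.7ms per frame, 30fps ~33.3ms.
double frameBudgetMs(double targetFps) {
    assert(targetFps > 0.0);
    return 1000.0 / targetFps;
}

// One measurement from a TEST 1 style run: how many times all tests were
// rendered per frame, and the framerate observed at that render count.
struct Sample {
    int renderCount;
    double fps;
};

// Returns the largest render count whose measured fps still meets the
// target, or 0 if none does. Samples may be given in any order.
int maxSustainableCount(const std::vector<Sample>& samples, double targetFps) {
    int best = 0;
    for (const Sample& s : samples)
        if (s.fps >= targetFps && s.renderCount > best) best = s.renderCount;
    return best;
}
```

A strict `>=` comparison is used for simplicity; a vsync-locked device often reports slightly under 60fps, so in practice you may want to compare against a slightly lower threshold such as 59.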


TEST 2: QNanoPainter vs. QPainter demo, only the circles test enabled, with a smaller 256x256px item size. Also, instead of increasing the render count, we increase the number of these QQuickItems. The output looks like this with 1, 2, 4, 8 and 16 items:


For this test we also add a 6th rendering mode: QNanoPainter with QNANO_USE_RENDERNODE defined. With this, QSGRenderNode is used, which basically means rendering directly into the Qt Quick scene graph instead of rendering through an FBO using QQuickFramebufferObject. As the number of items increases, the potential savings from not rendering through an FBO also increase, but we want to know by how much and whether it depends on the GPU.
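Why the savings should grow with the item count can be illustrated with a deliberately simplified cost model. This is a hypothetical sketch, not a description of how Qt or QNanoPainter actually schedule work: CostModel and its three per-frame costs are assumptions chosen only to show the shape of the argument, and real GPUs are not this linear.

```cpp
#include <cassert>

// Hypothetical linear cost model, all values in milliseconds per frame:
//   fixedMs - work independent of item count (scene graph sync, swap, ...)
//   paintMs - cost of painting one item's content
//   fboMs   - assumed extra per-item cost of going through an FBO
//             (bind/clear the offscreen target, then composite its texture)
struct CostModel {
    double fixedMs;
    double paintMs;
    double fboMs;
};

// Frame time when every item renders into its own FBO
// (the QQuickFramebufferObject path).
double frameMsViaFbo(const CostModel& c, int items) {
    return c.fixedMs + items * (c.paintMs + c.fboMs);
}

// Frame time when items render directly into the scene graph
// (the QSGRenderNode path), i.e. without the per-item FBO cost.
double frameMsDirect(const CostModel& c, int items) {
    return c.fixedMs + items * c.paintMs;
}

// Relative fps gain of the direct path: fps_direct / fps_fbo - 1,
// which equals frameMsViaFbo / frameMsDirect - 1.
double relativeGain(const CostModel& c, int items) {
    assert(items >= 0);
    assert(frameMsDirect(c, items) > 0.0);
    return frameMsViaFbo(c, items) / frameMsDirect(c, items) - 1.0;
}
```

With a non-zero fixed per-frame cost, the relative gain in this model rises with the item count and approaches fboMs / paintMs from below. How large fboMs actually is on a given GPU is exactly what has to be measured.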

The results are as follows:






TEST2 Conclusions:
  • Reducing the size of items and the amount of painting makes these chipsets a more viable option for dynamically painted UI elements.
  • If your UI contains 2 items like this, the Snapdragon 425 can manage 60fps using QPainter - CPU, but the CPUs on the MediaTek chipsets can't quite reach that (44fps & 50fps). Using QPainter - GPU, all of them reach 60fps with 2 items.
  • With these simpler items, QNanoPainter antialiasing doesn't add major overhead on any of the chipsets, whereas QPainter (CPU) antialiasing obviously does have notable overhead.
  • Using QNANO_USE_RENDERNODE (QSGRenderNode) with QNanoPainter gives a notable performance increase in this case, ~20-30% depending on the chipset. Our assumption about FBO overhead with more items was correct.
  • The comparison device (Nexus 6) can render at least 16 items with QNanoPainter at a fluid 60fps, both antialiased and non-antialiased.

As a final conclusion, I would say that if you are working on an embedded system with these or similar chipsets and your user interface contains elements which require dynamic painting, consider using QNanoPainter for them.
