[Development] Using DMA instead of SHM in non OpenGL apps (Linux/Wayland)

Eduardo Hopperdietzel ehopperdietzel at gmail.com
Thu Aug 24 21:37:05 CEST 2023


Hi David,

I've made a little Wayland app that uses both SHM and DMA, and I tested it
on Weston, Sway, and my own compositor. I also tried it on three different
machines: two with Intel i7 CPUs and one with a smaller ARM CPU. These
machines had Intel Iris Pro, Nvidia GT525M, and Mali-400 GPUs, respectively.

Here's the code and results for one of the machines:

https://github.com/ehopperdietzel/QPainter-SHM-DMA-Benchmark

The results show no significant difference in the time QPainter takes to
read from and write to SHM mappings versus DMA mappings. It
seems like DMA I/O operations are handled asynchronously by the kernel. The
most noticeable improvement is on the compositor side. When using DMA, the
experience feels much smoother, especially when moving other windows while
the benchmark is running on single-threaded compositors like Weston.
There's also a slight increase in the number of frame callbacks returned by
the compositors when using DMA, though it doesn't significantly boost the
overall FPS.

However, there are challenges with implementing DMA:

1. There does not seem to be a standard method for creating DMA buffers in
userspace. I tried creating a GBM bo, obtaining a PRIME fd, and mapping it,
but this isn't supported by all GPUs/drivers. For instance, it didn't work
with the Mali GPU using the Lima driver. I also experimented with DMA-BUF
heaps, but driver support does not seem to be consistent across all
distributions, and accessing /dev/dma_heap/* often requires superuser
privileges.

2. When using DMA, triple buffering is necessary; otherwise, compositors
only display partial buffer updates. This could potentially be avoided by
using DMA fencing mechanisms (like EGL does under the hood) and protocols
like this one:

https://wayland.app/protocols/linux-explicit-synchronization-unstable-v1

But it seems that not many compositors have implemented it.
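
To make the triple-buffering point concrete, here is a minimal sketch of the
client-side bookkeeping, assuming implicit synchronization only. The
BufferPool class and its method names are hypothetical and are not taken from
the benchmark; the only protocol fact it relies on is that the compositor
sends wl_buffer.release once it no longer reads a buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical pool of three client buffers (names are illustrative).
// A slot is "busy" from the moment the client picks it for painting until
// the compositor sends wl_buffer.release for it.
class BufferPool {
public:
    static constexpr std::size_t kCount = 3; // triple buffering

    // Returns the index of the first free buffer, or -1 if all three are
    // still in flight (the caller should then skip this frame).
    int acquire() {
        for (std::size_t i = 0; i < kCount; ++i) {
            if (!busy_[i]) {
                busy_[i] = true;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Call this from the wl_buffer.release listener of buffer `index`.
    void release(int index) {
        assert(index >= 0 && static_cast<std::size_t>(index) < kCount);
        busy_[static_cast<std::size_t>(index)] = false;
    }

private:
    std::array<bool, kCount> busy_{}; // all slots start out free
};
```

The idea is to paint only into a slot returned by acquire(), attach and commit
it, and never touch it again until release() has been called for it. With
explicit fences the same pool could probably shrink, but I have not tested
that.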

To sum it up, while DMA does offer a performance boost, it's not without
its issues:

- DMA's effectiveness varies depending on hardware.
- Implementing DMA can be complex.
- The performance gains might not justify the effort.

So, as you mentioned earlier, it's probably best to stick with SHM and let
the compositor handle uploads using DMA, preferably asynchronously.
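
For anyone who still wants to experiment, the allocation problem from point 1
can be sketched roughly as below: try the system DMA-BUF heap and fall back to
a plain SHM file when the heap is missing or not accessible. This is only a
sketch under assumptions: PixelBuffer and allocatePixelBuffer are hypothetical
names, the heap is assumed to be exposed as /dev/dma_heap/system, and nothing
here guarantees that the compositor or GPU can actually import the resulting
buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <linux/dma-heap.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical result type: an fd plus a flag saying which path produced it.
struct PixelBuffer {
    int fd = -1;           // -1 means both paths failed
    bool isDmaBuf = false; // true: DMA-BUF heap, false: SHM fallback
};

// Hypothetical helper: prefer the system DMA-BUF heap, else use memfd (SHM).
PixelBuffer allocatePixelBuffer(std::size_t size) {
    assert(size > 0);
    PixelBuffer buf;

    // Assumption: the system heap lives here; it is often root-only.
    int heap = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
    if (heap >= 0) {
        dma_heap_allocation_data req{};
        req.len = size;
        req.fd_flags = O_RDWR | O_CLOEXEC;
        if (ioctl(heap, DMA_HEAP_IOCTL_ALLOC, &req) == 0) {
            buf.fd = static_cast<int>(req.fd);
            buf.isDmaBuf = true;
        }
        close(heap);
        if (buf.isDmaBuf)
            return buf;
    }

    // Fallback: anonymous shared-memory file, usable with wl_shm.
    int shm = memfd_create("pixel-buffer", MFD_CLOEXEC);
    if (shm >= 0 && ftruncate(shm, static_cast<off_t>(size)) == 0) {
        buf.fd = shm;
        return buf;
    }
    if (shm >= 0)
        close(shm);
    return buf;
}
```

Either fd can be mapped with mmap() and handed to QPainter through a QImage.
A DMA-BUF fd would go to the compositor via the linux-dmabuf protocol and the
SHM fd via wl_shm, so the caller still has to branch on isDmaBuf.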

Cheers,

Eduardo Hopperdietzel