[Interest] Fastest Way to convert a QVideoFrame to an QImage?

Wed Feb 12 18:55:09 CET 2014

Oliver: "And yes, writing into a separate 8bit greyscale image instead (with colour lookup - but not sure whether a "colour" lookup is actually needed, since the index [0, 255] already represents the "colour" - am not too familiar with indexed formats) is probably more efficient, because you need to process up to 4 times less data (which might be partly eaten up by the lookup, but check again whether that is actually needed for greyscale indexed images)."

Well, Qt does NOT support grayscale. None of the QImage::Format values is gray: it's either mono, indexed color, or color bits. That is why I'm a bit unsure which is faster. I would assume that setting a pixel on an Indexed8 image takes an RGB value or an index and performs a color-index lookup, even when the index is itself the gray value.

Oliver: "But of course it also depends on what you need to do later on with that image: if you would need to convert it back to RGBA (in order to "process" it with other RGBA data) it is probably not worth converting it back and forth."

Really, all I need is 2D 8bpp gray to pass off to the library. So if I use QImage::Format_Indexed8, I could save some space and time copying bytes, but I could waste time setting them. I was hoping someone here would know.

________________________________
From: Till Oliver Knoll <till.oliver.knoll at gmail.com>
To: Qt Project <interest at qt-project.org> 
Sent: Wednesday, February 12, 2014 3:37 AM
Subject: Re: [Interest] Fastest Way to convert a QVideoFrame to an QImage?

Am 11.02.2014 um 23:32 schrieb Jason H <scorp1us at yahoo.com>:

> ...
> 
> 
> const uchar *bits = frame.bits();
> for (int y = 0; y < frame.height(); y++)
> {
>     QRgb *scanLine = (QRgb *)converted.scanLine(y);
>     for (int x = 0; x < frame.width(); x++)
>     {
>         int c = bits[y * frame.width() + x];
>         scanLine[x] = qRgba(c, c, c, 255);
>     }

That part looks wrong to me at first sight: on one hand you access the pixels via "bits", on the other you access them with "scanLine".

That per se is not wrong, but the problem with "bits" is: you really get the "raw" data block - and the lines might be padded up to the next multiple of N bytes.

If we are talking about RGBA (or any permutation thereof) data, where each channel takes 1 byte, and we assume N=4, then every line is already a multiple of N bytes, so it is probably okay to iterate over width and height and access the pixels via (x, y) the way you did.

In all other cases I would use "scanLine" for simplicity, which already takes care of that (possible) "byte padding" at the end of each scan line (the address you get back always points to the proper beginning of each scan line).

That results in at least "height" times more function calls (to scanLine), but is usually worth it to avoid the hassle.

At the very least, mixing "bits" (for reading pixels) with "scanLine" (for writing them back) seems to completely defeat the purpose of gaining performance.

Refer again to the QImage docs and look out for "padding".
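
To illustrate the padding point, here is a minimal sketch in plain C++ (deliberately without Qt, so it compiles on its own). GrayFrame, grayToArgb and toArgb are hypothetical names made up for this example; the bytesPerLine field plays the role of the value that QImage::bytesPerLine() or QVideoFrame::bytesPerLine() reports. The source row is located with the stride rather than with width, which is the step the loop above skips.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an 8-bit frame whose rows may be padded.
struct GrayFrame {
    int width;                       // visible pixels per row
    int height;                      // number of rows
    int bytesPerLine;                // stride in bytes, >= width
    std::vector<std::uint8_t> data;  // height * bytesPerLine bytes
};

// Pack a grey value as 0xAARRGGBB, the layout qRgba(c, c, c, 255) produces.
inline std::uint32_t grayToArgb(std::uint8_t c) {
    return 0xFF000000u | (std::uint32_t(c) << 16) | (std::uint32_t(c) << 8) | c;
}

// Convert to tightly packed 32-bit pixels (width * height entries).
std::vector<std::uint32_t> toArgb(const GrayFrame &frame) {
    assert(frame.bytesPerLine >= frame.width);
    std::vector<std::uint32_t> out(std::size_t(frame.width) * frame.height);
    for (int y = 0; y < frame.height; ++y) {
        // Row start uses the stride, not the width, so padding bytes are skipped.
        const std::uint8_t *src =
            frame.data.data() + std::size_t(y) * frame.bytesPerLine;
        std::uint32_t *dst = out.data() + std::size_t(y) * frame.width;
        for (int x = 0; x < frame.width; ++x)
            dst[x] = grayToArgb(src[x]);
    }
    return out;
}
```

With Qt types the same idea would presumably read the source row at bits + y * frame.bytesPerLine() and write through converted.scanLine(y), once per row.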

And yes, writing into a separate 8bit greyscale image instead (with colour lookup - but not sure whether a "colour" lookup is actually needed, since the index [0, 255] already represents the "colour" - am not too familiar with indexed formats) is probably more efficient, because you need to process up to 4 times less data (which might be partly eaten up by the lookup, but check again whether that is actually needed for greyscale indexed images).
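
On the lookup question, a rough sketch in plain C++ (again without Qt, so it stands alone) of what an 8-bit indexed grey image amounts to. makeGrayPalette and copyGrayRows are hypothetical helpers invented for this example. The assumption is that the colour table is an identity grey ramp, as one could install on a QImage::Format_Indexed8 image with QImage::setColorTable(); the stored byte is then already the grey value, and the table is only consulted when something needs actual colours, not when the bytes are written row by row.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Identity grey palette: entry i is opaque grey (i, i, i), packed as 0xAARRGGBB.
std::array<std::uint32_t, 256> makeGrayPalette() {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i)
        table[i] = 0xFF000000u | (i << 16) | (i << 8) | i;
    return table;
}

// Copy 8-bit rows between two buffers that may have different strides.
// No per-pixel work and no lookup: the index is the grey value.
void copyGrayRows(const std::uint8_t *src, int srcStride,
                  std::uint8_t *dst, int dstStride,
                  int width, int height) {
    assert(srcStride >= width && dstStride >= width);
    for (int y = 0; y < height; ++y)
        std::memcpy(dst + std::size_t(y) * dstStride,
                    src + std::size_t(y) * srcStride,
                    std::size_t(width));
}
```

Under that assumption the conversion is one memcpy per scan line, moving a quarter of the bytes a 32-bit target would need.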

But of course it also depends on what you need to do later on with that image: if you would need to convert it back to RGBA (in order to "process" it with other RGBA data) it is probably not worth converting it back and forth.

For even more hardcore performance tuning follow Thiago's advice ;) But do so only once you have a working solution, even if it's the most naive one! ;)

Otherwise, if you are really serious about performance, I would suggest having a look at OpenCL (similar in spirit, but not to be confused with OpenGL ;)).

It also provides "image buffers" and lets you take advantage of all the raw GPU power (or all CPU cores) while still writing your "kernel" (compute) code in a C99-style language.

The bottleneck would probably be the transfer between VRAM and main RAM, but since that transfer happens asynchronously, one could probably use multiple buffers: while one buffer is uploading and another is downloading, a kernel could be processing a third image buffer. So the upload/download costs could be amortised, especially if you deal with a stream of images (aka "video") - that's my theory anyway. ;)

OpenCL can also interact with OpenGL buffers, so in case your video is given in an OpenGL buffer you could even save the transfer and calculate everything on the GPU - while still writing your computation in a C99 style "kernel".

Cheers,
  Oliver