I need to do some image processing on a Pi 3, and I am hoping I can use the GPU. Reports say it has 6 times the floating-point power of a single CPU core. Basically, I need to compute the correlation (as opposed to convolution) of two QImages, where the smaller image is slid over the larger one to produce a score value at each offset (just like a Sobel filter, except that Sobel uses convolution). The output has size QSize(larger.width - smaller.width, larger.height - smaller.height) and holds ints, floats or doubles.
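
For reference, here is a minimal CPU sketch of the operation I mean. It is not a Qt API: `correlate` is a hypothetical helper name, and it assumes both images are 8-bit grayscale, passed as raw buffers with a row stride in bytes (the kind of data `QImage::constBits()` and `QImage::bytesPerLine()` give for a `QImage::Format_Grayscale8` image). Note that this sketch produces width - w + 1 by height - h + 1 scores so that the last valid offset is included, which is one more in each dimension than the size I gave above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Qt): plain cross-correlation of a small
// 8-bit grayscale image slid over a larger one. Strides are in bytes.
// Returns (lw - sw + 1) * (lh - sh + 1) scores in row-major order.
std::vector<float> correlate(const std::uint8_t* large, int lw, int lh, std::size_t lstride,
                             const std::uint8_t* small, int sw, int sh, std::size_t sstride)
{
    assert(sw > 0 && sh > 0 && sw <= lw && sh <= lh);
    const int ow = lw - sw + 1; // number of horizontal offsets
    const int oh = lh - sh + 1; // number of vertical offsets
    std::vector<float> out(static_cast<std::size_t>(ow) * oh);

    for (int oy = 0; oy < oh; ++oy) {
        for (int ox = 0; ox < ow; ++ox) {
            std::uint64_t acc = 0; // wide accumulator so large templates cannot overflow
            for (int ty = 0; ty < sh; ++ty) {
                const std::uint8_t* lrow = large + (oy + ty) * lstride + ox;
                const std::uint8_t* srow = small + ty * sstride;
                for (int tx = 0; tx < sw; ++tx)
                    acc += static_cast<std::uint32_t>(lrow[tx]) * srow[tx];
            }
            out[static_cast<std::size_t>(oy) * ow + ox] = static_cast<float>(acc);
        }
    }
    return out;
}
```

With QImages this would be called roughly as `correlate(larger.constBits(), larger.width(), larger.height(), larger.bytesPerLine(), smaller.constBits(), smaller.width(), smaller.height(), smaller.bytesPerLine())`, after converting both images to grayscale. The four nested loops are the part I would like to move to the GPU or to SIMD.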

It would be really great if Qt (or QImage specifically) supported this op in hardware.

I saw this blog post from 7 years ago: http://blog.qt.io/blog/2010/04/07/using-opencl-with-qt/ but most of its links are dead. I also found this:
http://code.qt.io/cgit/qt-labs/qtquickcl.git which targets QtQuick, whereas I am looking for something I can use from a QCoreApplication-based executable.

Does anyone have any additional knowledge about the state of this? I also saw someone ported some parts of OpenCL to the Pi's GPU: https://www.raspberrypi.org/forums/viewtopic.php?t=194952

Additionally, is there any support for QImage correlation in terms of ARM SIMD (NEON) instructions?
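
To make the SIMD part concrete, this is the kind of inner loop I would expect such support to boil down to. It is only a sketch under my own assumptions: `dotRow` is a hypothetical name, the data is 8-bit grayscale, and the NEON path is only compiled when the compiler defines `__ARM_NEON` (on 32-bit GCC that typically needs something like `-mfpu=neon`); otherwise it falls back to the scalar loop.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: dot product of two rows of 8-bit pixels, i.e. one row
// of the correlation sum at a single offset. The 32-bit result is fine for a
// single row of realistic width, but would overflow past roughly 66000 pixels.
std::uint32_t dotRow(const std::uint8_t* a, const std::uint8_t* b, std::size_t n)
{
    std::uint32_t sum = 0;
    std::size_t i = 0;
#if defined(__ARM_NEON)
    uint32x4_t acc = vdupq_n_u32(0);
    for (; i + 8 <= n; i += 8) {
        // 8 widening multiplies: u8 * u8 -> u16 (255 * 255 fits in 16 bits)
        uint16x8_t prod = vmull_u8(vld1_u8(a + i), vld1_u8(b + i));
        // add adjacent pairs of products into the four 32-bit accumulators
        acc = vpadalq_u16(acc, prod);
    }
    // horizontal add of the four lanes
    uint32x2_t half = vadd_u32(vget_low_u32(acc), vget_high_u32(acc));
    half = vpadd_u32(half, half);
    sum = vget_lane_u32(half, 0);
#endif
    // scalar tail (or the whole row when NEON is not available)
    for (; i < n; ++i)
        sum += static_cast<std::uint32_t>(a[i]) * b[i];
    return sum;
}
```

I have not measured whether this beats what the compiler auto-vectorizes, so I would still prefer something built into Qt if it exists.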

