[Interest] [External]Re: How to get QtConcurrent to do what I want?

Mon Jan 31 18:08:31 CET 2022

It does not make sense to me to have allocation/population of 60k instances of Tile + QPoint into a QVector taking 5 seconds. It took 5 ms on my computer…

Having said that , if allocation and assignment are your bottleneck , just use placement new operator with a large block of memory or just pure array of 60k structures + QConcurent::map with interators instead of a vector.
You will still need to initialize the array so if that’s the true bottleneck then I guess handle your own threading where you reinitialize pool of threads in a loop passing computed range without the initial storing step.

Sent from my iPhone

> On Jan 31, 2022, at 2:38 AM, Andrei Golubev <andrei.golubev at qt.io> wrote:
> 
> 
> Hello,
> 
> What Scott suggests should help a lot.
> 
> Using a slightly different approach, if you know how many tiles you need, say 60k, isn't it enough to figure out the tile position already?
> Given an index 0 <= i < 60k, you should be able to map it to a 2-dimensional coordinates if you know how many tiles you have horizontally and vertically (actually, just horizontally is enough) - and you do, because the amount of horizontal tiles = image.width() / tile.width().
> So, if I think about it right,  i / (n horizontal tiles) gives you row number, i % (n horizontal tiles) gives you column number.
> 
> Now you can generate a 60k size vector of indices from 0 to 59999 (basically std::iota()) and use that as an input for the tile calculation function through QtConcurrent. Internally, you figure out the tile position in 2d space and from that you get your QPoint + QSize for the square patch (guessing, that i index would basically be a hand-written alternative to blockIdx in CUDA and friends). This now allows you to calculate your tiles vector in parallel as well (without actually using a vector).
> 
> On a side note, doesn't QtConcurrent provide a way to execute a function on a range? Maybe you don't even need to supply a vector of indices, just a half-open interval from 0 to 60k.
> Given that QtConcurrent should handle any sort of callable, you can also supply it a lambda with capture list containing your hyper parameters like tile.width(), tile.height(), and any other external context information.
> 
> --
> Best Regards,
> Andrei
> From: Interest <interest-bounces at qt-project.org> on behalf of Scott Bloom <scott at towel42.com>
> Sent: Monday, January 31, 2022 4:09 AM
> To: Murphy, Sean <Sean.Murphy at centauricorp.com>; Tony Rietwyk <tony at rightsoft.com.au>; interest at qt-project.org <interest at qt-project.org>
> Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?
>  
> Couple things I would try.
>  
> First, preallocate the size of the vector, or use a list if you don’t need random access into it. 
>  
> Second, just send, pos, size into the tile.  Only save the values in the constrctor. When the worker thread kicks off on a tile, then initialize it and do any computation.  This includes the allocation of the submatrix.
>  
> Also, does it need to be a shared pointer?  Its clear who owns the pointers, they aren’t being “shared” as much as use.  For me, I only use shared pointers, when multiple objects can “own” the pointer.  In this case, the tile manager owns the tiles.
>  
> Mainly, do as much as possible to reduce the time in the nested loop.
> 
> Scott
>  
> From: Interest <interest-bounces at qt-project.org> On Behalf Of Murphy, Sean
> Sent: Sunday, January 30, 2022 4:59 PM
> To: Tony Rietwyk <tony at rightsoft.com.au>; interest at qt-project.org
> Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?
>  
> Thanks for the response, but I'm not following your suggestion - or at least I'm not seeing how it's different than what I'm doing? Maybe a little pseudocode will help. Here's what I'm currently doing:
>  
> Tile class:
> private:
>   QPoint mPos;
>   int mSize;
>  
> tile::tile(QPoint pos, int size) :
>     mPos(pos),
>     mSize(size)
> {
>   // assigns this tile an mSize x mSize square
>   // from the original image starting at mPos
>   // pixel location in the original image
> }
>  
> void tile::process()
> {
>   // does the work on the assigned subset
> }
>   
> TileManager:
> private:
>   QVector<QSharedPointer<tile>> mTiles;
>  
> processTile(QSharedPointer<tile>& t)
> {
>   t->process();
> }
>  
> tileManager::setup(QSize tileGrid, int tileSize)
> {
>   // generate each tile with its assignment
>   for(int i=0; i < tileGrid.height(); ++i)
>   {
>     for(int j=0; j < tileGrid.width(); ++j)
>     {
>       // create the new tile while assign its 
>       // region of the original image
>       QSharedPointer<tile> t(new tile(
>                    QPoint(j * tileSize, i * tileSize),
>                    tileSize));
>       mTiles.append(t);
>     }
>   }
>   QtConcurrent::map(mTiles, processTile);
> }
>  
> So I think I'm already doing what you're saying? Where I'm paying the penalty is that the allocation of each tile is happening in one thread and I'd like to see if I can thread out the object creation. But I don't see how to simultaneously thread out the tile objection creation AND correctly assign the tile its location since as far as I can tell, when QtConcurrent executes tileManager's processTile function in parallel there's nothing I can poll inside tileManager::processTile() that allows me to know WHICH step I'm at.
>  
> Or am I misunderstanding what you're saying? 
>  
> The best thing I can come up with is that maybe I could change the type of my mTiles vector to be a QVector<QPoint>> but then I'd still need to loop through nested for-loop to populate all the QPoint items in the vector I want to pass to QtConcurrent::map(). I have tried that yet to see if generating thousands of QPoint objects is faster than generating the same number of tiles, but I can test that out.
>  
> Sean
> From: Interest <interest-bounces at qt-project.org> on behalf of Tony Rietwyk <tony at rightsoft.com.au>
> Sent: Sunday, January 30, 2022 7:19 PM
> To: interest at qt-project.org <interest at qt-project.org>
> Subject: [External]Re: [Interest] How to get QtConcurrent to do what I want?
>  
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>  
> Hi Sean,
>  
> Can you use the position of the tile as a unique key?  Then the manager only needs to calculate each tile's position in the original image.  Each tile extracts the bits, processes and notifies the result with its position.
>  
> Regards, Tony
>  
>  
> On 31/01/2022 10:06 am, Murphy, Sean wrote:
> I'm hitting a design issue with the way I'm using the QtConcurrent module to do some image processing, and I'm wondering if someone can give some pointers?
>  
> At a high level, the software needs to do some processing on every pixel of an image. The processing can mostly be done in parallel, so I've created the following:
> Tile class - responsible for doing the processing on a small subset of the original image
> Has a constructor that takes a Position and Size. From those parameters, the Tile knows what subset of the original image it is going to process
> Has a process() function which will do the work on those assigned pixels
> TileManager class - responsible for managing the Tile objects
> Contains a for-loop that creates each Tile object, assigns it a unique Position, and adds it to the QVector<Tile> vector 
> Has a processTile(Tile& t) function which calls t.process() to tell a given Tile to begin its work
> Calls QtConcurrent::map(tiles, processTile) to process each tile
> So far this works well, but as I was timing different parts of the codebase, I discovered that a large portion of the time is spent allocating the QVector<Tile> vector (step 2a above) before I get to the concurrent processing call. The reason why is obvious to me - I need to ensure that each tile is created with a unique assignment and as far as I can see, that need to happen in a single thread? If I could instead pass off the Tile creation to the parallel processing step, I might be able to improve the overall performance, but I don't see a way around it within the QtConcurrent framework.
>  
> How can I go about creating Tile objects in parallel AND ensure that each of them gets a unique Position assignment? I could easily move the Tile allocation into processTile(), but if I do that, I don't see a way make the unique position assignment since I don't see how a given call to processTile() would know where it is in the overall parallelization sequence to determine what Position to assign to the Tile it creates. If I were using something like CUDA, I could use things like blockIdx and threadIdx to do that, but as far as I can see, those concepts don't exist (or at least aren't exposed) in QtConcurrent.
>  
> Any thoughts?
> 
> 
> _______________________________________________
> Interest mailing list
> Interest at qt-project.org
> https://lists.qt-project.org/listinfo/interest
> _______________________________________________
> Interest mailing list
> Interest at qt-project.org
> https://lists.qt-project.org/listinfo/interest
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.qt-project.org/pipermail/interest/attachments/20220131/00e036dd/attachment.htm>