[Interest] [External]Re: How to get QtConcurrent to do what I want?

Andrei Golubev andrei.golubev at qt.io
Mon Jan 31 09:38:09 CET 2022


Hello,

What Scott suggests should help a lot.

Using a slightly different approach, if you know how many tiles you need, say 60k, isn't it enough to figure out the tile position already?
Given an index 0 <= i < 60k, you should be able to map it to 2-dimensional coordinates if you know how many tiles you have horizontally and vertically (actually, the horizontal count alone is enough). And you do know it, because the number of horizontal tiles = image.width() / tile.width().
So, if I think about it right, i / (n horizontal tiles) gives you the row number and i % (n horizontal tiles) gives you the column number.
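
Here is a small C++ version of that mapping. TileOrigin and tileOriginFromIndex are hypothetical names, plain ints stand in for QPoint so it builds without Qt, and tiles are assumed square and numbered row by row, the same order the nested loop in the setup() code further down produces.

```cpp
#include <cassert>

// Hypothetical stand-in for QPoint so the sketch builds without Qt.
struct TileOrigin {
    int x;  // pixel column of the tile's top-left corner
    int y;  // pixel row of the tile's top-left corner
};

// Maps a linear tile index i (0 <= i < tilesAcross * tilesDown) to the
// top-left pixel of that tile, assuming square tiles of tileSize pixels
// numbered row by row. tilesAcross would be image.width() / tileSize.
TileOrigin tileOriginFromIndex(int i, int tilesAcross, int tileSize)
{
    assert(tilesAcross > 0 && i >= 0);
    const int row = i / tilesAcross;  // i / (n horizontal tiles)
    const int col = i % tilesAcross;  // i % (n horizontal tiles)
    return TileOrigin{col * tileSize, row * tileSize};
}
```

With 10 tiles across and 32-pixel tiles, index 23 falls in row 2, column 3, i.e. pixel (96, 64): the same QPoint(j * tileSize, i * tileSize) that the nested loop yields for i = 2, j = 3.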

Now you can generate a 60k-element vector of indices from 0 to 59999 (basically std::iota()) and use that as the input to the tile calculation function through QtConcurrent. Inside that function you work out the tile's position in 2D space, and from that you get your QPoint + QSize for the square patch (I'd guess the index i is basically a hand-written alternative to blockIdx in CUDA and friends). This way the tile setup runs in parallel as well, and you never need to build the tiles vector up front.
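
A rough plain-C++ sketch of that pipeline, assuming the same row-major numbering. TileRect and computeTilesInParallel are made-up names, and a few std::thread workers stand in for QtConcurrent's thread pool so the example builds without Qt. With Qt, the index vector would instead be handed to QtConcurrent::map() or blockingMap() together with a function that receives each index.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for QPoint + QSize (square tiles).
struct TileRect {
    int x;     // left edge in pixels
    int y;     // top edge in pixels
    int size;  // edge length in pixels
};

// Builds the indices 0..count-1 with std::iota, then lets threadCount workers
// each handle a contiguous slice. Each worker derives the tile rectangle from
// the index itself, so nothing is created up front in a single thread.
std::vector<TileRect> computeTilesInParallel(int count, int tilesAcross,
                                             int tileSize, int threadCount)
{
    assert(count >= 0 && tilesAcross > 0 && threadCount > 0);

    std::vector<int> indices(count);
    std::iota(indices.begin(), indices.end(), 0);  // 0, 1, ..., count - 1

    // One slot per index; every slot is written by exactly one thread,
    // so no locking is needed.
    std::vector<TileRect> tiles(count);
    const int chunk = (count + threadCount - 1) / threadCount;  // ceil division

    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        const int begin = t * chunk;
        const int end = std::min(count, begin + chunk);
        if (begin >= end)
            break;  // fewer indices than threads
        workers.emplace_back([&, begin, end] {
            for (int k = begin; k < end; ++k) {
                const int i = indices[k];
                const int row = i / tilesAcross;
                const int col = i % tilesAcross;
                tiles[i] = TileRect{col * tileSize, row * tileSize, tileSize};
                // ...the real per-tile processing would follow here...
            }
        });
    }
    for (std::thread& w : workers)
        w.join();
    return tiles;
}
```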

On a side note, doesn't QtConcurrent provide a way to execute a function over a range? Maybe you don't even need to supply a vector of indices, just the half-open interval [0, 60k).
Since QtConcurrent should accept any sort of callable, you can also pass it a lambda whose capture list holds your parameters, like tile.width(), tile.height(), and any other external context.
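
QtConcurrent's map functions take either a container or a begin/end iterator pair, so feeding them a bare integer interval would need some kind of counting iterator. The idea itself can be sketched in plain C++: parallelForRange (a hypothetical helper, again with std::thread in place of Qt's pool) walks [first, last) without building an index vector, and the caller's lambda carries the tile parameters in its capture list.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: calls fn(i) for every i in the half-open interval
// [first, last), splitting it into contiguous slices, one thread per slice.
// No index container is built; each thread just counts through its slice.
template <typename Fn>
void parallelForRange(int first, int last, int threadCount, Fn fn)
{
    const int count = last - first;
    if (count <= 0 || threadCount <= 0)
        return;
    const int chunk = (count + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (int begin = first; begin < last; begin += chunk) {
        const int end = std::min(last, begin + chunk);
        // fn is shared by reference; it outlives the threads because of the joins below.
        workers.emplace_back([&fn, begin, end] {
            for (int i = begin; i < end; ++i)
                fn(i);
        });
    }
    for (std::thread& w : workers)
        w.join();
}

// Example caller: the capture list carries the tile parameters and the
// output buffer, so the callable needs nothing but the index.
std::vector<std::pair<int, int>> computeOriginsViaRange(int count, int tilesAcross,
                                                        int tileSize, int threadCount)
{
    std::vector<std::pair<int, int>> origins(count, std::make_pair(-1, -1));
    parallelForRange(0, count, threadCount,
                     [&origins, tilesAcross, tileSize](int i) {
                         origins[i] = std::make_pair((i % tilesAcross) * tileSize,
                                                     (i / tilesAcross) * tileSize);
                     });
    return origins;
}
```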

--
Best Regards,
Andrei
________________________________
From: Interest <interest-bounces at qt-project.org> on behalf of Scott Bloom <scott at towel42.com>
Sent: Monday, January 31, 2022 4:09 AM
To: Murphy, Sean <Sean.Murphy at centauricorp.com>; Tony Rietwyk <tony at rightsoft.com.au>; interest at qt-project.org <interest at qt-project.org>
Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

Couple things I would try.

First, preallocate the size of the vector, or use a list if you don’t need random access into it.
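
Preallocating looks like this with std::vector (QVector has a reserve() member as well). The sketch only collects positions, with TilePos as a hypothetical stand-in for QPoint; reserving once up front means push_back never has to reallocate and move the elements gathered so far.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for QPoint.
struct TilePos {
    int x;
    int y;
};

// Builds the row-major list of tile origins, reserving capacity first so the
// vector never reallocates while it grows.
std::vector<TilePos> buildTilePositions(int tilesAcross, int tilesDown, int tileSize)
{
    std::vector<TilePos> positions;
    positions.reserve(static_cast<std::size_t>(tilesAcross) * tilesDown);
    for (int row = 0; row < tilesDown; ++row)
        for (int col = 0; col < tilesAcross; ++col)
            positions.push_back(TilePos{col * tileSize, row * tileSize});
    return positions;
}
```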

Second, just pass pos and size into the tile. Only store the values in the constructor. When the worker thread kicks off on a tile, initialize it then and do any computation. This includes the allocation of the submatrix.
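
One way to read that suggestion: the constructor only records position and size, and the expensive part, allocating and filling the sub-matrix, moves into process(), which is what runs on the worker threads. LazyTile and its 8-bit pixel buffer are assumptions for illustration, and the arithmetic in process() is placeholder work, not real image processing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tile with a cheap constructor and a heavy process().
class LazyTile {
public:
    // Only records the assignment; nothing is allocated yet.
    LazyTile(int x, int y, int size) : mX(x), mY(y), mSize(size) {}

    // Runs on a worker thread: the sub-matrix is allocated and filled here.
    void process()
    {
        mPixels.assign(static_cast<std::size_t>(mSize) * mSize, 0);
        for (int r = 0; r < mSize; ++r)
            for (int c = 0; c < mSize; ++c)
                // Placeholder computation standing in for the real per-pixel work.
                mPixels[static_cast<std::size_t>(r) * mSize + c] =
                    static_cast<std::uint8_t>((mX + c + mY + r) & 0xFF);
    }

    bool isAllocated() const { return !mPixels.empty(); }
    std::size_t pixelCount() const { return mPixels.size(); }
    std::uint8_t pixelAt(int r, int c) const
    {
        return mPixels[static_cast<std::size_t>(r) * mSize + c];
    }

private:
    int mX;
    int mY;
    int mSize;
    std::vector<std::uint8_t> mPixels;  // empty until process() runs
};
```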

Also, does it need to be a shared pointer? It's clear who owns the pointers; they aren't being “shared” so much as used. I only use shared pointers when multiple objects can “own” the pointer. In this case, the tile manager owns the tiles.
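
If the manager is the only owner, the tiles can simply live in the vector by value, which saves one heap allocation and the reference-count bookkeeping per tile. Below is a minimal sketch with hypothetical names. processAll() runs sequentially here, whereas with Qt the same vector could be passed to QtConcurrent::map() with a function taking a tile&, as in Sean's original design. If tiles must stay on the heap (for example because they are not movable), std::unique_ptr expresses the single owner just as well.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal tile; process() only flags completion here.
struct Tile {
    Tile(int x_, int y_, int size_) : x(x_), y(y_), size(size_) {}
    void process() { done = true; }  // placeholder for the real work

    int x;
    int y;
    int size;
    bool done = false;
};

// The manager is the sole owner, so tiles are stored by value:
// no per-tile heap allocation and no reference counting.
class TileManager {
public:
    void setup(int tilesAcross, int tilesDown, int tileSize)
    {
        mTiles.clear();
        mTiles.reserve(static_cast<std::size_t>(tilesAcross) * tilesDown);
        for (int row = 0; row < tilesDown; ++row)
            for (int col = 0; col < tilesAcross; ++col)
                mTiles.emplace_back(col * tileSize, row * tileSize, tileSize);
    }

    // Sequential here; each element could be handed to a worker instead.
    void processAll()
    {
        for (Tile& t : mTiles)
            t.process();
    }

    const std::vector<Tile>& tiles() const { return mTiles; }

private:
    std::vector<Tile> mTiles;
};
```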

Mainly, do as much as possible to reduce the time in the nested loop.

Scott

From: Interest <interest-bounces at qt-project.org> On Behalf Of Murphy, Sean
Sent: Sunday, January 30, 2022 4:59 PM
To: Tony Rietwyk <tony at rightsoft.com.au>; interest at qt-project.org
Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

Thanks for the response, but I'm not following your suggestion - or at least I'm not seeing how it's different than what I'm doing? Maybe a little pseudocode will help. Here's what I'm currently doing:

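#include <QPoint>
#include <QSharedPointer>
#include <QSize>
#include <QVector>
#include <QtConcurrent>

// Tile class:
class tile
{
public:
  tile(QPoint pos, int size);
  void process();

private:
  QPoint mPos;
  int mSize;
};

tile::tile(QPoint pos, int size) :
    mPos(pos),
    mSize(size)
{
  // assigns this tile an mSize x mSize square
  // from the original image starting at mPos
  // pixel location in the original image
}

void tile::process()
{
  // does the work on the assigned subset
}

// TileManager:
class tileManager
{
public:
  void setup(QSize tileGrid, int tileSize);

private:
  // static, so that QtConcurrent::map() can call it like a free function
  static void processTile(QSharedPointer<tile>& t);

  QVector<QSharedPointer<tile>> mTiles;
};

void tileManager::processTile(QSharedPointer<tile>& t)
{
  t->process();
}

void tileManager::setup(QSize tileGrid, int tileSize)
{
  // generate each tile with its assignment
  for (int i = 0; i < tileGrid.height(); ++i)
  {
    for (int j = 0; j < tileGrid.width(); ++j)
    {
      // create the new tile and assign its
      // region of the original image
      QSharedPointer<tile> t(new tile(
                   QPoint(j * tileSize, i * tileSize),
                   tileSize));
      mTiles.append(t);
    }
  }

  // map() returns immediately with a QFuture<void>; call
  // waitForFinished() on it if setup() has to block until done
  QtConcurrent::map(mTiles, processTile);
}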

So I think I'm already doing what you're saying? Where I'm paying the penalty is that each tile is allocated in one thread, and I'd like to see if I can thread out the object creation. But I don't see how to simultaneously thread out the tile object creation AND correctly assign each tile its location, since as far as I can tell, when QtConcurrent executes tileManager's processTile function in parallel there's nothing I can poll inside tileManager::processTile() that tells me WHICH step I'm at.

Or am I misunderstanding what you're saying?

The best thing I can come up with is that maybe I could change the type of my mTiles vector to QVector<QPoint>, but then I'd still need to run the nested for-loop to populate all the QPoint items in the vector I pass to QtConcurrent::map(). I haven't tried that yet to see whether generating thousands of QPoint objects is faster than generating the same number of tiles, but I can test it.

Sean

________________________________

From: Interest <interest-bounces at qt-project.org> on behalf of Tony Rietwyk <tony at rightsoft.com.au>
Sent: Sunday, January 30, 2022 7:19 PM
To: interest at qt-project.org <interest at qt-project.org>
Subject: [External]Re: [Interest] How to get QtConcurrent to do what I want?

Hi Sean,

Can you use the position of the tile as a unique key?  Then the manager only needs to calculate each tile's position in the original image.  Each tile extracts the bits, processes and notifies the result with its position.
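
Tony's idea can be sketched with the position doubling as the key: each worker gets nothing but a position, does its work, and reports a result tagged with that position, so completion order does not matter. Everything here (TileResult, ResultCollector, runTiles) is hypothetical, the summation is a stand-in for real pixel work, and one std::thread per tile is used only to keep the sketch short; with thousands of tiles a pool such as the one QtConcurrent uses would be the realistic choice.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical result type: a tile reports its output together with its
// position, so the position itself acts as the unique key.
struct TileResult {
    int x;
    int y;
    long value;  // placeholder for whatever the tile computes
};

// Collects results from many threads; the mutex serializes the map writes.
class ResultCollector {
public:
    void notify(const TileResult& r)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mResults[{r.x, r.y}] = r.value;
    }

    std::map<std::pair<int, int>, long> results() const
    {
        std::lock_guard<std::mutex> lock(mMutex);
        return mResults;
    }

private:
    mutable std::mutex mMutex;
    std::map<std::pair<int, int>, long> mResults;
};

// Each worker receives only a position, does its (placeholder) work and reports back.
void runTiles(const std::vector<std::pair<int, int>>& positions, int tileSize,
              ResultCollector& collector)
{
    std::vector<std::thread> workers;
    for (const auto& p : positions) {
        workers.emplace_back([&collector, p, tileSize] {
            long sum = 0;
            for (int r = 0; r < tileSize; ++r)
                for (int c = 0; c < tileSize; ++c)
                    sum += (p.first + c) + (p.second + r);  // stand-in for pixel processing
            collector.notify(TileResult{p.first, p.second, sum});
        });
    }
    for (std::thread& w : workers)
        w.join();
}
```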

Regards, Tony

On 31/01/2022 10:06 am, Murphy, Sean wrote:

I'm hitting a design issue with the way I'm using the QtConcurrent module to do some image processing, and I'm wondering if someone can give some pointers?

At a high level, the software needs to do some processing on every pixel of an image. The processing can mostly be done in parallel, so I've created the following:

  1.  Tile class - responsible for doing the processing on a small subset of the original image

     a.  Has a constructor that takes a Position and Size. From those parameters, the Tile knows which subset of the original image it is going to process
     b.  Has a process() function which does the work on those assigned pixels

  2.  TileManager class - responsible for managing the Tile objects

     a.  Contains a for-loop that creates each Tile object, assigns it a unique Position, and adds it to the QVector<Tile> vector
     b.  Has a processTile(Tile& t) function which calls t.process() to tell a given Tile to begin its work
     c.  Calls QtConcurrent::map(tiles, processTile) to process each tile

So far this works well, but while timing different parts of the codebase, I discovered that a large portion of the time is spent allocating the QVector<Tile> vector (step 2a above) before I even get to the concurrent processing call. The reason is obvious to me: I need to ensure that each tile is created with a unique assignment, and as far as I can see, that needs to happen in a single thread? If I could instead pass the Tile creation off to the parallel processing step, I might be able to improve the overall performance, but I don't see a way to do that within the QtConcurrent framework.

How can I go about creating Tile objects in parallel AND ensuring that each of them gets a unique Position assignment? I could easily move the Tile allocation into processTile(), but if I do that, I don't see a way to make the unique position assignment, since a given call to processTile() has no way of knowing where it is in the overall parallelization sequence, and therefore which Position to assign to the Tile it creates. If I were using something like CUDA, I could use things like blockIdx and threadIdx to do that, but as far as I can see, those concepts don't exist (or at least aren't exposed) in QtConcurrent.

Any thoughts?

_______________________________________________
Interest mailing list
Interest at qt-project.org
https://lists.qt-project.org/listinfo/interest