[Interest] [External]Re: How to get QtConcurrent to do what I want?
wr at warmi.net
wr at warmi.net
Mon Jan 31 18:08:31 CET 2022
It does not make sense to me to have allocation/population of 60k instances of Tile + QPoint into a QVector taking 5 seconds. It took 5 ms on my computer…
Having said that , if allocation and assignment are your bottleneck , just use placement new operator with a large block of memory or just pure array of 60k structures + QConcurent::map with interators instead of a vector.
You will still need to initialize the array so if that’s the true bottleneck then I guess handle your own threading where you reinitialize pool of threads in a loop passing computed range without the initial storing step.
Sent from my iPhone
> On Jan 31, 2022, at 2:38 AM, Andrei Golubev <andrei.golubev at qt.io> wrote:
>
>
> Hello,
>
> What Scott suggests should help a lot.
>
> Using a slightly different approach, if you know how many tiles you need, say 60k, isn't it enough to figure out the tile position already?
> Given an index 0 <= i < 60k, you should be able to map it to a 2-dimensional coordinates if you know how many tiles you have horizontally and vertically (actually, just horizontally is enough) - and you do, because the amount of horizontal tiles = image.width() / tile.width().
> So, if I think about it right, i / (n horizontal tiles) gives you row number, i % (n horizontal tiles) gives you column number.
>
> Now you can generate a 60k size vector of indices from 0 to 59999 (basically std::iota()) and use that as an input for the tile calculation function through QtConcurrent. Internally, you figure out the tile position in 2d space and from that you get your QPoint + QSize for the square patch (guessing, that i index would basically be a hand-written alternative to blockIdx in CUDA and friends). This now allows you to calculate your tiles vector in parallel as well (without actually using a vector).
>
> On a side note, doesn't QtConcurrent provide a way to execute a function on a range? Maybe you don't even need to supply a vector of indices, just a half-open interval from 0 to 60k.
> Given that QtConcurrent should handle any sort of callable, you can also supply it a lambda with capture list containing your hyper parameters like tile.width(), tile.height(), and any other external context information.
>
> --
> Best Regards,
> Andrei
> From: Interest <interest-bounces at qt-project.org> on behalf of Scott Bloom <scott at towel42.com>
> Sent: Monday, January 31, 2022 4:09 AM
> To: Murphy, Sean <Sean.Murphy at centauricorp.com>; Tony Rietwyk <tony at rightsoft.com.au>; interest at qt-project.org <interest at qt-project.org>
> Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?
>
> Couple things I would try.
>
> First, preallocate the size of the vector, or use a list if you don’t need random access into it.
>
> Second, just send, pos, size into the tile. Only save the values in the constrctor. When the worker thread kicks off on a tile, then initialize it and do any computation. This includes the allocation of the submatrix.
>
> Also, does it need to be a shared pointer? Its clear who owns the pointers, they aren’t being “shared” as much as use. For me, I only use shared pointers, when multiple objects can “own” the pointer. In this case, the tile manager owns the tiles.
>
> Mainly, do as much as possible to reduce the time in the nested loop.
>
> Scott
>
> From: Interest <interest-bounces at qt-project.org> On Behalf Of Murphy, Sean
> Sent: Sunday, January 30, 2022 4:59 PM
> To: Tony Rietwyk <tony at rightsoft.com.au>; interest at qt-project.org
> Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?
>
> Thanks for the response, but I'm not following your suggestion - or at least I'm not seeing how it's different than what I'm doing? Maybe a little pseudocode will help. Here's what I'm currently doing:
>
> Tile class:
> private:
> QPoint mPos;
> int mSize;
>
> tile::tile(QPoint pos, int size) :
> mPos(pos),
> mSize(size)
> {
> // assigns this tile an mSize x mSize square
> // from the original image starting at mPos
> // pixel location in the original image
> }
>
> void tile::process()
> {
> // does the work on the assigned subset
> }
>
> TileManager:
> private:
> QVector<QSharedPointer<tile>> mTiles;
>
> processTile(QSharedPointer<tile>& t)
> {
> t->process();
> }
>
> tileManager::setup(QSize tileGrid, int tileSize)
> {
> // generate each tile with its assignment
> for(int i=0; i < tileGrid.height(); ++i)
> {
> for(int j=0; j < tileGrid.width(); ++j)
> {
> // create the new tile while assign its
> // region of the original image
> QSharedPointer<tile> t(new tile(
> QPoint(j * tileSize, i * tileSize),
> tileSize));
> mTiles.append(t);
> }
> }
> QtConcurrent::map(mTiles, processTile);
> }
>
> So I think I'm already doing what you're saying? Where I'm paying the penalty is that the allocation of each tile is happening in one thread and I'd like to see if I can thread out the object creation. But I don't see how to simultaneously thread out the tile objection creation AND correctly assign the tile its location since as far as I can tell, when QtConcurrent executes tileManager's processTile function in parallel there's nothing I can poll inside tileManager::processTile() that allows me to know WHICH step I'm at.
>
> Or am I misunderstanding what you're saying?
>
> The best thing I can come up with is that maybe I could change the type of my mTiles vector to be a QVector<QPoint>> but then I'd still need to loop through nested for-loop to populate all the QPoint items in the vector I want to pass to QtConcurrent::map(). I have tried that yet to see if generating thousands of QPoint objects is faster than generating the same number of tiles, but I can test that out.
>
> Sean
> From: Interest <interest-bounces at qt-project.org> on behalf of Tony Rietwyk <tony at rightsoft.com.au>
> Sent: Sunday, January 30, 2022 7:19 PM
> To: interest at qt-project.org <interest at qt-project.org>
> Subject: [External]Re: [Interest] How to get QtConcurrent to do what I want?
>
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
> Hi Sean,
>
> Can you use the position of the tile as a unique key? Then the manager only needs to calculate each tile's position in the original image. Each tile extracts the bits, processes and notifies the result with its position.
>
> Regards, Tony
>
>
> On 31/01/2022 10:06 am, Murphy, Sean wrote:
> I'm hitting a design issue with the way I'm using the QtConcurrent module to do some image processing, and I'm wondering if someone can give some pointers?
>
> At a high level, the software needs to do some processing on every pixel of an image. The processing can mostly be done in parallel, so I've created the following:
> Tile class - responsible for doing the processing on a small subset of the original image
> Has a constructor that takes a Position and Size. From those parameters, the Tile knows what subset of the original image it is going to process
> Has a process() function which will do the work on those assigned pixels
> TileManager class - responsible for managing the Tile objects
> Contains a for-loop that creates each Tile object, assigns it a unique Position, and adds it to the QVector<Tile> vector
> Has a processTile(Tile& t) function which calls t.process() to tell a given Tile to begin its work
> Calls QtConcurrent::map(tiles, processTile) to process each tile
> So far this works well, but as I was timing different parts of the codebase, I discovered that a large portion of the time is spent allocating the QVector<Tile> vector (step 2a above) before I get to the concurrent processing call. The reason why is obvious to me - I need to ensure that each tile is created with a unique assignment and as far as I can see, that need to happen in a single thread? If I could instead pass off the Tile creation to the parallel processing step, I might be able to improve the overall performance, but I don't see a way around it within the QtConcurrent framework.
>
> How can I go about creating Tile objects in parallel AND ensure that each of them gets a unique Position assignment? I could easily move the Tile allocation into processTile(), but if I do that, I don't see a way make the unique position assignment since I don't see how a given call to processTile() would know where it is in the overall parallelization sequence to determine what Position to assign to the Tile it creates. If I were using something like CUDA, I could use things like blockIdx and threadIdx to do that, but as far as I can see, those concepts don't exist (or at least aren't exposed) in QtConcurrent.
>
> Any thoughts?
>
>
> _______________________________________________
> Interest mailing list
> Interest at qt-project.org
> https://lists.qt-project.org/listinfo/interest
> _______________________________________________
> Interest mailing list
> Interest at qt-project.org
> https://lists.qt-project.org/listinfo/interest
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.qt-project.org/pipermail/interest/attachments/20220131/00e036dd/attachment.htm>
More information about the Interest
mailing list