[Interest] [External]Re: How to get QtConcurrent to do what I want?

Mon Jan 31 21:42:47 CET 2022

Not knowing if a partial value makes any sense to your system.  Qt::Concurrent::mappedReduced might make more sense, if its purely a speedup you are looking for, and not a "keep the GUI alive during it" possibly blockingMappedReduced.

If you need the gui, setting up a qfuturewatcher on the results of the mapped call, would be my approach

QFutureWatcher< XXX > watcher;
connect( &watcher, &finished, manager, &handleFinished);

auto future = QtConcurrent::mapReduced(tiles, processTile, mergeFunction)
watcher.setFuture( future );

-----Original Message-----
From: Scott Bloom 
Sent: Monday, January 31, 2022 12:35 PM
To: Murphy, Sean <Sean.Murphy at centauricorp.com>; interest at qt-project.org
Subject: RE: [Interest] [External]Re: How to get QtConcurrent to do what I want?

Ahh that does fill in a bunch of holes 😊

I would definitely scrap using QSharedPointer as well as QObject, and instead monitoring of the QObject, montor the QFuture that is returned from your map call.

Scott

-----Original Message-----
From: Interest <interest-bounces at qt-project.org> On Behalf Of Murphy, Sean
Sent: Monday, January 31, 2022 9:13 AM
To: interest at qt-project.org
Subject: Re: [Interest] [External]Re: How to get QtConcurrent to do what I want?

Thanks for the reply, Scott. My responses below:

> Couple things I would try.
> 
> First, preallocate the size of the vector, or use a list if you don't 
> need random access into it.

I had attempted this before my original post. My sequence of attempts before I posted went like this:
  1. Started with QList<QSharedPointer<tile>>, and calling list.append() to populate. 
    Noticed this was somewhat slow
  2. Switched to QVector<QSharedPointer<tile>>, still calling list.append() to see if that made 
    any difference. It didn't.
  3. Realized that calling append() on a QVector could be slow as the vector keeps growing 
    and is forced to reallocate, so I added a QVector::reserve() call beforehand to prevent that. 
    Still made no appreciable difference

I neglected to add the reserve() call to the code I posted here, so that's on me for not painting the full picture - more on that below. From the attempts above it didn't seem like the type of container, nor how I build it up were the problem.

> Second, just send, pos, size into the tile.  Only save the values in 
> the constrctor. When the worker thread kicks off on a tile, then 
> initialize it and do any computation.  This includes the allocation of the submatrix.

I think this is what I'm already doing, correct? My tile constructor only assigns those values and does nothing else (ACTUAL code below, not edited pseudocode like the previous email):
    tile::tile(int id, QPoint pos, int size, QObject *parent) : QObject(parent),
        mID(id),
        mPos(pos),
        mSize(size)
    {
    }

One thing to note here that I didn't include in my previous email, and as I keep investigating is starting to look like the real cause of slow allocation/assignment loop, my tile class inherits from QObject. My original thought was that I wanted to be able to use signals to monitor what's going on inside of the tile class, but I'm beginning to rethink whether I really need that at all, especially for the hassles that it causes: 
  1. Creating 60,000 QObjects in a single thread appears to be slow  
    a. I've since tested creating a 60,000 item QVector<int> AND looping through it assigning each
        item the value 0-59999, and using QElapsedTimer to measure that takes 0.3 ms vs. 5 seconds or
        so to execute my current loop
  2. Since QObject doesn't have a copy constructor, but QtConcurrent expects a container of type
    T items not T* items, I chose to wrap my tile* within a QSharedPointer to get everything to work out. 
    As you point out below, I may have chosen the wrong smart pointer type, since I'm not really 
    sharing the pointer around, but my goal there was to satisfy what QtConcurrent expects of the 
    container you pass in to the map() function

>
> Also, does it need to be a shared pointer?  Its clear who owns the 
> pointers, they aren't being "shared" as much as use.  For me, I only 
> use shared pointers, when multiple objects can "own" the pointer.  In this case, the tile manager owns the tiles.

From this comment, I'm realizing that posting pseudocode/stripped down code previously was a mistake as it was only painting a partial picture. The actual loop is this:
    // generate each tile with its assignment
    for(int i=0; i < nTiles.height(); ++i)
    {
        for(int j=0; j < nTiles.width(); ++j)
        {
            int id = i * nTiles.width() + j;
            QSharedPointer<tile> t(new tile(id,
                   QPoint(j * tileSize, i * tileSize),
                   tileSize, this));
            connect(t.get(), &tile::loadingDone,
                    this, &tileManager::tileLoaded);
            connect(t.get(), &tile::remappingDone,
                    this, &tileManager::tileRemapped);
            mTiles.append(t);
        }
    }

So there you can see that the tileManager class owns the shared pointer to each tile, and I also connect up a couple signals which are used to convey status back to the tileManager which in turn emits status signals back up to my UI class so it can show progress bars to the user that something is actually happening since this is a lengthy operation.

>
> Mainly, do as much as possible to reduce the time in the nested loop.

I'm not sure there's anything else I can do to reduce the time in the loop, beyond changing the tile class to no longer inherit from QObject, and just have it be a pure data class, but I'm open to any suggestions.

I probably should have mentioned in an earlier email that this is my first attempt at using QtConcurrent (if that wasn't obvious already), so I'm extremely open to any suggestions of refactoring that helps out this process. 

Disregarding anything I'm currently attempting, at a high level, this is the problem I'm trying to solve:
  1. We have a large file on disk that contains data that we want to convert to a grayscale image to 
    display to the user. This data is organized similar to an image in the sense that it can be thought of 
    as a rectangular grid of "pixels", but the dynamic range of data pixel values is much larger than 0-255.
  2. Converting the raw data to grayscale is a two step process:
    a. Loop through every data pixel to find the minimum and maximum pixel values in the entire image
    b. Loop through every pixel a second time, mapping the real dynamic range found in step a) to the 
        grayscale range of 0-255

So 2a and 2b both lend themselves to parallelization with a necessary synchronization step in between them - I can't start the grayscale remapping conversion until I've determined the true dynamic range of the original data.

Thanks to everyone that has responded so far...

_______________________________________________
Interest mailing list
Interest at qt-project.org
https://lists.qt-project.org/listinfo/interest