[Qt-interest] Some thoughts about QThread::idealThreadCount()

Thiago Macieira thiago.macieira at trolltech.com
Fri Mar 6 02:01:42 CET 2009


Yves Bailly wrote:
>Hello (ex)Trolls and all,

Hi Yves

We're still Trolls :-)
[see the t-shirts in the 4.5 release blog]

>It seems this method just returns the number of cores found on the
>system (at least in the very few tests I made). Say I have a dual-
>core CPU: this method returns 2. So, if I use QThreadPool, it will
>allow me to run simultaneously only 2 QRunnable, queuing the extra
>ones I might require (assuming I understand the doc correctly).
>
>But wait... creating and running 2 QRunnable will give me *3* threads:
>the two I created, *and* the main application thread... this is more
>than "idealThreadCount()", so it's not ideal, so I'll lose performance...?

Well, I think you're skipping an important part of the equation.

The number of cores is the ideal thread count, at least with current
technology (on a system with a high core count, the situation may
differ). That means running 2 threads on a dual-core system is ideal.
So there's nothing wrong with the name of the function.

QtConcurrent uses that to decide how many threads it creates. It cannot,
however, control threads that it didn't create -- that is, threads
outside the global thread pool. If a thread was created with QThread or by
non-Qt mechanisms (the main thread, for instance), QtConcurrent doesn't
count it and doesn't use it.

In general, the main thread is the GUI thread, so it also can't be used
for CPU-intensive tasks, as that could freeze the UI. Instead, the main
thread should sleep until the other threads have finished or until
another event arrives for it to react to.
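
To make that arithmetic concrete, here is a minimal sketch in standard C++
rather than Qt. It assumes std::thread::hardware_concurrency() as a
stand-in for QThread::idealThreadCount(); workerCountFromHint() and
parallelSum() are hypothetical names, not Qt API. The workers do the
computation while the calling thread only waits, so a dual-core machine
ends up with three threads, of which only two compete for the cores.

```cpp
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: turn the runtime's core-count hint into a pool size.
// std::thread::hardware_concurrency() may return 0 when the count is unknown,
// so fall back to a single worker in that case.
unsigned workerCountFromHint(unsigned hint) {
    return hint == 0 ? 1u : hint;
}

// Sum 1..n on `workers` threads. The calling thread does no arithmetic
// while the workers run; it blocks in join() until they are done.
unsigned long long parallelSum(unsigned long long n, unsigned workers) {
    assert(workers > 0);
    std::vector<unsigned long long> partial(workers, 0);
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&partial, w, workers, n] {
            unsigned long long local = 0;
            // Worker w takes w+1, w+1+workers, w+1+2*workers, ...
            for (unsigned long long i = w + 1; i <= n; i += workers)
                local += i;
            partial[w] = local;
        });
    }
    for (auto& t : pool)
        t.join();  // the caller sleeps here, it does not compete for a core

    unsigned long long total = 0;
    for (unsigned long long v : partial)
        total += v;
    return total;
}
```

A caller would typically write
parallelSum(n, workerCountFromHint(std::thread::hardware_concurrency())).
With Qt itself, the pool size comes from QThread::idealThreadCount(), which
is the default for QThreadPool::maxThreadCount().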

>Another approach. Consider an algorithm that bounces all over a
>huge memory array, for example a simple QuickSort implementation. This
>type of algorithm will cause many cache misses on the processor, thus
>the process will spend most of its time waiting for data to be fetched
>from main memory.
>I did some benchmarks, and I discovered that even on a single-CPU as
>"old" as a Pentium-4, threading a QuickSort can bring significant gains
>in speed (I tried up to 8 threads, gaining overall more than 40%). This
>just because while a thread waits for its data to be fetched, another
>can work. So in this case, whereas "idealThreadCount()" would probably
>return 1 (haven't tried yet), my "ideal thread count" for my specific
>problem would be around 8.

In that case you're correct. But that would also mean that the "ideal
thread count" actually depends on the job you're trying to do.
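
For readers who want to repeat that kind of experiment, here is a rough
sketch in standard C++ with the thread count as a parameter. It is not the
benchmark code from the quoted message: instead of a threaded QuickSort it
sorts equal-sized chunks with std::sort and then merges them, and
threadedSort() is a hypothetical name. Whether oversubscribing the cores
helps on a given machine has to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sort `data` using `threads` worker threads: each worker sorts one chunk,
// then the calling thread merges the sorted chunks from left to right.
void threadedSort(std::vector<int>& data, unsigned threads) {
    assert(threads > 0);
    const auto n = static_cast<std::ptrdiff_t>(data.size());

    // Chunk k covers the half-open range [bounds[k], bounds[k + 1]).
    std::vector<std::ptrdiff_t> bounds(threads + 1);
    for (unsigned k = 0; k <= threads; ++k)
        bounds[k] = n * static_cast<std::ptrdiff_t>(k)
                      / static_cast<std::ptrdiff_t>(threads);

    std::vector<std::thread> pool;
    for (unsigned k = 0; k < threads; ++k) {
        pool.emplace_back([&data, &bounds, k] {
            // Chunks do not overlap, so no locking is needed here.
            std::sort(data.begin() + bounds[k], data.begin() + bounds[k + 1]);
        });
    }
    for (auto& t : pool)
        t.join();

    // After step k, the range [0, bounds[k + 1]) is fully sorted.
    for (unsigned k = 1; k < threads; ++k)
        std::inplace_merge(data.begin(),
                           data.begin() + bounds[k],
                           data.begin() + bounds[k + 1]);
}
```

Timing threadedSort() for 1, 2, 4 and 8 threads on the same input is one
way to find the count that suits a particular machine and data set.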

A quick-sort algorithm might have lots of cache misses and therefore be
stalled waiting for data. Another good example is I/O-bound processes,
where the latency to obtain data is much higher. That is why, when you're
building a program on a multicore system, you usually pass make -j a
number higher than the number of cores.

But that's not the rule: there are other algorithms that could run at 100% 
CPU usage without starving for data. In that case, the ideal thread count 
is exactly the number of cores.

And there may also be cases where the ideal count is less than the number
of cores, such as when threads compete for a highly contended shared
resource.
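
The three cases above can be summed up in a small sketch. Everything in it
is an assumption made for illustration: the Workload categories, the
suggestThreadCount() name and the single tuning factor are hypothetical, and
a real value for the factor has to come from measurement.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical classification of the cases discussed above.
enum class Workload {
    CpuBound,    // runs at 100% CPU without starving for data
    StallBound,  // waits on cache misses or I/O
    Contended    // fights over a shared resource
};

// `cores` is the reported core count; `factor` is a measured tuning value,
// for example 8 for the single-CPU quick-sort benchmark quoted earlier.
unsigned suggestThreadCount(Workload kind, unsigned cores, unsigned factor) {
    assert(cores > 0 && factor > 0);
    switch (kind) {
    case Workload::CpuBound:
        return cores;                         // one thread per core
    case Workload::StallBound:
        return cores * factor;                // oversubscribe to hide latency
    case Workload::Contended:
        return std::max(cores / factor, 1u);  // fewer threads, never below 1
    }
    return cores;  // not reached for valid enum values
}
```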

>Conclusion: having a way to know the number of cores in a system is
>very interesting, thanks for providing us with such a useful method. But
>naming it "idealThreadCount()" does not seem really correct to me, at
>least from a semantic point of view.

Naming it like that allows us to adjust for future development of the
technology. Suppose that a 64-core processor becomes reality tomorrow, but 
the operating system actually presents us with "16 processors" only, each 
one using 4 sub-cores for tasks like short-lived threading. Such a system 
would not sustain 64 threads running simultaneously, but could run 16 of 
them and dispatch up to 4 parallel instructions to each sub-core in each 
processor.

Or, another possibility is for the system to have specialised cores, such 
as floating point / multimedia cores, arithmetic / logic cores.

In fact, I think the above combination is exactly what the Itanium
architecture is: each processor can have multiple specialised cores, 
sometimes more than one of each.

-- 
Thiago Macieira - thiago.macieira (AT) nokia.com
  Senior Product Manager - Nokia, Qt Software
      Sandakerveien 116, NO-0402 Oslo, Norway