[Qt-interest] Some thoughts about QThread::idealThreadCount()
Thiago Macieira
thiago.macieira at trolltech.com
Fri Mar 6 02:01:42 CET 2009
Yves Bailly wrote:
>Hello (ex)Trolls and all,
Hi Yves
We're still Trolls :-)
[see the t-shirts in the 4.5 release blog]
>It seems this method just returns the number of cores found on the
>system (at least in the very few tests I made). Say I have a dual-
>core CPU: this method returns 2. So, if I use QThreadPool, it will
>allow me to run simultaneously only 2 QRunnable, queuing the extra
>ones I might require (assuming I understand the doc correctly).
>
>But wait... creating and running 2 QRunnable will give me *3* threads:
>the two I created, *and* the main application thread... this is more
>than "idealThreadCount()", so it's not ideal, so I'll lose perfs... ?
Well, I think you're skipping an important part of the equation.
The number of cores is the ideal thread count, at least with current
technology (in a system with a high core count, maybe the situation
will differ). That means running 2 threads on a dual-core system is ideal.
So there's nothing wrong with the name of the function.
QtConcurrent uses that to decide how many threads it creates. It cannot,
however, control the threads that it didn't create -- that is, threads
outside the global thread pool. If you create a thread with QThread or
non-Qt mechanisms (such as the main thread), QtConcurrent doesn't count it
and doesn't use it.
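As a rough illustration of sizing a set of workers from the core count, here
is a minimal sketch in standard C++ rather than Qt, so it compiles without
the Qt headers. std::thread::hardware_concurrency() stands in for
QThread::idealThreadCount(); workerCount() and parallelSum() are hypothetical
names invented for this example, not Qt or QtConcurrent API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: how many workers to start, never fewer than 1.
// std::thread::hardware_concurrency() may return 0 when the value is unknown.
unsigned workerCount()
{
    return std::max(1u, std::thread::hardware_concurrency());
}

// Sum `data` by handing each worker one contiguous slice.
// Only the threads created here are counted; the caller's thread is not.
long long parallelSum(const std::vector<int> &data, unsigned workers)
{
    assert(workers > 0);
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    for (unsigned i = 0; i < workers; ++i) {
        const std::size_t begin = std::min(data.size(), i * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        pool.emplace_back([&data, &partial, i, begin, end] {
            partial[i] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (std::thread &t : pool)
        t.join(); // the calling thread only waits here

    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling parallelSum(values, workerCount()) starts one worker per reported
core. The calling thread is an extra thread, but it spends its time blocked
in join() and does not compete for a core.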
In general, the main thread is the GUI thread, so it also can't be used
for CPU-intensive tasks, as that could freeze the UI. Instead, the main
thread should sleep waiting for the other ones to be finished or for there
to be another event to react to.
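The waiting pattern described above could look roughly like the sketch below,
again in standard C++ as an assumption rather than actual Qt code. heavyJob()
and runAndWait() are hypothetical names; std::async starts one thread per job
here, which is simpler than a real thread pool.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <vector>

// Hypothetical CPU-heavy job standing in for real work:
// returns the sum of squares 1*1 + 2*2 + ... + n*n.
long long heavyJob(int n)
{
    assert(n >= 0);
    long long acc = 0;
    for (int i = 1; i <= n; ++i)
        acc += static_cast<long long>(i) * i;
    return acc;
}

// Run every job on another thread. The caller does no computation itself:
// it sleeps in wait_for() and wakes up periodically.
long long runAndWait(const std::vector<int> &inputs)
{
    std::vector<std::future<long long>> jobs;
    for (int n : inputs)
        jobs.push_back(std::async(std::launch::async, heavyJob, n));

    long long total = 0;
    for (auto &job : jobs) {
        while (job.wait_for(std::chrono::milliseconds(10))
               != std::future_status::ready) {
            // A GUI program would handle pending events at this point.
        }
        total += job.get();
    }
    return total;
}
```

In a Qt program the event loop plays the role of the polling loop, so the GUI
thread stays responsive while the workers use the cores.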
>Another approach. Consider an algorithm that bounces everywhere in a
>huge memory array, for example a simple QuickSort implementation. This
>type of algorithm will cause many cache misses on the processor, thus
>the process will spend most of its time waiting for data to be fetched
>from main memory.
>I did some benchmarks, and I discovered that even on a single-CPU as
>"old" as a Pentium-4, threading a QuickSort can bring significant gains
>in speed (I tried up to 8 threads, gaining overall more than 40%). This
>just because while a thread waits for its data to be fetched, another
>can work. So in this case, whereas "idealThreadCount()" would probably
>return 1 (haven't tried yet), my "ideal thread count" for my specific
>problem would be around 8.
In that case you're correct. But that would also mean that the "ideal thread
count" actually depends on the job you're trying to do.
A quick-sort algorithm might have lots of cache misses and therefore be
stalled waiting for data. Another good example is I/O-bound processes,
where the latency to obtain data is much higher. That is why, when you're
building a program on a multicore system, you usually pass a number higher
than the number of cores to make -j.
But that's not the rule: there are other algorithms that could run at 100%
CPU usage without starving for data. In that case, the ideal thread count
is exactly the number of cores.
And there may also be cases where the ideal count is less than the number
of cores, such as the case of a highly-contended shared resource.
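The three cases above can be put side by side in a small sketch. Everything
in it is an assumption made for illustration: the Workload enum, the
threadsFor() name, and above all the factors of 2 and 1/2, which are guesses
and not measured values. A real program would benchmark its own job.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical classification of the job being run.
enum class Workload { CpuBound, StallHeavy, Contended };

// Hypothetical heuristic: pick a thread count from the core count.
// The multipliers are illustrative guesses, not measurements.
unsigned threadsFor(Workload kind, unsigned cores)
{
    assert(cores > 0);
    switch (kind) {
    case Workload::CpuBound:
        return cores;                    // 100% CPU: one thread per core
    case Workload::StallHeavy:
        return cores * 2;                // cache misses or I/O: oversubscribe
    case Workload::Contended:
        return std::max(1u, cores / 2);  // shared resource: fewer threads
    }
    return cores;
}
```

With these made-up factors, a quad-core machine would get 4 threads for a
CPU-bound job, 8 for a job that stalls on memory or I/O, and 2 for a job
fighting over a shared resource.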
>Conclusion: having a way to know the number of cores in a system is
> very interesting, thanks for providing us with such a useful method. But
> naming it "idealThreadCount()" does not seem really correct to me, at
> least from a semantic point of view.
Naming it like that allows us to adjust for future development of the
technology. Suppose that a 64-core processor becomes reality tomorrow, but
the operating system actually presents us with "16 processors" only, each
one using 4 sub-cores for tasks like short-lived threading. Such a system
would not sustain 64 threads running simultaneously, but could run 16 of
them and dispatch up to 4 parallel instructions to each sub-core in each
processor.
Or, another possibility is for the system to have specialised cores, such
as floating point / multimedia cores, arithmetic / logic cores.
In fact, I think the above combination is exactly what the Itanium
architecture is: each processor can have multiple specialised cores,
sometimes more than one of each.
--
Thiago Macieira - thiago.macieira (AT) nokia.com
Senior Product Manager - Nokia, Qt Software
Sandakerveien 116, NO-0402 Oslo, Norway