[Interest] Thread-To-CPU-Core distribution

Thu Dec 29 00:43:51 CET 2022

Hej Konrad,

>>I can't hold back my curiosity, so I've got to ask: why? Why would you
>>do that? Why do you assume you know better than the operating system?

I totally understand the question and agree with everything else you state in your email below. However, I am dealing with a non-standard use case with comparatively tough delay restrictions. I do it for R&D reasons related to remote music performances, and the project this work has been going into is:

https://www.soundjack.eu/
https://www.soundjack.eu/publications/

In this domain we are constantly seeking opportunities to reduce audio/video latency as far as possible on standard cross-platform hardware.

Especially on cheaper hardware such as the Raspberry Pi, thread distribution was a benefit in that regard, particularly when processing audio with 32 samples/block and probably less, but I need to compare and evaluate it more intensively now.

Best

Alex

--
http://www.carot.de
Email : Alexander at Carot.de
Tel.: +49 (0)177 5719797

> Sent: Wednesday, 28 December 2022 at 15:57
> From: "Konrad Rosenbaum" <konrad at silmor.de>
> To: interest at qt-project.org
> Subject: Re: [Interest] Thread-To-CPU-Core distribution
>
> Hi,
> 
> On 28/12/2022 11:08, Alexander Carôt wrote:
> > in a special use case I launch audio- and video streaming classes in my Qt main thread and also a web browser as an interface (webview or webengine) but I want to have their operation consciously separated over available cpu cores such as
> >
> > Core 1: Audio Callback Thread
> > Core 2: Video Callback Thread
> > Core 3: Web browser
> >
> > and not let the system decide what runs on which core.
> 
> I can't hold back my curiosity, so I've got to ask: why? Why would you 
> do that? Why do you assume you know better than the operating system?
> 
> 
> There are subtleties in what core is assigned to which task. The kernel 
> knows stuff like IRQ affinities, hardware bus connections, IO port 
> assignments and so on that are fairly hard to guess for you in user 
> space. Normally the kernel will find a good balance between cache 
> affinity, short call paths into hardware and load distribution, so there 
> is usually no need for you to meddle in this.
> 
> 
> Unless you've found that rare unicorn of a real scheduling problem in 
> your OS _and_ your program only needs to run on that one machine... 
> don't meddle. Once you optimized for one machine, things will perform 
> (much) worse on different machines.
> 
> 
> Usually if your program does not perform well it is one of those problems:
> 
> a) you have unnecessarily complicated call paths in your program: you 
> need to shorten them
> 
> b) bad math: sometimes it is worth spending hours simplifying an 
> algorithm to save on a few microseconds - billions of microseconds in a 
> loop are hours after all.
> 
> c) too many context switches: use FEWER threads, not more, reduce 
> dependencies between threads
> 
> d) your hardware is not powerful enough for what you want to do: save 
> some money, get better hardware
> 
> e) you are waiting for the hard disk or network: use a cache (big 
> problem on Windows, Linux already does this for you)
> 
> 
> > With audio and video this works fine already according to
> >
> > https://eli.thegreenplace.net/2016/c11-threads-affinity-and-hyperthreading
> >
> > because I am using pthreads for them anyways.
> >
> > Now I wonder if this is also possible for Qt and my web browser instance in particular.
> You can always call sched_setaffinity with pid==0 from within that thread.
> 
> 
> 
>      Konrad
> 
> 
> _______________________________________________
> Interest mailing list
> Interest at qt-project.org
> https://lists.qt-project.org/listinfo/interest
>
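
For reference, the sched_setaffinity suggestion quoted above could look roughly like the sketch below. It is Linux-only, and the helper names pinCallingThreadToCore and firstAllowedCore are made up for illustration; they come neither from the thread nor from Qt. The pinning only takes effect for the thread that makes the call.

```cpp
// Linux-only sketch: pin the calling thread to a single CPU core.
#include <sched.h>  // sched_setaffinity, sched_getaffinity, cpu_set_t, CPU_* macros

// Hypothetical helper: restrict the calling thread to `core`.
// Returns true on success, false on an invalid index or a kernel error.
bool pinCallingThreadToCore(int core)
{
    if (core < 0 || core >= CPU_SETSIZE)
        return false;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);

    // pid == 0 addresses the calling thread on Linux.
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Hypothetical helper: lowest-numbered core the calling thread is
// currently allowed to run on, or -1 if the mask cannot be read.
int firstAllowedCore()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;

    for (int i = 0; i < CPU_SETSIZE; ++i)
        if (CPU_ISSET(i, &set))
            return i;
    return -1;
}
```

In a Qt program the call would have to be made from inside the thread in question, for example at the top of an overridden QThread::run(). Whether this reaches the web browser part is a separate question: as far as I know, QtWebEngine does much of its work in separate processes, which a call from the application's own threads would not pin.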