[Interest] [Semi OT] Concurrent (multi-threaded) read/write disk IO?

Till Oliver Knoll till.oliver.knoll at gmail.com
Thu Feb 5 15:23:25 CET 2015


Am 05.02.2015 um 14:44 schrieb Keith Gardner <kreios4004 at gmail.com>:

Thanks for the quick reply!

>> ...
>> Specifically I have the following scenario in mind: "batch image conversion". ...
> 
> Have you looked at ThreadWeaver (http://api.kde.org/frameworks-api/frameworks5-apidocs/threadweaver/html/index.html)?  They have an example application that does the exact scenario you are describing.  In this case, you would have a thread pool and you would just issue jobs instead of managing your threads directly.

I have skimmed the description. At a quick glance ThreadWeaver seems to be the equivalent of "Grand Central Dispatch" or an extended QThreadPool/QRunnable.

I am not yet concerned about how I would organise my Reader/Writer/Worker threads.

At the moment my question is rather: does it actually make sense to have two distinct Reader/Writer threads (assuming concurrent access to the hard disk, which would be magically optimised by the underlying OS), or shall I just have one single thread which either reads or writes at a time?
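For the single-thread variant, a minimal sketch in plain C++ (no Qt; the `IoQueue` name is invented for illustration) could look like this: one dedicated worker thread drains a FIFO queue of IO closures, so the application never issues two disk requests at once and the "read or write, one at a time" policy falls out for free.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical single-threaded IO executor: every read/write job is
// funneled through one worker thread, so this application never has
// more than one disk request in flight.
class IoQueue {
public:
    IoQueue() : worker_([this] { run(); }) {}

    ~IoQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // drains remaining jobs, then exits
    }

    // Enqueue a read or write job; it runs strictly after all
    // previously enqueued jobs have finished.
    void post(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty())
                    return;  // done_ was set and the queue is drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // reads and writes execute one at a time, in order
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool done_ = false;
    std::thread worker_;
};
```

Reader, Worker, and Writer stages would each `post()` their disk work here while doing the CPU-bound conversion on their own threads.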

>  
> You can have a job that performs the IO and put restrictions on how many of those jobs can run concurrently.

Or in other words, given such an organisation of threads (jobs): shall I or shall I not put restrictions on such "IO jobs"?
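Such a restriction could be sketched in plain C++ without ThreadWeaver (the `Semaphore` and `runWithIoCap` names are invented, and the IO itself is elided): a counting semaphore caps how many jobs may be inside their IO section at once, while everything else runs unrestricted. Setting the cap to 1 gives the strictly sequential disk access discussed above; a larger cap would suit an SSD.

```cpp
#include <algorithm>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Minimal counting semaphore (C++11; std::counting_semaphore is C++20).
class Semaphore {
public:
    explicit Semaphore(int count) : count_(count) {}
    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    void release() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_;
};

// Runs jobCount jobs on their own threads, but at most maxConcurrentIo
// of them may be inside their IO section at any moment. Returns the
// peak observed IO concurrency (never exceeds the cap).
int runWithIoCap(int jobCount, int maxConcurrentIo) {
    Semaphore ioSlots(maxConcurrentIo);
    std::mutex statsMutex;
    int active = 0, peak = 0;

    std::vector<std::thread> threads;
    for (int i = 0; i < jobCount; ++i) {
        threads.emplace_back([&] {
            ioSlots.acquire();  // wait for a free IO slot
            {
                std::lock_guard<std::mutex> lock(statsMutex);
                peak = std::max(peak, ++active);
            }
            // ... the actual read or write would happen here ...
            {
                std::lock_guard<std::mutex> lock(statsMutex);
                --active;
            }
            ioSlots.release();
        });
    }
    for (auto& t : threads)
        t.join();
    return peak;
}
```

The cap becomes a single tuning knob: 1 for a spinning disk or SD card, higher for an SSD.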

> 
>> If the answer is "don't bother in your application! Just read and write at the same time and the OS/driver will figure out the best access pattern for you already!", then I guess consideration of SSDs probably become moot.
> 
> I am using ThreadWeaver on an embedded device that has to write to an SD-Card.  File IO is painfully slow but using ThreadWeaver allows for me to rate limit complex events appropriately. 

So I understand that concurrent read/write on an SD card is worse than sequential access, and you solved that by rate-limiting the number of concurrent IO jobs to 1.

In my case I am only concerned about "desktop class systems" (Mac, Linux, Windows), but I get your message: even iMacs with their rather slow "laptop class hard disks" would suffer if I read and write at the same time.

Then again: don't those "desktop class OSes" have sophisticated IO algorithms which would perhaps slightly delay a write operation (buffering it in the meantime) until a given read operation stops? Or in other words: wouldn't those OSes (or the hard disk drivers, or even the disk controllers/firmware) already interleave my continuous read/write requests in a clever way, so as not to "thrash" the disk too much?

Somehow I have the feeling that by enforcing sequential access at the application level (by whatever means: restricting the number of concurrent "IO jobs" to 1, a homegrown "IO Manager"...) I would be re-inventing the wheel, or even worse: sacrificing IO performance (since the OS would do a much better job at scheduling read/write requests).

> When moving my application to an SSD, I just change how many concurrent File IO events run in parallel so that it can scale with the system.

So I understand concurrent read/write access is not such a problem for SSDs.

It would be nice if I did not have to distinguish between "slow spinning hard disk" and "SSD" in my application, and the underlying OS chose the optimal read/write access pattern for me.

Then again, I am a realist, and I take from your experience that I /do/ have to care at the application level...

Thanks!
  Oliver


More information about the Interest mailing list