[Interest] [Semi OT] Concurrent (multi-threaded) read/write disk IO?

Harri Pasanen harri at mpaja.com
Thu Feb 5 15:19:53 CET 2015


On 05/02/2015 14:44, Till Oliver Knoll wrote:
>
> Am 05.02.2015 um 14:25 schrieb Till Oliver Knoll 
> <till.oliver.knoll at gmail.com <mailto:till.oliver.knoll at gmail.com>>:
>
>> ...
>>
>> Does it make sense to guarantee/enforce "sequential (exclusive) 
>> access to the harddisk" on application level, or would I re-invent 
>> functionality already present in the underlying OS/disk driver (and 
>> maybe even sacrifice performance)?
>
> I eventually found a link which seems to confirm that it would be best 
> to only have sequential read/write access with physically spinning 
> drives, that is, have some kind of "IO Manager" in the application:
>
> http://www.tomshardware.co.uk/forum/251768-32-impact-concurrent-speed
>
> Of course the tricky part then is to make sure that the Writer thread 
> does not block the Reader thread for too long, such that the Work Queue 
> would become empty (and the worker threads would be sitting there "idle").
>
> Likewise the Writer thread must have enough chances to write, such 
> that the Result Queue does not become too large ("memory constraints"). 
> Probably some kind of prioritisation scheme taking the queue counts into 
> consideration is the answer - but that is not part of my question; I am 
> really just interested in whether concurrent read/write access should 
> be avoided in the first place these days (or not).
>
> For SSDs it still might be okay (or even better?) to use concurrent 
> read/write access?
>
>

The usual answer is "it depends".

It depends on how much data you access with each read/write. It also 
depends on the underlying filesystem, the size of the files, and how many 
files you are dealing with.

It also depends on your disk setup: whether you have a single disk or an 
array of several, and the capacity of each disk, which affects the number 
of read/write heads it has.  The NCQ* implementation and the amount of 
cache RAM in the disk also make a difference.

Then it depends on whether you need transactional writes: does each write 
need to sync to disk immediately?
If you are on Linux, you already get a lot of optimization out of the 
box; it is typically much better than any other OS.  But even within 
Linux the filesystem you use makes a difference; for example, some 
filesystems are good with lots of small files.  Sometimes file deletion 
is the bottleneck.
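
To illustrate what "sync immediately" costs, here is a minimal sketch 
(plain POSIX; the function name and arguments are my own invention) of a 
write that is forced onto stable storage before returning:

// Sketch: write a block and force it onto stable storage (Linux/POSIX).
// fsync() blocks until the kernel has flushed the data to the device --
// that is the expensive, "transactional" part.
#include <cstddef>
#include <unistd.h>

bool writeDurable(int fd, const char *buf, size_t len)
{
    if (::write(fd, buf, len) != static_cast<ssize_t>(len))
        return false;             // short write or error
    return ::fsync(fd) == 0;      // fdatasync(fd) is often sufficient
}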

In the end, with spinning drives, the underlying physics of the spinning 
media and the moving read/write heads set the limits.

SSDs are typically ~100 times faster for seek operations.  Even there, 
the controller can cause significant differences depending on the 
read/write pattern.

So before optimizing, I'd benchmark; especially on Linux, the filesystem 
layer typically does a decent job already.
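
For example, a rough benchmark skeleton with QElapsedTimer (the path and 
chunk size are placeholders, and mind the page cache when interpreting 
the numbers):

// Sketch: time the actual read pattern before restructuring anything.
#include <QElapsedTimer>
#include <QFile>
#include <QDebug>

void benchmarkRead(const QString &path)   // hypothetical helper
{
    QFile file(path);
    if (!file.open(QIODevice::ReadOnly))
        return;

    QElapsedTimer timer;
    timer.start();

    char buffer[64 * 1024];               // 64 KiB chunks, arbitrary
    qint64 total = 0;
    qint64 n;
    while ((n = file.read(buffer, sizeof(buffer))) > 0)
        total += n;

    // A second run hits the page cache, so it measures RAM, not the disk.
    qDebug() << total << "bytes in" << timer.elapsed() << "ms";
}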

But if you want maximum IO performance, the rule of thumb is to group 
your reads and writes, and to read/write as much data as possible at 
once.  Even SSDs typically favor this.  In highly parallel supercomputer 
settings, different rules may apply.
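
As a sketch of what "group your writes" could look like in practice 
(class and method names are my own invention, not something from the 
original posting): a dedicated writer thread drains everything queued so 
far and issues one large write instead of many small ones.

// Sketch: drain all queued results and write them as one large block.
#include <QByteArray>
#include <QFile>
#include <QList>
#include <QMutex>
#include <QWaitCondition>

class BatchWriter
{
public:
    void enqueue(const QByteArray &result)        // called by worker threads
    {
        QMutexLocker locker(&m_mutex);
        m_pending.append(result);
        m_condition.wakeOne();
    }

    void run(QFile &file)                         // runs in the writer thread
    {
        forever {                                 // shutdown handling omitted
            QByteArray batch;
            {
                QMutexLocker locker(&m_mutex);
                while (m_pending.isEmpty())
                    m_condition.wait(&m_mutex);
                for (const QByteArray &r : m_pending)
                    batch += r;                   // concatenate queued results
                m_pending.clear();
            }
            file.write(batch);                    // one large sequential write
        }
    }

private:
    QMutex m_mutex;
    QWaitCondition m_condition;
    QList<QByteArray> m_pending;
};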

Just my 2 cents,

Harri

*http://en.wikipedia.org/wiki/Native_Command_Queuing

