[Development] Enhancement to QIODevice?

Wed Sep 16 20:17:07 CEST 2015

On Wednesday 16 September 2015 13:13:01 Andrzej Ostruszka wrote:
> On 09/16/2015 08:57 AM, Thiago Macieira wrote:
> >>> You could use peek() to search the buffer, then read() exactly as much
> >>> as
> >>> you really need.
> >> 
> >> I understand that we are talking about
> >> QIODevicePrivateLinearBuffer::peek() here.
> > 
> > No, I meant QIODevice::peek()
> 
> First of all - I do not claim that I have mastered Qt internals!
> So everything below you should take with a grain of salt :)

No problem. I don't remember the internals of peek(), but I do remember it 
trying to unread each character, one by one. Or was that 
QProcess::setReadChannel...?

> QIODevice::peek() is implemented in terms of QIODevice::read() (which
> then uses QIODevicePrivate::read() which does the memcpy()) and later
> putting it back.  Pretty convoluted but it really does not matter - I
> just assumed the simpler case of direct peek into the buffer.

Well, yeah, it needs to read at least one from the lower device into the 
QIODevice buffer. It will then copy the data again to your buffer. There's no 
way to avoid the double copy, unless device implementation (QSerialPort?) 
already read into the QIODevice buffer. Please talk to Alex Trotsenko to 
optimise this.

But canReadLine() needs to do something similar already. If the data isn't 
buffered yet, it needs to call readData() to copy into the buffer first, then 
read it.

> Here I must say that I don't understand.  Let me rephrase what I've said
> earlier.  Suppose that there is some data in buffer and I want to know
> if delimiter is in it.  Without direct access to this buffer memory I
> can only:
> 1. copy (by peeking()) it to some other place (possibly the final output
> buffer)
> 2. search for delimiter in my copy
> 3. if found: drop "number of bytes to delimiter" from buffer

Right.

Note that this is already an improvement over status quo: right now, you need 
to read() everything and buffer the data that you didn't need just yet.

> What I'm saying is that in successful case (when delimiter is found)
> this sequence is optimal.  However, in case when there is no delimiter
> yet in the buffer, operation 1 is wasted (not really needed).  And I
> don't see how to avoid it without allowing user to search in the
> internal buffer.

I see.

> Possible ways to allow for searching in this buffer are (I'm shortening
> names here):
> A. Add optional param to canReadLine() readLine() (for QIODP or maybe
> even QIOD)

Not to those functions. It would be a new set of them.

> B. Add this QIODPLB::countTo() functionality (which does not break or
> interfere with anything)
> C. Add QIODPLB::data() function to buffer which will return "first" (I'm
> talking in terms of QIODPLB implementation)

Returning access to the internal buffer is something to consider. It would 
allow for one memcpy fewer of the data. But it's not something that we can 
trivially do.

> As for C.  This is another solution - which also does not interfere with
> anything.  The amount of data is already available via size().
> Then any kind of "packet" termination can be implemented (not just by
> single char) on top of it.  Value returned by this QIODPLB::data() is
> only invalidated by explicit user reading so this would be pretty safe
> too and its only one line of code.
> 
> So what do you think?

The internal buffer is chunked, so the delimiter may not be in the first buffer. 
And I don't want to provide access to the internal buffers without a lot of 
thought to design, since after that it becomes ABI and we will have less 
freedom to improve.

QIODevice needs a complete redesign to support zero-copy. The current API 
created in Qt 4 (readData(), writeData(), etc.) is way too limited. The 
problem is that there's no way to do that without a major break in 
compatibility (so, Qt 6) and spending a *lot* of time fixing all the QIODevice 
subclasses (QBuffer, QFileDevice, QAbstractSocket, QProcess, QNetworkReply, 
QSerialPort, plus all the non-Qt code).

> > I think there are easier ways to do this, so I don't think it makes sense
> > to accept this suggestion.
> 
> Yes one can peek() at buffer but as I've tried to state this approach
> has some drawbacks (from performance point of view).  I can also
> understand that maintainers are worried about growing complexity and
> maintenance burden but honestly I don't see any growth in these from
> options B and C above.  These are internal, no testing for them is
> needed, they don't interfere with any other functionality etc.

I think you can help a lot by fixing some of the inefficiencies in peek(). That 
should help you quite a bit already. It won't be ideal, but it will be better. 
Or instead open the device in unbuffered mode and buffer yourself.

I understand the value of countTo(), but it can spiral out of control really 
quickly. After we add the one containing a single byte delimiter, we'll get 
people asking for:
 - string searching (like CRLF)
 - regular expression
 - state machine searching (CSV: comma, unless inside a "")

-- 
Thiago Macieira - thiago.macieira (AT) intel.com
  Software Architect - Intel Open Source Technology Center