[Development] RFC: unified data model API in QtCore => thin wrapper proposal

Arnaud Clère arnaud.clere at minmaxmedical.com
Fri Jun 15 13:31:04 CEST 2018


-----Original Message-----
From: Thiago Macieira <thiago.macieira at intel.com> 
Sent: jeudi 14 juin 2018 02:08

> This email is to start that discussion and answer these questions:
>   a) should we have this API?
>   b) if so, what would this API look like?
>   c) if not, should we unify at least JSON and CBOR?
>   c) either way, should remove QCborValue until we have it?
...
> This API would also be the replacement of the JSON and CBOR parsers, for which 
> we'd have a unified API, something like:
>   QFoo json = QJson::fromJson(jcontent);
>   QFoo cbor = QCbor::fromCbor(ccontent);
>   qDebug() << QCbor::toDiagnosticFormat(json);	// yes, json

Hi all,

As I was saying during the QtCS "QDebug and others" session, structured traces need
a way to serialize in-memory data structures in a number of formats for 
interoperability reasons. This requires a generic way to traverse data structures, 
not necessarily a generic data structure.

A common data structure for Cbor and Json makes sense since they share so much.
But even these two formats have peculiarities, such as Cbor tags, that may need
distinct APIs. This is even more true for Xml. I think that Cbor found a sweet spot
between generality and efficiency, so the data structure behind QCborValue will be
good for many use cases. But like any generic data structure, it will not suit all
use cases. Look at Python: although it has general-purpose list, dict, and so on, its
Json decoder provides hooks to bind data to specialized types.

So, I think it is perfectly Ok to have a common data structure for Cbor and Json,
but it does not preclude the need for specific APIs. Also, specific documentation
is usually easier to understand than generic documentation since it can refer to the
Json and Cbor standards and examples. I think it is also Ok to have QCborValue in 5.12
because we can always add a more generic API as a thin layer on top of specialized
data structures, especially in Qt6 if we take advantage of C++17.

Since the title of the discussion is so general, please let me sketch here what
such an API could be. That may help find a definitive answer to your questions.

The basic existing API for reading/writing is streams. One problem with streams
is that the structure of the data being traversed is lost. So, a reader must know
the structure to read the data. And in some cases, ambiguities may even prevent
reading the data back:

    cout << 1.0 << 1;
    cin >> myFloat >> myInt; // the text written is "11": this may well read myFloat==11 and fail on myInt
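
The ambiguity is easy to reproduce with standard string streams. The following is a
minimal sketch; RoundTrip and writeThenRead are hypothetical names introduced only for
this illustration:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Result of writing two values with << and reading them back with >>.
struct RoundTrip {
    std::string text; // what was actually written
    double f;         // value read back into the floating-point variable
    int i;            // value read back into the integer variable (if any)
    bool ok;          // false when an extraction failed
};

// Writes f then i without any separator, then tries to read them back.
inline RoundTrip writeThenRead(double f, int i) {
    std::stringstream s;
    s << f << i;                       // no structure is recorded in the stream
    RoundTrip r{s.str(), 0.0, -1, false};
    s >> r.f >> r.i;                   // the reader can only guess the boundaries
    r.ok = !s.fail();
    return r;
}
```

With default formatting, writeThenRead(1.0, 1) writes the text "11", so the first
extraction consumes both digits and the second one fails.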

QDebug avoids most problems by inserting spaces between << by default but does not
allow reading. Also, the author of a user-defined type T must write slightly different
code for writing to QDebug, for writing to other formats, and for reading the resulting text...

The approach we took in the MODMED project originates in functional Zippers, which
are generalized iterators for arbitrary data structures, not just sequences. It
makes the data structure apparent in the traversal. Also, the traversal can be adapted
to the task at hand. For instance, a user-defined type may ignore some Json data it
does not understand while reading. Thus, the approach allows binding "any data
with a common structure" such as a generic QCborValue and a user-defined type or
a QByteArray containing Cbor data or utf8 encoded Json.

Let me dream what this approach could look like in Qt, by first using the approach
to directly write some usual data types in Cbor or Json:

    QVector<double> vector = {1.,2.};
    QByteArray buffer;
    QCborWriter cborw(&buffer);
    cborw.sequence().bind("val").bind(true).bind(vector); 
    // buffer = 0x9F6376616...

Note: A generic encoder would use Cbor indefinite length arrays and few or no Cbor tags
so a specialized encoder would still be needed for some use cases.

    buffer.clear();
    QJsonWriter jsonw(&buffer);
    jsonw.sequence().bind("val").bind(true).bind(vector); // same code as above
    // buffer = ["val",true,[1.0,2.0]]
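
To make the fluent style concrete without Qt, here is a minimal sketch of a sequence
writer that emits Json text into a std::string. ToyJsonSeq and toyJsonExample are
hypothetical stand-ins, not the proposed QJsonWriter; strings are not escaped and
integers are used instead of doubles to keep the output format trivial:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a Json sequence cursor; not a Qt class.
class ToyJsonSeq {
public:
    // Opening the cursor opens the Json array.
    explicit ToyJsonSeq(std::string *out) : m_out(out), m_first(true) { *m_out += '['; }

    ToyJsonSeq &bind(const char *s) { sep(); *m_out += '"'; *m_out += s; *m_out += '"'; return *this; }
    ToyJsonSeq &bind(bool b) { sep(); *m_out += b ? "true" : "false"; return *this; }
    ToyJsonSeq &bind(int n) { sep(); *m_out += std::to_string(n); return *this; }

    // A nested container is written as a nested sequence.
    ToyJsonSeq &bind(const std::vector<int> &v) {
        sep();
        ToyJsonSeq inner(m_out);
        for (int x : v) inner.bind(x);
        inner.out();
        return *this;
    }

    // Closes the Json array.
    void out() { *m_out += ']'; }

private:
    // Emits a comma before every item except the first one.
    void sep() { if (!m_first) *m_out += ','; m_first = false; }

    std::string *m_out;
    bool m_first;
};

// Mirrors the jsonw.sequence().bind("val").bind(true).bind(vector) call above.
inline std::string toyJsonExample() {
    std::string buffer;
    ToyJsonSeq seq(&buffer);
    seq.bind("val").bind(true).bind(std::vector<int>{1, 2});
    seq.out();
    return buffer;
}
```

A Cbor cursor exposing the same bind overloads could be driven by the same calling
code, which is the point of the approach.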

In our approach, "bind" handles Read and Write the same way, so it is possible to do:

    QString val; bool b;
    QJsonReader jsonr(&buffer);
    jsonr.sequence().bind(val).bind(b).bind(vector); // same code with lvalues
    // val = "val", b = true, ...
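
The Read/Write symmetry can be illustrated with a deliberately simplified sketch in
which the "format" is just a list of string tokens. ToyWriter, ToyReader, Point,
bindPoint and roundTrip are all hypothetical names; the only idea taken from the text
above is that a single traversal function serves both directions:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Writer side: bind() copies the value into the token list.
class ToyWriter {
public:
    explicit ToyWriter(std::vector<std::string> *tokens) : m_tokens(tokens) {}
    ToyWriter &bind(std::string &v) { m_tokens->push_back(v); return *this; }
    ToyWriter &bind(int &v) { m_tokens->push_back(std::to_string(v)); return *this; }
private:
    std::vector<std::string> *m_tokens;
};

// Reader side: bind() has the same signature but assigns to the lvalue.
class ToyReader {
public:
    explicit ToyReader(const std::vector<std::string> *tokens) : m_tokens(tokens), m_pos(0) {}
    ToyReader &bind(std::string &v) { v = next(); return *this; }
    ToyReader &bind(int &v) { v = std::stoi(next()); return *this; }
private:
    const std::string &next() {
        assert(m_pos < m_tokens->size()); // the toy reader does no error recovery
        return (*m_tokens)[m_pos++];
    }
    const std::vector<std::string> *m_tokens;
    std::size_t m_pos;
};

struct Point { std::string label; int x; int y; };

// One traversal, written once, usable with either cursor type.
template <class Cursor>
void bindPoint(Cursor &c, Point &p) { c.bind(p.label).bind(p.x).bind(p.y); }

// Writes p through the writer, then reads it back through the reader.
inline Point roundTrip(Point p) {
    std::vector<std::string> tokens;
    ToyWriter w(&tokens);
    bindPoint(w, p);
    Point q = Point();
    ToyReader r(&tokens);
    bindPoint(r, q);
    return q;
}
```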

This can work with any in-memory data type like QMap, QVector or QCborValue.
It just requires a bind method or QBind<TResult,T> functor definition. Let me show 
you the default templated QBind definition:

    template<class TResult, typename T>
    struct QBind {
        static TResult bind(Val<TResult> value, T t) {
            return t.bind(value); // In case of error, define a T::bind(Val<TResult>) method or an external QBind<TResult,T>::bind(Val<TResult>,T) functor
        }
    };
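
The two customization points named in the comment (a member bind method, or an
external functor) can be sketched as follows. This is only an illustration under
simplifying assumptions: ToyOut, ToyBind, Celsius and bindAny are hypothetical, and
the cursor is reduced to a struct that accumulates text:

```cpp
#include <cassert>
#include <string>

// Hypothetical result type: just accumulates text.
struct ToyOut { std::string text; };

// Default definition: forward to the member bind() of T.
template <class TResult, typename T>
struct ToyBind {
    static TResult &bind(TResult &r, T &t) { return t.bind(r); }
};

// A user-defined type that provides its own bind() method.
struct Celsius {
    int degrees;
    ToyOut &bind(ToyOut &o) { o.text += std::to_string(degrees) + "C"; return o; }
};

// A type that cannot be given a bind() method (here, int) gets an
// external definition through partial specialization instead.
template <class TResult>
struct ToyBind<TResult, int> {
    static TResult &bind(TResult &r, int &n) { r.text += std::to_string(n); return r; }
};

// Single entry point that picks the right definition at compile time.
template <class TResult, typename T>
TResult &bindAny(TResult &r, T &t) { return ToyBind<TResult, T>::bind(r, t); }
```

For example, calling bindAny on a Celsius value goes through the member method, while
calling it on an int goes through the specialization.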

Most user-defined bind methods would be very simple and the type system would 
guarantee that data is well-formed (no sequence or record left open...):

    struct Person {
        QString m_firstName, m_lastName;
        int m_age;
    
        template<class TResult>
        TResult bind(Val<TResult> value) { return value
            .record()
                .sequence("name")
                    .bind(m_firstName)
                    .bind(m_lastName)
                .out()
                .bind("age" , m_age); // automagically closes opened record
        }
    };
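
One way the type system can rule out a sequence left open is to encode the nesting
depth in the cursor type, so that out() returns the parent cursor type. The sketch
below is an assumption about how this could be done, not the MODMED implementation;
Done, Seq, beginSequence and writeNested are hypothetical names, and only integer
items are supported:

```cpp
#include <cassert>
#include <string>

// Top-level state: nothing is left open.
struct Done {
    Done(std::string *b, bool) : buf(b) {}
    std::string *buf;
};

// A sequence cursor whose type remembers what it is nested in.
template <class Parent>
class Seq {
public:
    Seq(std::string *b, bool first) : m_buf(b), m_first(first) {}

    // Writing an item keeps the same nesting level, hence the same type.
    Seq bind(int n) { sep(); *m_buf += std::to_string(n); return Seq(m_buf, false); }

    // Opening a nested sequence goes one level deeper in the type.
    Seq<Seq<Parent> > sequence() { sep(); *m_buf += '['; return Seq<Seq<Parent> >(m_buf, true); }

    // Closing the sequence gives back the parent cursor type.
    Parent out() { *m_buf += ']'; return Parent(m_buf, false); }

private:
    void sep() { if (!m_first) *m_buf += ','; }
    std::string *m_buf;
    bool m_first;
};

inline Seq<Done> beginSequence(std::string *buf) { *buf += '['; return Seq<Done>(buf, true); }

// The return type is Done, so this only compiles if every opened
// sequence has been closed: dropping one out() yields a Seq<...> instead.
inline Done writeNested(std::string *buf) {
    return beginSequence(buf).bind(1).sequence().bind(2).bind(3).out().bind(4).out();
}
```

The record() and named-item calls of the Person example would need more cursor types
built in the same spirit.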

One advantage of the approach is that such boiler-plate code would have to be written
only once for any TResult (be it a QJsonReader, QCborWriter, etc.), so the above code
would be enough to allow:

    QByteArray json;
    QJsonWriter jsonw(&json); jsonw.bind(Person {"John","Doe",42}); // json = {"name":["John","Doe"],"age":42}
    Person p;
    QJsonReader jsonr(&json); jsonr.bind(p); // p = Person {"John","Doe",42}
    QByteArray cbor;
    QCborWriter cborw(&cbor); cborw.bind(p); // cbor = 0xBF646E616D659F64...

Note: Dynamic data structures' bind methods need to handle Write and Read differently
but user-defined types are rarely dynamically-sized.

The approach even works with QIODevice and no intermediate in-memory data, 
so it is possible to do:

    QFile in, out; // or any other concrete QIODevice (QIODevice itself is abstract)
    // open appropriately
    QJsonReader jsonr(&in );
    QCborWriter cborw(&out);
    if (cborw.bind(jsonr)) cout << "Done.";
    // transforms any Json to Cbor without loading everything in memory

To sum up:
* this approach can use QCborValue which is a nice balance between generality
and efficiency that provides in-place editing
* QBind could provide QCborValue (or any other data type) generic read/write to a 
number of formats, but...
* specific writers/readers may always be necessary
* QCborValue may differ from QJsonValue at some point to handle Cbor tags and other peculiarities

To move on, we have a working structured traces library implementing this approach.
However, writing with it takes about 10 times as long as boiler-plate code using QDebug.
Based on our previous work and using modern C++ compilers, it seems possible to
implement the approach with more reasonable write performance. So, I will try to
submit a proof of concept in the coming days.

In the meantime, I've put a few details on our approach and links to related
discussions on the "QDebug" session wiki page:
https://wiki.qt.io/QDebug_and_other_tracing_facilities

Hope it helps,
Arnaud


