[Development] QtCS: Qt Quick Performance discussion

Tue Jun 23 10:03:39 CEST 2015

As promised in the talk, I moved the qmlbench tool to a separate repo and gave it a readme: https://github.com/sletta/qmlbench 

It is still under my GitHub account, but now it is at least documented so others can take part in interpreting the results.

> On 22 Jun 2015, at 14:47, Robin Burchell <robin+qt at viroteck.net> wrote:
> 
> It seems that nobody took proper notes of the session (oops!) so I'm
> sending the notes Gunnar and I wrote up in preparation for the session.
> If anyone else has any recollections they'd like to add, please go ahead
> and do so :)
> 
> =====
> 
> # Performance!
> 
> * Why it's important
> * What needs to be checked?
> 
> # Benchmarking
> 
> * Need support in qmlbench for measuring memory usage.
>    * For each set of creation benchmarks, we should run them a second
>    time collecting memory information.
> * Once we have that, graphing memory across a single test run makes
> leaks easy to spot
>    * Graphed over time, regressions between commits become easy to spot
> * Likewise for the number of items per frame
> 
> # Start up performance
> 
> We don't have good benchmarks for this, and start-up time isn't trivial
> to measure. Need a few generated examples, maybe.
> 
> # Memory usage
> 
> * Our memory usage is pretty bad in a lot of areas.
> * Sharing data across processes (fork-booster)
>    * Similar but better sharing achievable through qmlcompiler
> * Benefits could be achieved by carefully allocating all write-once
> expected-to-be-shared data contiguously
> * Dropping CPU-side image data. We recently added QSG_TRANSIENT_IMAGES
> to Qt Quick.
> 
> # Item creation performance
> 
> * We have pretty good benchmarks for this :)
> * But we (constantly!) keep regressing
>    * This week's example: two separate regressions in QQuickImageBase
>    (high DPI, and automatic transform)
>    * Image items per frame dropped from ~550/frame to 492/frame -- ~10%
>    regression
>    * Allocations for 5000 images increased by 41 MB
>    * This is just one case - it's nobody's "fault", there's just nobody
>    taking care of it.
> * Ideally need to do some work on automating runs of it (more on this
> later)
> * What is a good target?
>    * 1000 items / frame in qmlbench on a modern desktop / laptop (mbp
>    is one such)
>    * 100 items / frame on mobile and embedded
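
The ~10% figure above is the relative drop in items per frame between two
runs. A minimal C++ sketch of that arithmetic follows. The function names and
the idea of a fixed tolerance are assumptions made for illustration; they are
not part of qmlbench.

```cpp
// Relative drop in throughput between two benchmark runs, in percent.
// A positive value means the newer run is slower. Hypothetical helper,
// not a qmlbench API.
constexpr double regressionPercent(double before, double after)
{
    return (before - after) / before * 100.0;
}

// True if the drop exceeds a chosen tolerance. The tolerance is an
// assumption; the notes do not define one.
constexpr bool isRegression(double before, double after, double tolerancePercent)
{
    return regressionPercent(before, after) > tolerancePercent;
}

// Figures from the notes: ~550 image items per frame before, 492 after.
static_assert(regressionPercent(550.0, 492.0) > 10.0, "drop is a bit over 10%");
static_assert(regressionPercent(550.0, 492.0) < 11.0, "drop is a bit over 10%");
```

An automated run could apply a check like isRegression() to each commit's
result and flag the commit when the drop exceeds the tolerance.
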
> 
> # Binding performance
> 
> I have no idea whether or not we have good benchmarks for this.
> 
> * Probably the creation ones cover the simple cases, but the more
> advanced ones probably need separate coverage.
> * Ideally we also need to monitor the impact of things changing in
> bindings (& multiple things changing at once, and so on?)
> * Help creating benchmarks welcome!
> 
> # Graphics performance
> 
> Are we good here? I think we're close, at least.
> * Clipping has been brought up as an issue; 'simplerenderer' solves that
> * Poor batching gives worse results; 'simplerenderer' solves that too,
> but be mindful that simplerenderer also has ~2-3x worse performance
> overall.
> 
> # Recommendations for working on QtQuick
> 
> * Avoid structures like QHash unless you are sure they are needed (they
> have a heavy cost, and for <1k items or so, QVector, or std::vector, are
> often a better choice)
> * Avoid signal connections
>    * Virtuals instead might be a good choice
>    * If you really have to use them, use qmlobject_connect
>    * See prior art:
>       * qtdeclarative: 0de680c8e8fab36e386dca35e5008ffaa27e8ef6
>       * qtdeclarative: 7568922fa240e6e9440e9c6e93bf8ec00c06ec17
> * Memory compactness:
>    * Don't introduce padding holes
>    * Don't increase the size of frequently allocated things (nodes,
>    items) "accidentally" without careful consideration
>    * Use a lazily-allocated ExtraData for things that aren't needed
>    often
>    * Consider your data types carefully - don't use a 64 bit int for an
>    "on/off" toggle
> * Consider custom data structures & allocation (page allocation of the
> shadow nodes was a big win)
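
The memory compactness points above can be sketched in plain C++. Everything
in this block is hypothetical: PaddedItem, CompactItem, RareData and LazyItem
are made-up types, not Qt Quick classes, and std::unique_ptr stands in for
whatever lazy-allocation helper the real code uses.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Members ordered carelessly: each bool is followed by padding so the next
// 8-byte member stays aligned, and a 64-bit int is used as an on/off toggle.
struct PaddedItem {
    bool visible;
    double x;
    bool clip;
    double y;
    std::int64_t enabled;
};

// Same information: wide members first, toggles packed into single bits.
struct CompactItem {
    double x;
    double y;
    unsigned visible : 1;
    unsigned clip : 1;
    unsigned enabled : 1;
};

// Typically 40 vs 24 bytes on x86-64; only the ordering is asserted here.
static_assert(sizeof(CompactItem) < sizeof(PaddedItem),
              "reordering and bit-fields should shrink the item");

// Rarely used state lives behind a pointer and is allocated on first write,
// so the common case pays for one pointer only.
struct RareData {
    double rotation = 0.0;
    double scale = 1.0;
};

class LazyItem {
public:
    // Reads never allocate; they fall back to the default.
    double rotation() const { return m_extra ? m_extra->rotation : 0.0; }

    // Writing the default to an item without extra data is a no-op.
    void setRotation(double r)
    {
        if (!m_extra && r == 0.0)
            return;
        extra().rotation = r;
    }

    bool hasExtra() const { return static_cast<bool>(m_extra); }

private:
    RareData &extra()
    {
        if (!m_extra)
            m_extra.reset(new RareData);
        return *m_extra;
    }

    std::unique_ptr<RareData> m_extra;
};

// Usage: the extra block only appears once a non-default value is stored.
inline void lazyItemDemo()
{
    LazyItem item;
    assert(!item.hasExtra());
    item.setRotation(45.0);
    assert(item.hasExtra());
}
```

A static_assert on sizeof for a frequently allocated type is one way to make
an "accidental" size increase fail the build instead of going unnoticed.
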
> 
> # Specific Items?
> 
> * Rectangle implementation could be improved quite a bit (Gunnar?)
> * Text node improvements (Eskil?)
> * Hash of shadow nodes in the batched renderer is a large problem
>    * https://codereview.qt-project.org/#/c/97708/
> * Delaying compilation (or whatever it is) of inline components until
> they are used. Right now, these can have massive impacts on
> performance/memory unless moved to external files, e.g. "Component {
> Dialog { ... foo ... } }" that is only used when a button is pressed. It
> may never be pressed.
> * "Don't use" classes, like SpriteSequence :)
> * In general, small items take up huge amounts of memory: Repeater {
> model: 500; Rectangle { width: 100; height: 100; radius: 10 } } and you
> have 1 MB or something :)
> * QObject -> QQmlData / QQuickItem / QQuickItemPrivate etc all adds up
> on individual items - but what can we do to fix that?
> * The recent introduction of 'padding' in the box model might need a
> second look to make sure it isn't increasing item sizes in common cases.
> 4 extra doubles is quite a large addition to item sizes.
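
To put rough numbers on the last few points, here is a small C++ sketch of
the arithmetic. It treats the "1 MB or something" figure for 500 rectangles
as exactly 1 MiB, which is an assumption, and the helper names are made up.

```cpp
#include <cstddef>

// Average footprint of one item, given a total and an item count.
constexpr std::size_t bytesPerItem(std::size_t totalBytes, std::size_t itemCount)
{
    return totalBytes / itemCount;
}

// Total growth when every item gains a fixed number of bytes.
constexpr std::size_t addedBytes(std::size_t itemCount, std::size_t extraBytesPerItem)
{
    return itemCount * extraBytesPerItem;
}

// 500 rectangles in roughly 1 MiB is about 2 KB per Rectangle.
static_assert(bytesPerItem(1024 * 1024, 500) == 2097, "about 2 KB per item");

// Four extra doubles are 32 bytes per item where double is 8 bytes,
// so the same 500 items grow by 16000 bytes.
static_assert(addedBytes(500, 32) == 16000, "32 bytes x 500 items");
```
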
> _______________________________________________
> Development mailing list
> Development at qt-project.org
> http://lists.qt-project.org/mailman/listinfo/development