[Interest] Can QSortFilterProxyModel sort the filtered data only?

Fri Apr 11 00:34:10 CEST 2014

On Wed, Apr 9, 2014 at 9:52 AM, André Somers <andre at familiesomers.nl> wrote:
>
> Mark Gaiser schreef op 8-4-2014 17:41:
>> Hi,
>>
>> When reading through the QSortFilterProxyModel i get the feeling that
>> sorting is done on the source model regardless of a filter.
>>
>> That is no problem with small models since sorting is quite fast, but
>> that can be a bit of a pain if you have millions of entries (which
>> filters reduce to hundreds) then sorting still seems to sort all
>> entries (as if nothing is filtered) which takes a bit of time.
> If you have millions of entries to filter, I doubt QSFPM is your best
> choice to begin with. In such cases, you probably have your data in a
> database anyway. The database is able to sort and filter your data way
> more efficiently than QSFPM will ever be able to do. It's whole
> structure is designed around doing tasks like that efficiently, using
> things like indices. So, instead of filtering millions of records in
> QSFPM, I'd use the capabilities of your data backend instead.
>>
>> A detour that would probably work is wrapping a QSortFilterProxyModel
>> (with just the filter) in another QSortFilterProxyModel and sorting on
>> that (since it's source would be a filtered model). This approach
>> basically makes one a "QFilterProxyModel" and the other a
>> "QSortProxyModel" which smells wrong.
>>
>> Another way would be to re-implement QAbstractProxyModel and
>> implementing my own sorting and filtering, but i'd like to prevent
>> that if possible.
>>
>> What i'm hoping for is some option in QSortFilterProxyModel where you
>> can define how you want to sort:
>> - Sort source model (sorts all data)
>> - Sort proxy model (sorts data that passed the filtered)
> It's not that simple. At first glance, you'd think that first filtering
> and then sorting would be optimal. But that really depends on how you
> implement the filtering and sorting. If you do naive filtering (which,
> AFAIK, QSFPM does) where you simply test each item one by one to see if
> it matches the filters, then it would be most efficient to first filter
> and then sort*. But if you have more knowledge on the dataset, perhaps
> because you know it is already sorted, or you have an index of some
> sort, you may be able to do your filtering way more efficiently,
> depending on the exact filter type of course. A filter like "all entries
> starting with string <ABC>" for instance would be very fast if you can
> use the knowledge that the dataset is sorted on that field. I am not
> sure how QSFPM is implemented, but if it first sorts and then filters,
> then that would be silly indeed if it uses the naïve filtering I think
> it does. Even in this case, such an option would not be needed I think.
> If you cannot do advanced filtering or sorting based on indices or other
> knowledge of the underlying data, then there is no way to optimize other
> than minimizing thouching the data. Configuration is not needed for that.
>
>
> André
>
> *) though perhaps you could come up with a scheme that does both at the
> same time somehow, doing the filtering the first time an item is
> 'touched' in the sorting algorithm pershaps. Perhaps QSFPM is that clever?

I think i need to share my explicit use case to make more sense.

The use case for this is quite simple. File browsing. The actual use
case within that is:
- Grouping files based on some (user defined) type (like mime)
- Within _each_ ("each".. that's the difficult part) group sorting on
some other type (like date, size, etc...)

What i have right now to implement this is:
- Each group of files is in it's own QSFPM
- All groups are in a model as well (QAbstractListModel)

So you get:
QAbstractListModel
| - QSFPM
| - QSFPM
| - ...

To visualize this in QML i have something like:
ListView {
   model: QAbstractListModel
   delegate: ListView {
      model: QSFPM
   }
}

That works. No doubt there. However, to sanely scroll through the
items i have to set a height for each nested ListView. That is where
problems occur. There you get insanely large ListView's with even
"small" folders that have 5000 files in them. In that case a ListView
(where one item/row occupies about 42 pixels in height) would total to
21000 pixels in height. Something QML can't handle.

So i have been suggested to flatten the list so that i only have one
QML ListView and one QSFPM with some smart filtering and sorting to
accomplish the same thing. It seems quite logical to take this
approach since i would be making a much more effective use of QSFPM.
But it looks like QSFPM isn't ready for this kind of usage so i'm
right now leaning very much towards implementing my own version based
of QAbstractProxyModel.

This is where my question came from. To see if QSFPM could be used in
a case like this. Looks like i answered my own question by the above
statement ;) But i am still very curious to know if there might be
another or better ways for doing this?

The data is just filesystem data, not database driven data like you
assumed. In this case i can't reduce the data size since that's just
not an option.
Note: This was one out of a couple ideas i had been given to make QML
speedy, but this seemed to make sense since i would be using the proxy
way more intensely and effectively.

As for the "millions of entries" part i said. I test this stuff with
dump folder that have millions of entries and see if QML can handle it
in a speedy fashion. It's insane, i know. Just my "would like to have
it fast" testcase :)