[Interest] Guide me through the Qt offerings for GUIs

Tue Apr 20 22:53:20 CEST 2021

TL;DR: We agree that QPlainTextEdit is "bad" (see below for definition 
of "bad"). We (apparently) disagree on what's required to correctly 
parse a file for syntax highlighting.

On 20/04/2021 15.03, Roland Hughes wrote:
> On 4/20/21 9:18 AM, Matthew Woehlke wrote:
>> On 20/04/2021 09.10, Roland Hughes wrote:
>>> Close on the heels of that "Why are they highlighting the whole 
>>> thing?" when you only need the currently visible lines and possibly a 
>>> screen up/down. Open up a 10,000 line source file in editors using 
>>> Scintilla or the Electron JavaScript based things, or even GUI Emacs 
>>> even on an i5-gen3 and you are almost instantly taken to the line you 
>>> were on with perfect syntax highlighting.
>>
>> Frankly, I am skeptical. At best I think you are misrepresenting the 
>> problem.
>>
>> It's well known that you *have* to highlight the whole file, at least 
>> to the current position, for accurate ("perfect") highlighting. At 
>> least in the general case. There may be some files and/or syntaxes for 
>> which that is not the case, but vim has had partial highlighting for 
>> almost forever, and I've seen it drop the ball on plenty of occasions, 
>> because something earlier in the file changes correct highlighting for 
>> the visible part.
> 
> No, it's not well known or even accurate. IF one is taking a student 
> type approach and attempting to use regular expressions all mushed 
> together as one nasty bundle, the statement is true.
> 
> Let's look at three editors that seem to do it right.
> 
> KATE

Well, I find it ironic you include Kate (which really means katepart, 
i.e. Kate, KWrite, KDevelop and maybe others), since AFAIK that *does* 
highlight the whole file. It also *definitely* uses regular expressions 
(and Qt's RE engine, AFAIK), although it also uses a mix of other stuff 
optimized for common cases.

But maybe we're talking about different things. When I say "highlight", 
what I really mean is "syntax parse". Trying to *render* the entirety of 
a large file is, indeed, madness.

...and see my previous comments; QPlainTextEdit was never meant for 
that. Frankly, if you are using QPlainTextEdit to hold more than ~16KiB 
of text... stop that.

> Now let's pick on Featherpad trying to do it "the Qt way"

I'm not sure what you consider "the Qt way". IMHO, katepart *is* "the Qt 
way". It's the way (well, *a* way, anyway) any sane person using Qt 
would implement a text editor that's intended to be usable with large 
documents.

> To highlight only the visible subset one simply needs to know the outer 
> most enclosing boundaries.

Well... yes. And no. The problem is, in order to know those boundaries, 
you have to look at *everything* in them. You can't just, for instance, 
see a '{' and decide you can skip everything until the next '}', because 
that brace might be inside a string literal.
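
To make that concrete, here is a minimal Python sketch (not taken from 
any real editor; both function names are invented for illustration). It 
compares "skip to the next '}'" with a scan that at least knows about 
double-quoted string literals. It deliberately ignores comments, 
character literals and raw strings, which a real lexer would also have 
to handle.

```python
def naive_block_end(text, start):
    """Return the index of the first '}' after start, ignoring context."""
    return text.find("}", start + 1)


def string_aware_block_end(text, start):
    """Return the index of the '}' closing the '{' at start,
    skipping any braces inside double-quoted string literals."""
    depth = 0
    in_string = False
    i = start
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return -1  # no matching brace found


source = 'void f() { puts("}"); }'
start = source.index("{")
print(naive_block_end(source, start))         # 17: the '}' inside the string
print(string_aware_block_end(source, start))  # 22: the real closing brace
```

The second function only gets the right answer because it looked at 
every character between the braces.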

I haven't looked at katepart's guts enough to know how they work or if 
they try to employ any clever optimizations. I *have* written 
highlighters for katepart, however, and knowing what those look like, 
I'm far from convinced that any such optimizations are implemented, or 
indeed, even possible. Katepart's highlighting is based on a context 
stack, with each detection rule potentially altering that stack. You 
can't just skip rules, because doing so means the wrong rule might end 
up gobbling some token, which will lead you into a wrong stack state, 
and things will just get worse from there.
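
A toy model of a context-stack highlighter, in Python, may help show 
why rules cannot be skipped. This is only a sketch of the general idea: 
katepart's real definitions are XML files and far richer, and every 
name below (RULES, DEFAULT_STYLE, highlight_line) is invented for 
illustration.

```python
import re

# Each context has an ordered list of (regex, style, stack action).
# An action is ("push", name), ("pop",) or None.
RULES = {
    "code": [
        (re.compile(r"/\*"), "comment", ("push", "comment")),
        (re.compile(r'"'), "string", ("push", "string")),
        (re.compile(r"[{}]"), "brace", None),
    ],
    "comment": [
        (re.compile(r"\*/"), "comment", ("pop",)),
    ],
    "string": [
        (re.compile(r"\\."), "string", None),
        (re.compile(r'"'), "string", ("pop",)),
    ],
}
# Style for characters that no rule in the current context matches.
DEFAULT_STYLE = {"code": "normal", "comment": "comment", "string": "string"}


def highlight_line(line, stack):
    """Return (spans, new_stack); spans are (text, style) tuples."""
    stack = list(stack)  # do not mutate the caller's stack
    spans = []
    pos = 0
    while pos < len(line):
        context = stack[-1]
        for pattern, style, action in RULES[context]:
            m = pattern.match(line, pos)
            if m:
                spans.append((m.group(), style))
                if action is not None:
                    if action[0] == "push":
                        stack.append(action[1])
                    else:
                        stack.pop()
                pos = m.end()
                break
        else:
            # No rule matched: emit one character in the context's style.
            spans.append((line[pos], DEFAULT_STYLE[context]))
            pos += 1
    return spans, stack


stack = ["code"]
for text in ["/* start {", "} still comment */ }"]:
    spans, stack = highlight_line(text, stack)
    print(stack, [s for s in spans if s[0] in "{}"])
```

The first '}' on the second line is styled as comment text and the 
last one as a brace, purely because of the stack handed over from the 
first line.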

Of course, this may all be happening in a separate thread, and it isn't 
using QPlainTextEdit; I'm almost certain katepart has its own structures 
for managing state and for tracking how the text is broken into 
highlighted chunks.

> Text isn't a stream.

Katepart would disagree. Although the syntax for *expressing* how to 
handle line ends differs from that used for other tokens, line ends are, 
at the end of the day, handled almost exactly like any other token. The 
difference is mainly that the rule to detect a newline is built-in and 
uses different syntax to express how said rule should modify the context 
stack.

There is no magic that allows you to begin parsing in the middle of a 
file. If you do that, you will get something that is *wrong*. 
(Admittedly, not always, but sometimes, and vim is proof.)
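
A small Python sketch of that failure mode, assuming a C-like language 
where only /* ... */ comments matter (the function name is made up, and 
this is not how vim or katepart is implemented). It records, for each 
line, whether the line begins inside a comment, and compares a scan 
from the top of the file with a scan that starts at the first "visible" 
line.

```python
def start_states(lines, first=0):
    """For each line from index `first` on, report whether it begins
    inside a /* ... */ comment. The scan assumes it starts outside one."""
    in_comment = False
    states = []
    for line in lines[first:]:
        states.append(in_comment)
        pos = 0
        while True:
            # Look for whichever delimiter would flip the current state.
            marker = "*/" if in_comment else "/*"
            hit = line.find(marker, pos)
            if hit < 0:
                break
            in_comment = not in_comment
            pos = hit + 2
    return states


lines = [
    "/* disabled for now",
    "int x = 1;",
    "*/",
    "int y = 2;",
]
print(start_states(lines)[1:])         # [True, True, False]
print(start_states(lines, first=1))    # [False, False, False]
```

Starting at the second line, the scan has no way to know it is inside 
a comment, so it treats the disabled code as live code.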

Moreover, if you insist on expecting "text" to be structured in nice, 
neat records delimited by "line breaks", you are going to be in for a 
rude awakening when someone decides to try to open a "condensed" XML, 
JS, or whatnot. (Katepart *mostly* handles these gracefully. By default, 
it gives up at IIRC 1024 characters and forces a line break. You can 
raise that limit... admittedly, at your own peril. In a sense, katepart 
can get bogged down due to taking the approach you are *recommending*.)

> Despite what people say, it's almost always LF CR in a file.

This is trivially disproven by `unix2dos`, or by looking at most text 
files created on a Windows machine.

   $ echo 'Hello, world!' | unix2dos | od -t x1
   0000000 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 0d 0a

That's a CR (0x0d, '\r') followed by a LF (0x0a, '\n').

> The last rarity on the original ASCII systems is just 0x0D. I forget 
> where that got used most often.

That's "classic" (pre-OS-X) Macintosh line endings 🙂.
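
For checking what a given file actually contains, a minimal Python 
sketch along these lines should do (the function name is invented for 
illustration). It counts CR LF pairs first so that their bytes are not 
also counted as lone CRs or LFs.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR LF pairs, lone LFs and lone CRs in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,                      # DOS/Windows
        "LF": data.count(b"\n") - crlf,    # Unix
        "CR": data.count(b"\r") - crlf,    # classic Mac
    }


print(count_line_endings(b"Hello, world!\r\n"))
# {'CRLF': 1, 'LF': 0, 'CR': 0}
```

For a real file, read it in binary mode (open(path, "rb")) so that 
Python does not translate the line endings before they are counted.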

> At any rate, to successfully syntax highlight a narrow window all you 
> really need to know is the outermost enclosing scope which gets 
> determined during load and stored in a structure.

Right. And to know that correctly, you have to parse *the whole file* 
(at least to the point you want to render).

> For a couple hundred lines on a fast machine it is close enough for 
> hand grenades. For editing anything of any significance, no. At the 
> core of the class is the wrong object. The needed higher level 
> objects also aren't there.

Well, yes, I don't think we're in any disagreement that QPlainTextEdit 
is the wrong choice for "large content". (Which I will insist on 
expressing in bytes, having seen one-"line" files that would bring 
QPlainTextEdit to its knees even *without* syntax highlighting.)

-- 
Matthew