[Interest] Guide me through the Qt offerings for GUIs

Roland Hughes roland at logikalsolutions.com
Tue Apr 20 23:35:17 CEST 2021

On 4/20/2021 3:53 PM, Matthew Woehlke wrote:
> TL;DR: We agree that QPlainTextEdit is "bad" (see below for definition 
> of "bad"). We (apparently) disagree on what's required to correctly 
> parse a file for syntax highlighting.
> On 20/04/2021 15.03, Roland Hughes wrote:
>> On 4/20/21 9:18 AM, Matthew Woehlke wrote:
>>> On 20/04/2021 09.10, Roland Hughes wrote:
>>>> Close on the heels of that "Why are they highlighting the whole 
>>>> thing?" when you only need the currently visible lines and possibly 
>>>> a screen up/down. Open up a 10,000 line source file in editors 
>>>> using Scintilla or the Electron JavaScript based things, or GUI 
>>>> Emacs, even on an i5-gen3, and you are almost instantly taken to 
>>>> the line you were on with perfect syntax highlighting.
>>> Frankly, I am skeptical. At best I think you are misrepresenting the 
>>> problem.
>>> It's well known that you *have* to highlight the whole file, at 
>>> least to the current position, for accurate ("perfect") 
>>> highlighting. At least in the general case. There may be some files 
>>> and/or syntaxes for which that is not the case, but vim has had 
>>> partial highlighting for almost forever, and I've seen it drop the 
>>> ball on plenty of occasions, because something earlier in the file 
>>> changes correct highlighting for the visible part.
>> No, it's not well known or even accurate. IF one is taking a student 
>> type approach and attempting to use regular expressions all mushed 
>> together as one nasty bundle, the statement is true.
>> Let's look at three editors that seem to do it right.
> Well, I find it ironic you include Kate (which really means katepart, 
> i.e. Kate, KWrite, KDevelop and maybe others), since AFAIK that *does* 
> highlight the whole file. It also *definitely* uses regular 
> expressions (and Qt's RE engine, AFAIK), although it also uses a mix 
> of other stuff optimized for common cases.
> But maybe we're talking about different things. When I say 
> "highlight", what I really mean is "syntax parse". Trying to *render* 
> the entirety of a large file is, indeed, madness.

Yes. It isn't *highlighted* until it is *rendered*.

> ...and see my previous comments; QPlainTextEdit was never meant for 
> that. Frankly, if you are using QPlainTextEdit to hold more than 
> ~16KiB of text... stop that.

There are a lot of little editors out there using QPlainTextEdit and one 
big mass of regular expressions where each line of the regular 
expression requires yet another pass through the text.

> Of course, this may all be happening in a separate thread, and it 
> isn't using QPlainTextEdit; katepart I'm almost certain has its own 
> structures for managing state and keeping track of breaking the text 
> into highlighted chunks.
Last I looked at KATE source, everything is happening in worker threads.
>> Text isn't a stream.
> Katepart would disagree. Although the way of *expressing* how to 
> handle line ends is different from handling other tokens, they are, at 
> the end of the day, handled almost exactly the same as any other 
> token. The difference is mainly that the rule to detect a newline is 
> built-in and uses different syntax to express how said rule should 
> modify the context stack.

No. Line endings are not "just another token." Linux always gets in 
trouble thinking like that. They are the record terminators on platforms 
intelligent enough to understand records. You have to get above the x86 
to experience that.

> There is no magic that allows you to begin parsing in the middle of a 
> file. If you do that, you will get something that is *wrong*. 
> (Admittedly, not always, but sometimes, and vim is proof.)
> Moreover, if you insist on expecting "text" to be structured in nice, 
> neat records delimited by "line breaks", you are going to be in for a 
> rude awakening when someone decides to try to open a "condensed" XML, 
> JS, or whatnot. (Katepart *mostly* handles these gracefully. By 
> default, it gives up at IIRC 1024 characters and forces a line break. 
> You can raise that limit... admittedly, at your own peril. In a sense, 
> katepart can get bogged down due to taking the approach you are 
> *recommending*.)
It's the correct approach.
>> Despite what people say, it's almost always LF CR in a file.
> This is trivially disproven by `unix2dos`. Or looking at most text 
> files created on a Windows machine.
Not really, no.
>  $ echo 'Hello, world!' | unix2dos | od -t x1
>  0000000 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 21 0d 0a
> That's a CR (0x0d, '\r') followed by a LF (0x0a, '\n').
Again, the ASCII world was much larger than the 
x86-wanna-be-a-real-computer-one-day-when-I-grow-up universe. Most early 
DOS versions, in particular the version of DOS shipping stock on the AST 
Premium 286, actually wrote LF CR. Some software wrote what you posted 
above, but not most of it. The error was introduced by people saying 
"carriage return linefeed" and developers not knowing any better. Most 
DOS (and GUI DOS) applications got modified to only require the two 
characters, accepting them in any order. The reason, if anyone needs 
one, was IEEE 1284 (parallel port printers) and the LF was needed by 
many drivers/printers to change the printhead direction.
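
As an illustration of "accepting them in any order", here is a minimal
Python sketch of a record splitter that treats CR LF, LF CR, a lone LF,
and a lone CR as one terminator each. The function name split_records is
hypothetical and is not taken from any DOS or Qt code. Pairing a
terminator byte with a different terminator byte that follows it is a
heuristic, so a file that mixes conventions can still be ambiguous.

```python
CR = 0x0D
LF = 0x0A


def split_records(data: bytes) -> list[bytes]:
    """Split data into records on CR LF, LF CR, lone LF, or lone CR."""
    records = []
    start = 0
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if b in (CR, LF):
            records.append(data[start:i])
            # Swallow the partner byte when the pair is CR LF or LF CR.
            # Two identical bytes in a row (LF LF) stay two terminators.
            if i + 1 < n and data[i + 1] in (CR, LF) and data[i + 1] != b:
                i += 1
            i += 1
            start = i
        else:
            i += 1
    # Keep a final record that has no terminator.
    if start < n:
        records.append(data[start:])
    return records
```

For example, split_records(b"a\r\nb\n\rc\nd") yields the four records
a, b, c, and d regardless of which order the pair was written in.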

I did a ton of serial comm work with 4, 8, and 16-port Digiboards back 
in the day under DOS. In the land of real computers LF CR was always 
written to disk. For transmission to physical print-head devices they 
were reversed but for storage always in that order.

I have had this argument many many many times. The reality is most PC 
coders never bothered to figure out what was going on. I got to watch a 
system fail spectacularly while doing a project at Navistar. Some little 
x86 and Linux only coder adamantly argued just like you above. Finally I 
said "Fine! Have the project lead record in the minutes that this was 
your decision and your decision alone and do it your way." That's what 
happened. He tested feeding only x86 generated feeds from something he 
wrote. Went live. First three systems to feed into it failed because 
those were real computers and they put it in the order I said.

Sometimes a child must place their hand on the stove to learn the 
meaning of hot.

Ah, here's a nice article I didn't write.


>> At any rate, to successfully syntax highlight a narrow window all you 
>> really need to know is the outermost enclosing scope which gets 
>> determined during load and stored in a structure.
> Right. And to know that correctly, you have to parse *the whole file* 
> (at least to the point you want to render).
You need to load each record and update the higher level structures, 
yes. Highlighting doesn't happen until rendering.
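
The split being argued over can be sketched in a few lines of Python.
This is a deliberately simplified illustration, not how katepart,
Scintilla, or any real editor does it: the only "enclosing scope" it
tracks is whether a line starts inside a C-style block comment, and it
ignores strings and // comments. The names scan_line, load, and
highlight_window are hypothetical. The load pass touches every record
once and stores one flag per line; span computation for rendering is
then done only for the visible window.

```python
def scan_line(line, in_comment):
    """Return (spans, in_comment_at_end) for one line.

    spans is a list of (start, end, kind), kind being "code" or "comment".
    """
    spans = []
    i = 0
    n = len(line)
    while i < n:
        if not in_comment:
            start = line.find("/*", i)
            if start == -1:
                spans.append((i, n, "code"))
                break
            if start > i:
                spans.append((i, start, "code"))
            comment_from = start
            search_from = start + 2  # skip the opener so "/*/" does not close
        else:
            comment_from = i
            search_from = i
        end = line.find("*/", search_from)
        if end == -1:
            spans.append((comment_from, n, "comment"))
            return spans, True
        spans.append((comment_from, end + 2, "comment"))
        i = end + 2
        in_comment = False
    return spans, in_comment


def load(lines):
    """Load pass: record whether each line starts inside a comment."""
    starts_in_comment = []
    state = False
    for line in lines:
        starts_in_comment.append(state)
        _, state = scan_line(line, state)
    return starts_in_comment


def highlight_window(lines, starts_in_comment, first, last):
    """Compute spans only for lines first..last-1 using the stored state."""
    last = min(last, len(lines))
    return {k: scan_line(lines[k], starts_in_comment[k])[0]
            for k in range(first, last)}
```

With lines = ["int a; /* start", "still comment", "end */ int b;",
"int c;"], load(lines) stores one flag per line, and
highlight_window(lines, load(lines), 2, 3) computes spans for line 2
alone without rescanning lines 0 and 1.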

Roland Hughes, President
Logikal Solutions

