[Development] required QTextBoundaryFinder behavior [and API] fixes

Mon Sep 17 11:58:01 CEST 2012

Anyone?

Konstantin

2012/9/10 Konstantin Ritt <ritt.ks at gmail.com>:
> Hi folks,
>
> In fact, the current QTextBoundaryFinder (QTBF) behaves as if it were
> a broken break iterator.
> [Issue1] Despite its name, it stops at every break opportunity and
> reports NotAtBoundary via boundaryReasons() for the break
> opportunities that are not boundaries (this affects Line and Word
> modes).
> [Issue2] As for Grapheme and Sentence modes, there are no optional
> break opportunities, so stopping at every one is fine, except that
> boundaryReasons() makes a wild guess based on the surrounding white
> space characters and reports (StartWord | EndWord) or NotAtBoundary
> most of the time.
> [Issue3] All this requires the developer to use two different
> iteration models depending on which QTBF mode is set: plain
> toNextBoundary() iteration for Grapheme and Sentence modes, and
> toNextBoundary() iteration with an extra check of the
> boundaryReasons() result for Line and Word modes.
>
> But even then, there is no guarantee that QTBF will produce the
> expected results. A good example is searching for the word start/end
> positions around an arbitrary position:
>
> [code] // -- from src/plugins/platforms/windows/qwindowsinputcontext.cpp:~560
>     // Find the word in the surrounding text.
>     QTextBoundaryFinder bounds(QTextBoundaryFinder::Word, surroundingText);
>     bounds.setPosition(pos);
>     if (bounds.isAtBoundary()) {
>         if (QTextBoundaryFinder::EndWord == bounds.boundaryReasons())
>             bounds.toPreviousBoundary();
>     } else {
>         bounds.toPreviousBoundary();
>     }
>     const int startPos = bounds.position();
>     bounds.toNextBoundary();
>     const int endPos = bounds.position();
> [/code]
>
> In the code above, if surroundingText doesn't contain a word, or if
> it ends with several white space characters at pos, then the result
> is garbage.
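
For the two failure cases just described, it may help to pin down what a well-behaved lookup could return. The sketch below is a toy stand-in in plain C++, not Qt code: it treats any run of non-whitespace characters as a word, which is far cruder than the Unicode word-break rules, and both the name wordAt and the {-1, -1} "no word here" result are assumptions made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Toy rule (assumption): a "word" is a maximal run of non-whitespace chars.
static bool isWordChar(const std::string &text, int i)
{
    return i >= 0 && i < static_cast<int>(text.size())
        && !std::isspace(static_cast<unsigned char>(text[i]));
}

// Hypothetical helper: returns {start, end} of the word containing or
// touching pos, or {-1, -1} when there is no such word.
std::pair<int, int> wordAt(const std::string &text, int pos)
{
    assert(pos >= 0 && pos <= static_cast<int>(text.size()));

    const bool wordBefore = isWordChar(text, pos - 1);
    const bool wordAfter = isWordChar(text, pos);
    if (!wordBefore && !wordAfter)
        return {-1, -1};   // empty text, or pos surrounded by white space

    int start = pos;
    while (isWordChar(text, start - 1))
        --start;           // walk back to the first character of the word
    int end = pos;
    while (isWordChar(text, end))
        ++end;             // walk forward to one past its last character
    return {start, end};
}
```

With this rule, a position inside or at the edge of a word yields that word's range, while a position inside trailing white space, or any position in a text without words, yields the explicit "no word" value instead of an arbitrary pair of offsets.
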
>
>
> I see two major ways to fix the behavior and make the iteration
> process consistent regardless of which mode is in use:
>
> A) a1. introduce a BreakOpportunity value in the BoundaryReason enum
> and make boundaryReasons() report BreakOpportunity (instead of
> NotAtBoundary) for the break opportunities that are not boundaries in
> Line and Word modes, and for the boundaries in Grapheme and Sentence
> modes;
>     a2. introduce a MandatoryBreak value in the BoundaryReason enum
> and make boundaryReasons() report MandatoryBreak (instead of a
> combination of StartWord and EndWord values) for the mandatory line
> breaks (CR, LF, NEL, EOT);
>     a3. make boundaryReasons() report StartWord and/or EndWord exactly
> at word start and word end positions.
>
> B) b1. fix QTBF to *not* stop at break opportunities that are not
> boundaries in Line and Word modes in order to fix Issue1;
>     b2. apply a2 and a3 to QTBF in order to fix Issues 2 and 3;
>     b3. introduce a new QTextBreakIterator class that would implement
> everything described in A (this could be delayed for 5.1). Then, QTBF
> could be a cheap convenience layer on top of QTBI. Alternatively, QTBI
> could provide both "toNextBreak()" and "toNextBoundary()" methods in
> order to replace QTBF completely.
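
To make the difference between the two options concrete, here is a rough model in plain C++ rather than Qt code. The flag names BreakOpportunity and MandatoryBreak come from the proposal; their numeric values, the whitespace-only notion of a word, and the function names wordReasonsAt, breakOpportunities and wordBoundaries are all assumptions for the sake of the sketch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Flag names follow the proposal; the numeric values are illustrative only.
enum BoundaryReason : unsigned {
    NotAtBoundary    = 0x0,
    StartWord        = 0x1,
    EndWord          = 0x2,
    BreakOpportunity = 0x4, // proposed in a1
    MandatoryBreak   = 0x8  // proposed in a2 (Line mode; unused below)
};

// Toy rule (assumption): a word is a maximal run of non-whitespace chars.
static bool isWordChar(const std::string &text, std::size_t i)
{
    return i < text.size()
        && !std::isspace(static_cast<unsigned char>(text[i]));
}

// Toy Word-mode classifier for a single position.
unsigned wordReasonsAt(const std::string &text, std::size_t pos)
{
    const bool before = pos > 0 && isWordChar(text, pos - 1);
    const bool after = isWordChar(text, pos);
    if (before && after)
        return NotAtBoundary;    // inside a word: not a stop at all
    if (after)
        return StartWord;        // a3: exactly at a word start
    if (before)
        return EndWord;          // a3: exactly at a word end
    return BreakOpportunity;     // a1: a stop that is not a boundary
}

// Option A: stop at every break opportunity; callers inspect the flags.
std::vector<std::size_t> breakOpportunities(const std::string &text)
{
    std::vector<std::size_t> stops;
    for (std::size_t pos = 0; pos <= text.size(); ++pos)
        if (wordReasonsAt(text, pos) != NotAtBoundary)
            stops.push_back(pos);
    return stops;
}

// Option B (b1): stop only at real word boundaries.
std::vector<std::size_t> wordBoundaries(const std::string &text)
{
    std::vector<std::size_t> stops;
    for (std::size_t pos = 0; pos <= text.size(); ++pos)
        if (wordReasonsAt(text, pos) & (StartWord | EndWord))
            stops.push_back(pos);
    return stops;
}
```

In this model neither loop ever sees NotAtBoundary at a stop: under option A the position between two spaces is reported with BreakOpportunity, and under option B it is skipped entirely.
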
>
> Either way, a major impact of such a change is that boundaryReasons()
> will never report StartWord/EndWord in modes other than Word, and
> will never report NotAtBoundary when
> toPreviousBoundary()/toNextBoundary() stops at a valid position.
> Because QTBF is broken by design and the code that uses it should be
> revised anyway, I believe Qt5 is the right time to fix it, and such
> an API and behavior change is still acceptable for 5.0 even now,
> after beta1 has been released.
>
> Personally, I like the second option (B) considerably more.
> What do you think? Any objections to making the described changes in 5.0?
>
> Kind regards,
> Konstantin