[Development] required QTextBoundaryFinder behavior [and API] fixes

Knoll Lars Lars.Knoll at digia.com
Fri Sep 21 10:29:01 CEST 2012


Had a chat with Konstantin on IRC today. We agreed to go for a1-a3 for now. In addition, iteration will start at -1 (i.e. an invalid boundary), so that you can get all the information with a single loop that starts with toNextBoundary().
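
A minimal sketch of what such a loop could look like, written against a toy stand-in rather than the real QTextBoundaryFinder. ToyWordFinder, ToyReason and collectWords are made-up names, the flag values are assumptions, and the boundary rule (ASCII white space only) is far simpler than the real Unicode rules:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flag values; only the names follow the a1-a3 proposal.
enum ToyReason : unsigned {
    NotAtBoundary    = 0,
    BreakOpportunity = 1u << 0,
    StartWord        = 1u << 1,
    EndWord          = 1u << 2
};

// Toy stand-in for a Word-mode finder: ASCII only, a boundary is any
// switch between white space and non-white space, plus both text ends.
class ToyWordFinder {
public:
    explicit ToyWordFinder(std::string text) : text_(std::move(text)) {}

    int position() const { return pos_; }

    // Moves to the next boundary and returns it, or -1 if there is none.
    int toNextBoundary() {
        const int len = static_cast<int>(text_.size());
        for (int p = pos_ + 1; p <= len; ++p) {
            if (isBoundary(p)) {
                pos_ = p;
                return pos_;
            }
        }
        pos_ = -1;
        return pos_;
    }

    // Flags for the current position; NotAtBoundary only when invalid.
    unsigned boundaryReasons() const {
        if (pos_ < 0)
            return NotAtBoundary;
        const int len = static_cast<int>(text_.size());
        unsigned reasons = BreakOpportunity;
        if (pos_ < len && !isSpaceAt(pos_))
            reasons |= StartWord;
        if (pos_ > 0 && !isSpaceAt(pos_ - 1))
            reasons |= EndWord;
        return reasons;
    }

private:
    bool isSpaceAt(int i) const {
        return std::isspace(static_cast<unsigned char>(text_[i])) != 0;
    }
    bool isBoundary(int p) const {
        const int len = static_cast<int>(text_.size());
        if (p == 0 || p == len)
            return true;
        return isSpaceAt(p - 1) != isSpaceAt(p);
    }

    std::string text_;
    int pos_ = -1;  // iteration starts at an invalid boundary
};

// Collects [start, end) word ranges with one loop, whatever the text.
std::vector<std::pair<int, int>> collectWords(const std::string &text) {
    ToyWordFinder finder(text);
    std::vector<std::pair<int, int>> words;
    int start = -1;
    while (finder.toNextBoundary() != -1) {
        const unsigned reasons = finder.boundaryReasons();
        if (reasons & EndWord) {
            assert(start >= 0);  // every word end follows a word start
            words.emplace_back(start, finder.position());
        }
        if (reasons & StartWord)
            start = finder.position();
    }
    return words;
}
```

In this toy model, collectWords("hi  there") gives the ranges [0, 2) and [4, 9), and a string with only white space gives no ranges at all. The point is just that starting at -1 lets the first toNextBoundary() call land on position 0, so the caller never has to special-case the start of the text.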

But it needs to go in before beta 2, and all usages of QTBF in qt5.git and qt-creator.git need to be checked and fixed where required.

We can still consider adding a QTextBreakIterator class in 5.x if it makes things better or faster.

Cheers,
Lars

On Sep 10, 2012, at 8:27 PM, ext Konstantin Ritt <ritt.ks at gmail.com> wrote:

> Hi folks,
> 
> In fact, the current QTBF behaves just as if it were a broken break
> iterator...
> I mean that, [Issue1] despite its name, it stops at every break
> opportunity and reports NotAtBoundary via the boundaryReasons() method
> for the break opportunities that are not boundaries (this affects Line
> and Word modes).
> [Issue2] As for Grapheme and Sentence modes, there are no optional
> break opportunities and thus such behavior is OK, except that
> boundaryReasons() makes a wild guess based on the surrounding white
> space characters and reports (StartWord | EndWord) or NotAtBoundary
> most of the time.
> [Issue3] All this requires the developer to use two different
> iteration models depending on which QTBF mode is currently set:
> iterating with toNextBoundary() alone for Grapheme and Sentence modes,
> and iterating with toNextBoundary() plus an extra check of the
> boundaryReasons() result for Line and Word modes.
> 
> But even then, there is no guarantee that QTBF will produce the
> expected results. A good example of what I'm talking about is
> searching for the word start/end positions around some [arbitrary]
> position:
> 
> [code] // -- from src/plugins/platforms/windows/qwindowsinputcontext.cpp:~560
>     // Find the word in the surrounding text.
>     QTextBoundaryFinder bounds(QTextBoundaryFinder::Word, surroundingText);
>     bounds.setPosition(pos);
>     if (bounds.isAtBoundary()) {
>         if (QTextBoundaryFinder::EndWord == bounds.boundaryReasons())
>             bounds.toPreviousBoundary();
>     } else {
>         bounds.toPreviousBoundary();
>     }
>     const int startPos = bounds.position();
>     bounds.toNextBoundary();
>     const int endPos = bounds.position();
> [/code]
> 
> In the code above, if surroundingText doesn't contain a word, or if
> it ends with several white space characters at \a pos, then the
> result is garbage.
> 
> 
> I see two major ways to fix the behavior and make the iteration
> process consistent regardless of which mode is in use:
> 
> A) a1. introduce BreakOpportunity BoundaryReason enum value and make
> boundaryReasons() report BreakOpportunity (instead of NotAtBoundary)
> for the break opportunities that are not boundaries in Line and Word
> modes, and for the boundaries in Grapheme and Sentence modes;
>    a2. introduce MandatoryBreak BoundaryReason enum value and make
> boundaryReasons() report MandatoryBreak (instead of combination of
> StartWord and EndWord values) for the mandatory line breaks (CR, LF,
> NEL, EOT);
>    a3. make boundaryReasons() carefully report StartWord and/or
> EndWord exactly for word start and word end positions.
> 
> B) b1. fix QTBF to *not* stop at break opportunities that are not
> boundaries in Line and Word modes in order to fix Issue1;
>    b2. apply a2 and a3 to QTBF in order to fix Issues 2 and 3;
>    b3. introduce a new QTextBreakIterator class that would implement
> everything described in A (this could be delayed for 5.1). Then, QTBF
> could be a cheap convenience layer on top of QTBI. Alternatively, QTBI
> could provide both "toNextBreak()" and "toNextBoundary()" methods in
> order to replace QTBF completely.
> 
> Either way, a major impact of such a change is that
> boundaryReasons() will never report StartWord/EndWord in modes other
> than Word, and boundaryReasons() will never report NotAtBoundary when
> toPreviousBoundary()/toNextBoundary() stops at a valid position.
> Because QTBF is broken by design and the code that uses it should
> be revised anyway, I believe Qt5 is the right time to fix it, and
> such an API and behavior change is still acceptable for 5.0 even now,
> after beta1 has been released.
> 
> I, personally, like the second option quite a bit more.
> What do you think? Any objections to making the described changes in 5.0?
> 
> Kind regards,
> Konstantin



