[Development] required QTextBoundaryFinder behavior [and API] fixes

Konstantin Ritt ritt.ks at gmail.com
Mon Sep 10 20:27:24 CEST 2012


Hi folks,

In fact, the current QTBF behaves as if it were a broken break
iterator.
I mean that, [Issue1] despite its name, it stops at every break
opportunity and reports NotAtBoundary via the boundaryReasons() method for
the break opportunities that are not boundaries (this affects the Line and
Word modes).
[Issue2] As for the Grapheme and Sentence modes, there are no optional
break opportunities, so this behavior is fine there, except that
boundaryReasons() makes a wild guess based on the surrounding whitespace
characters and reports (StartWord | EndWord) or NotAtBoundary most of the time.
[Issue3] All this forces the developer to use two different
iteration models depending on which QTBF mode is currently set:
iterating with toNextBoundary() alone for the Grapheme and Sentence modes,
and iterating with toNextBoundary() plus an extra check of the
boundaryReasons() result for the Line and Word modes.
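
To make Issue3 concrete, here is a toy C++ model (not Qt code) of the two
loops. It assumes the finder's stops are already available as a list of
(position, reasons) pairs; the Stop struct, the helper names and the flag
values are hypothetical stand-ins for what toNextBoundary() and
boundaryReasons() would yield, not QTBF's real enum.

```cpp
#include <vector>

// Hypothetical stand-ins for QTBF's reason flags; the values are assumptions.
enum Reason : unsigned { NotAtBoundary = 0, StartWord = 1, EndWord = 2 };

// One position where a (hypothetical) finder stops, plus what
// boundaryReasons() would report there.
struct Stop { int pos; unsigned reasons; };

// Grapheme/Sentence model: every stop is treated as a boundary.
std::vector<int> collectAll(const std::vector<Stop>& stops) {
    std::vector<int> out;
    for (const Stop& s : stops) out.push_back(s.pos);
    return out;
}

// Line/Word model: the finder also stops at mere break opportunities,
// so the caller has to skip stops reported as NotAtBoundary.
std::vector<int> collectBoundaries(const std::vector<Stop>& stops) {
    std::vector<int> out;
    for (const Stop& s : stops)
        if (s.reasons != NotAtBoundary) out.push_back(s.pos);
    return out;
}
```

For an illustrative stop list such as {0 StartWord, 3 EndWord, 4 NotAtBoundary,
5 StartWord, 8 EndWord} (say, for "foo  bar"), only the second loop drops
position 4 and keeps 0, 3, 5 and 8; which loop is correct depends on the mode.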

But even then, there is no guarantee that QTBF will produce the expected results.
A good example of what I mean is finding the word start/end positions
around some [arbitrary] position:

[code] // -- from src/plugins/platforms/windows/qwindowsinputcontext.cpp:~560
    // Find the word in the surrounding text.
    QTextBoundaryFinder bounds(QTextBoundaryFinder::Word, surroundingText);
    bounds.setPosition(pos);
    if (bounds.isAtBoundary()) {
        if (QTextBoundaryFinder::EndWord == bounds.boundaryReasons())
            bounds.toPreviousBoundary();
    } else {
        bounds.toPreviousBoundary();
    }
    const int startPos = bounds.position();
    bounds.toNextBoundary();
    const int endPos = bounds.position();
[/code]

In the code above, if surroundingText doesn't contain a word, or if
there are several whitespace characters at \a pos, the result is garbage.
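
For comparison, here is a minimal sketch of the defensive behavior callers
presumably want: return the word containing or touching pos, or nothing at
all, rather than a bogus range. It is plain C++17, not a QTBF-based fix:
isWordChar() is a hypothetical ASCII-only stand-in for the Unicode word rules
(UAX #29) that QTBF implements, and wordAt()/Range are illustrative names.

```cpp
#include <cctype>
#include <optional>
#include <string>

// Half-open range [start, end) of a word in the text.
struct Range { int start; int end; };

// Hypothetical stand-in classifier: ASCII letters, digits and '_' only,
// NOT the Unicode word-boundary rules QTBF implements.
static bool isWordChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Returns the word containing pos (or ending right at pos), or nullopt when
// pos is surrounded by non-word characters, instead of a garbage range.
std::optional<Range> wordAt(const std::string& text, int pos) {
    const int n = static_cast<int>(text.size());
    if (pos < 0 || pos > n) return std::nullopt;
    int anchor;
    if (pos < n && isWordChar(text[pos])) anchor = pos;              // inside a word
    else if (pos > 0 && isWordChar(text[pos - 1])) anchor = pos - 1; // right after a word
    else return std::nullopt;                                        // no word here
    int start = anchor, end = anchor + 1;
    while (start > 0 && isWordChar(text[start - 1])) --start;        // extend left
    while (end < n && isWordChar(text[end])) ++end;                  // extend right
    return Range{start, end};
}
```

For "hello world", positions 2 and 5 both yield [0, 5), while position 2 in
"a   b" yields no word at all instead of a range spanning whitespace.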


I see two major ways to fix the behavior and make the iteration
process consistent regardless of which mode is in use:

A) a1. introduce a BreakOpportunity BoundaryReason enum value and make
boundaryReasons() report BreakOpportunity (instead of NotAtBoundary)
for the break opportunities that are not boundaries in the Line and Word
modes, and for the boundaries in the Grapheme and Sentence modes;
    a2. introduce a MandatoryBreak BoundaryReason enum value and make
boundaryReasons() report MandatoryBreak (instead of a combination of the
StartWord and EndWord values) for the mandatory line breaks (CR, LF,
NEL, EOT);
    a3. make boundaryReasons() carefully report StartWord and/or
EndWord exactly at word start and word end positions.

B) b1. fix QTBF to *not* stop at break opportunities that are not
boundaries in the Line and Word modes, in order to fix Issue1;
    b2. apply a2 and a3 to QTBF in order to fix Issues 2 and 3;
    b3. introduce a new QTextBreakIterator (QTBI) class that would implement
everything described in A (this could be delayed until 5.1). QTBF
could then be a cheap convenience layer on top of QTBI. Alternatively, QTBI
could provide both "toNextBreak()" and "toNextBoundary()" methods in
order to replace QTBF completely.
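
Under proposal A, a single loop could serve every mode. The sketch below
assumes the new reason values are independent bit flags and that
BreakOpportunity is set at every stop; BreakOpportunity and MandatoryBreak
are the names proposed above, but the numeric values, the Stop struct and
positionsWith() are hypothetical, not an actual QTBF enum.

```cpp
#include <vector>

// Hypothetical reason flags for proposal A (names from the proposal, values assumed).
enum Reason : unsigned {
    NotAtBoundary    = 0,
    BreakOpportunity = 1u << 0,  // assumed here to be set at every stop, in every mode (cf. a1)
    StartWord        = 1u << 1,  // only at true word starts (a3)
    EndWord          = 1u << 2,  // only at true word ends (a3)
    MandatoryBreak   = 1u << 3   // after CR, LF, NEL, at end of text (a2)
};

// One stop of the iteration and the reasons reported there.
struct Stop { int pos; unsigned reasons; };

// The same loop works for any mode: keep the stops whose reasons contain
// all the bits the caller asked for.
std::vector<int> positionsWith(const std::vector<Stop>& stops, unsigned wanted) {
    std::vector<int> out;
    for (const Stop& s : stops)
        if ((s.reasons & wanted) == wanted) out.push_back(s.pos);
    return out;
}
```

For instance, given illustrative Line-mode stops at 3 (after a space), 6
(after LF) and 8 (end of text), asking for BreakOpportunity yields all three,
while BreakOpportunity | MandatoryBreak yields only 6 and 8.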

Either way, a major impact of such a change is that
boundaryReasons() will never report StartWord/EndWord in modes other
than Word, and boundaryReasons() will never report NotAtBoundary when
toPreviousBoundary()/toNextBoundary() stops at a valid position.
Because QTBF is broken by design and the code that uses it should
be revised anyway, I believe Qt5 is the most appropriate time to fix it, and
such an API and behavior change is still acceptable for 5.0 even now,
after beta1 has been released.
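
A toy sketch of b1 and b3 together: a break iterator that stops at every
break opportunity, and a finder layered on top whose toNextBoundary() skips
anything that is not a boundary, so it never stops at a NotAtBoundary
position. BreakIterator, BoundaryFinder, lastWasBoundary() and the
precomputed stop list are all hypothetical; this is not the proposed
QTextBreakIterator API.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A precomputed break opportunity and whether it is also a boundary (assumed input).
struct Stop { int pos; bool isBoundary; };

// Hypothetical break iterator: stops at every break opportunity.
class BreakIterator {
public:
    explicit BreakIterator(std::vector<Stop> stops) : stops_(std::move(stops)) {}
    // Advances to the next break opportunity; returns -1 when exhausted.
    int toNextBreak() { return idx_ < stops_.size() ? stops_[idx_++].pos : -1; }
    // Whether the stop just returned by toNextBreak() is a boundary.
    bool lastWasBoundary() const { return idx_ > 0 && stops_[idx_ - 1].isBoundary; }
private:
    std::vector<Stop> stops_;
    std::size_t idx_ = 0;
};

// Hypothetical convenience layer: only ever stops at boundaries (b1).
class BoundaryFinder {
public:
    explicit BoundaryFinder(BreakIterator it) : it_(std::move(it)) {}
    // Skips break opportunities that are not boundaries; returns -1 at the end.
    int toNextBoundary() {
        for (int p = it_.toNextBreak(); p != -1; p = it_.toNextBreak())
            if (it_.lastWasBoundary()) return p;
        return -1;
    }
private:
    BreakIterator it_;
};
```

With stops {0, 3, 4, 5, 8} where only 4 is not a boundary, the iterator visits
all five positions, while the finder returns 0, 3, 5, 8 and then -1; callers
of the finder no longer need to check the reasons at all.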

Personally, I much prefer the second option.
What do you think? Any objections on making described changes in 5.0?

Kind regards,
Konstantin
