[Development] Qt PDF as a new TP module for Qt 5.14

Wed Aug 14 17:40:44 CEST 2019

On 13 Aug 2019, at 18:11, Thiago Macieira <thiago.macieira at intel.com> wrote:

> On Tuesday, 13 August 2019 02:20:04 PDT Shawn Rutledge wrote:
>> When it comes to Qt Quick, either solution can be exposed as a subclass
>> of either QImageIOHandler (so that it “just works” as an image format) or
>> QQuickImageProvider.

> Wait, what? Why would you do it via *image* handlers?

Because when we ask PDFium to render a page for us, it uses Skia to do that.  The easiest way to render a page (the only way we support so far) is to get back an image for the whole page, and then stick it in the scene graph, or paint it into a widget.

Even if we later figure out how to use QPainter for the rendering, we may still end up with an image.

Even later, maybe we can figure out how to use the scene graph to render individual glyphs, individual images that the PDF contains, individual vector graphics if the PDF contains line drawings, etc.  But it’s rather like the situation with SVG: ideally we should perhaps break it up into some sort of DOM and render everything the SVG contains, but instead we just treat it as an image format, because qtsvg knows how to use QPainter to render an SVG to an image.

Nevertheless, we still get the benefit of scalability: you can zoom in to an SVG in Qt Quick and make it render sharper as you do so, by re-rendering the whole thing at a higher resolution.  (And yes, you pay the memory and time penalties for doing that.)  It might even turn out to be more efficient the way we are doing it: take the CPU hit to make one big texture, and from then on there’s just one texture node in the scene graph, rather than thousands of scene graph nodes composing the page, all being re-rendered at 60 fps.
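To put a rough number on that memory penalty: a page rendered as one image costs width × height × 4 bytes at 32 bits per pixel, and the pixel size grows with both the zoom level and the screen's device pixel ratio.  The sketch below is plain C++ with no Qt or PDFium calls; renderSize and textureBytes are made-up names for illustration, and it assumes page sizes in PDF points (1/72 inch) with zoom 1.0 meaning 72 dpi.

```cpp
#include <cmath>
#include <cstdint>

// Pixel size of a page rendered as one image (one texture in the scene graph).
// Assumptions for this sketch: page size in PDF points (1/72 inch),
// zoom 1.0 == 72 dpi, and devicePixelRatio is the screen's scale factor.
struct PixelSize {
    int width;
    int height;
};

PixelSize renderSize(double pageWidthPt, double pageHeightPt,
                     double zoom, double devicePixelRatio)
{
    const double scale = zoom * devicePixelRatio;
    // Round up so the image always covers the whole page.
    return { static_cast<int>(std::ceil(pageWidthPt * scale)),
             static_cast<int>(std::ceil(pageHeightPt * scale)) };
}

// Memory for that image at 4 bytes per pixel (e.g. a 32-bit ARGB format).
std::uint64_t textureBytes(const PixelSize &s)
{
    return static_cast<std::uint64_t>(s.width)
         * static_cast<std::uint64_t>(s.height) * 4u;
}
```

For a roughly A4 page (595 × 842 points), zoom 1 on a standard display gives a 595 × 842 image of 2,003,960 bytes, just under 2 MB.  Zooming to 2× on a display with device pixel ratio 2 gives 2380 × 3368 pixels, about 30.6 MiB for that single page texture: every doubling of the scale quadruples the memory.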

> Most of PDF is text. Are PDFium and the proposed module capable of selecting
> text, copying it to the clipboard, showing the document structure, etc.?

That will come soon: selecting and copying text is on the requirements list.  PDFium provides API to get the character index at a location, the string from one index to another, and the geometry from one index to another.  So I think I will implement it by letting the user drag across the page, finding the geometry of the text being selected, rendering a translucent rectangle on top of it to show the selection, and allowing that text to be copied to the clipboard.
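Here is a rough, self-contained model of that plan.  It does not call PDFium: a plain vector of per-character boxes stands in for the text page, and charIndexAt, selectedText and selectionRects are hypothetical helpers playing the roles of the three PDFium capabilities mentioned above (in PDFium's fpdf_text.h these are, as far as I know, FPDFText_GetCharIndexAtPos, FPDFText_GetText, and FPDFText_CountRects together with FPDFText_GetRect).  Boxes are in page coordinates with y growing upward, and "same line" is approximated crudely as "same bottom edge".

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Page-space rectangle: y grows upward, so top > bottom (PDF convention).
struct Rect {
    double left, top, right, bottom;
};

// Hypothetical stand-in for one character of a PDFium text page.
struct CharBox {
    char ch;   // ASCII only, to keep the sketch short
    Rect box;
};

// Index of the first character whose box contains (x, y), or -1 if none.
// (The real PDFium call also takes x/y tolerances; this sketch has none.)
int charIndexAt(const std::vector<CharBox> &page, double x, double y)
{
    for (std::size_t i = 0; i < page.size(); ++i) {
        const Rect &r = page[i].box;
        if (x >= r.left && x <= r.right && y >= r.bottom && y <= r.top)
            return static_cast<int>(i);
    }
    return -1;
}

// Text from index a to index b inclusive; the drag may go in either direction.
std::string selectedText(const std::vector<CharBox> &page, int a, int b)
{
    if (a > b)
        std::swap(a, b);
    if (a < 0 || static_cast<std::size_t>(b) >= page.size())
        return {};
    std::string text;
    for (int i = a; i <= b; ++i)
        text += page[static_cast<std::size_t>(i)].ch;
    return text;
}

// Highlight geometry: one rectangle per line, built by merging the boxes of
// consecutive characters. "Same line" is crudely approximated as "same bottom
// edge"; real glyph boxes (descenders etc.) would need something more robust.
std::vector<Rect> selectionRects(const std::vector<CharBox> &page, int a, int b)
{
    if (a > b)
        std::swap(a, b);
    std::vector<Rect> rects;
    if (a < 0 || static_cast<std::size_t>(b) >= page.size())
        return rects;
    for (int i = a; i <= b; ++i) {
        const Rect &r = page[static_cast<std::size_t>(i)].box;
        if (!rects.empty() && rects.back().bottom == r.bottom) {
            Rect &line = rects.back();
            line.left = std::min(line.left, r.left);
            line.right = std::max(line.right, r.right);
            line.top = std::max(line.top, r.top);
        } else {
            rects.push_back(r);
        }
    }
    return rects;
}
```

With a two-line page "Hi" / "Qt", dragging from the "i" to the "t" gives indices 1 and 3, so the selection text is "iQt" and there are two highlight rectangles, one per line, each drawn translucently over the page image.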

PDF text rendering is complicated, I’m afraid.  Various kerning strategies are possible (including putting every glyph at its own independent position, as seems to happen when you go from LaTeX to PDF).  Adobe used to advertise a font-matching technique so that anyone on any computer can render a decent approximation of a document even without the fonts that were used in the original.  So I suspect it will be difficult to render text on top of a page and have the glyph geometry match up exactly with the glyphs that PDFium is rendering; but we will have to solve that problem if we later want to use QPainter to do the rendering.  (I have seen really bad text rendering in Linux PDF viewers in the past, although they have gotten much better over the years.)
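Whatever we draw on top of the page (a selection highlight now, or text rendered via QPainter later) first has to be placed in the coordinate system of the rendered page image.  PDF page space puts the origin at the bottom-left with y growing upward, in points, while the image has its origin at the top-left with y growing downward.  A minimal sketch of that mapping, assuming an unrotated page with no crop-box offset and a uniform pixels-per-point scale; toImage, PageRect and PixelRect are names made up for illustration:

```cpp
// Page space: points, origin bottom-left, y up (so top > bottom).
struct PageRect {
    double left, top, right, bottom;
};

// Image space: pixels, origin top-left, y down.
struct PixelRect {
    double x, y, width, height;
};

// Map a page-space rectangle onto the rendered page image.
// Assumes an unrotated page, no crop-box offset, and a uniform scale
// (pixelsPerPoint = zoom * devicePixelRatio if zoom 1.0 means 72 dpi).
PixelRect toImage(const PageRect &r, double pageHeightPt, double pixelsPerPoint)
{
    return { r.left * pixelsPerPoint,
             (pageHeightPt - r.top) * pixelsPerPoint,   // flip the y axis
             (r.right - r.left) * pixelsPerPoint,
             (r.top - r.bottom) * pixelsPerPoint };
}
```

For a 12-point-high box whose top edge is 72 points below the top of an 842-point page, rendered at 2 pixels per point, this gives a rectangle at (144, 144) that is 144 × 24 pixels.  It only puts boxes in the right place; it does nothing about the font-matching problem, so glyphs we draw ourselves could still differ from the ones PDFium rendered.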

>> What I dislike about PDFium so far is that it has its own raster engine to
>> do the rendering: we can only get fully rendered images out of it so far,
>> not QPainter calls.  It may be that it’s faster or slower than it would be
>> to use QPainter; but either way, it’s a kind of bloat to ship another
>> internal paint engine.  But who knows, if we want to spend the time we
>> might be able to refactor it, depending on whether there is some way to get
>> rendering callbacks out of it.  I haven’t tried to figure that out either.

> Indeed, but it might be worth it. We won't know until someone posts the
> analysis.

I want to figure that out at some point, but I don’t see how I can make that investigation a higher priority than implementing the features our commercial customers are asking for (which they want ASAP).  Of course, anyone else is welcome to read the code and report what they find out about how it’s all done internally.  In 2017, a commenter on the blog post where qtpdf was announced (https://blog.qt.io/blog/2017/01/30/new-qtpdf-qtlabs-module/) pointed me to https://pdfium.googlesource.com/pdfium/+/master/core/fxge/skia/ and to https://github.com/amplab/ray-core/blob/master/src/examples/ui/pdf_viewer/pdf_viewer.cc as an example.
