[Interest] Extract only src attribute of an image tag into XmlRole with XPath functions

Gian Maxera gmaxera at gmail.com
Wed Jun 3 12:57:38 CEST 2015


Hello Federico,
the problem is that the img tag inside the description has been escaped :-(
So I cannot access it with a query path :-(
If you look at the content of description, you can see that instead of having:
<description><img src="…"> … </description>
I have the image escaped:
<description>&lt;img src="…"&gt; … </description>

So I think I need something like this:

XmlRole { name: "img"; query: "fn:somefunction( description/string(), 'some regexp for extract')" }

But I have no idea what to use, or even whether it's feasible.
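
One possible workaround, sketched here only as an assumption and not as a confirmed XPath solution: expose the whole description as a string role and pull out the first src in JavaScript on the QML side instead of inside the XmlRole query. The helper name firstImageSrc is hypothetical (it is not part of Qt), and the regular expression is a simple pattern that assumes the src value is quoted.

```javascript
// Hypothetical helper (not part of Qt): returns the src of the first <img>
// found in an HTML fragment, or "" if there is none.
function firstImageSrc(html) {
    // Undo one level of entity escaping, in case the fragment still arrives
    // as &lt;img ...&gt;. "&amp;" is handled last to avoid double unescaping.
    var text = String(html)
        .replace(/&lt;/g, "<")
        .replace(/&gt;/g, ">")
        .replace(/&quot;/g, "\"")
        .replace(/&amp;/g, "&");

    // First <img ...> tag, then its src attribute in single or double quotes.
    var match = /<img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/i.exec(text);
    return match ? match[2] : "";
}
```

With a role such as XmlRole { name: "description"; query: "description/string()" }, a delegate could then bind something like source: firstImageSrc(description). The XML stays untouched and the HTML handling is kept in one small function, at the cost of doing the extraction per delegate rather than in the model.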

Thanks,
Gianluca.


> On 3 Jun 2015, at 11:46, Federico Buti <bacarozzo at gmail.com> wrote:
> 
> Hi Gianluca,
> 
> depending on the full XML, you can query for the description tag and then select the first img tag with "[1]", something like this:
> 
> 
>          XmlListModel {
>             id: model
> 
>             query: "(/path/to/description/img)[1]"
>             XmlRole { name: "img"; query: "@src/string()" }
>         }   
> 
> Here I defined a role for the "src" attribute as a string.
> Hope that helps.
> 
> Cheers,
> 
> 
> ---
> Federico Buti
> 
> On 3 June 2015 at 11:26, Gian Maxera <gmaxera at gmail.com> wrote:
> Hello,
> I have an RSS feed coming from a Tumblr blog page.
> The XML of the feed has a lot of HTML content in the description tag that I want to remove, keeping only the first image I find.
> For example, this is the content of one description tag:
> 
> <description><img src="http://33.media.tumblr.com/bd4312958b742a21221e87c0a96d52c1/tumblr_np213siZRu1tbs1mwo1_500.gif"/><br/> <br/><img src="http://33.media.tumblr.com/6f1ca4ab1ef3d2504b2da48f2616df6e/tumblr_np213siZRu1tbs1mwo2_400.gif"/><br/> Paris Marriott Champs Elysees<br/><br/> <img src="http://36.media.tumblr.com/0f4726dd2f19c8d4a042f72786987573/tumblr_np213siZRu1tbs1mwo3_500.jpg"/><br/> <br/><h2><b>Marriott Hotels in France Celebrate Earth Hour 2015</b></h2><p>See what happened during Earth Hour celebrations at Marriott hotels in France.</p></description>
> 
> What I would like is an XPath function that returns only the first image URL:
> http://33.media.tumblr.com/bd4312958b742a21221e87c0a96d52c1/tumblr_np213siZRu1tbs1mwo1_500.gif
> 
> How can I do that?
> 
> Thanks,
> Gianluca.
> 
> _______________________________________________
> Interest mailing list
> Interest at qt-project.org
> http://lists.qt-project.org/mailman/listinfo/interest
> 
> 
