[Web] [Development] Search engine indexing of qt-project.org

Tue Jun 11 17:30:28 CEST 2013

CCing web at qt-project.org...

On 11 June 2013 17:13, Robin Burchell <robin+qt at viroteck.net> wrote:
> Hi,
>
> I've noticed that the quality of search results for Qt documentation
> (and so forth) in search engines has been terrible. I just noticed
> that they now seem to be hosted directly on http://qt-project.org,
> which has a robots.txt with the following gem:
>
> User-agent: *
> Disallow: /

Really? I get a much more complete robots.txt, which disallows only certain paths...

> This seems ... very counterproductive. It certainly explains the bad
> results. Can someone please comment why this is so, if this is
> correct, or forward it to the right people to get it fixed please?

Among other things, this report is still open, together with its
suggestions for improving ranking (which can't be disproven until they
are implemented): https://bugreports.qt-project.org/browse/QTWEBSITE-504

--
Giuseppe D'Angelo

(for reference)

$ GET http://qt-project.org/robots.txt
# Start Qt Developer Network file for metal spiders

# Skip the following:
User-agent: *
Disallow: /images/
Disallow: /themes/
Disallow: /forums/member_search/
Disallow: /forums/search_results/
Disallow: /forums/search/
Disallow: /member/
Disallow: /ignore_member/
Disallow: /forums/new_topic_search/
Disallow: /forums/view_pending_topics/
Disallow: /email/
Disallow: /wiki/*/edit/
Disallow: /wiki/*/revision/
Disallow: /wiki/diff/
Disallow: /wiki/pdf/
Disallow: /revision/
Cache-delay: 5
Crawl-delay: 2

# Allow Google to crawl our general images
User-agent: Googlebot-Image
Allow: /images/

# End robots.txt file
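
(To compare the two variants offline, here is a minimal sketch using Python's
standard urllib.robotparser. The rule sets are abbreviated copies of the
blanket version quoted in the report and of the file above; the /doc/ URL and
the "ExampleBot" user agent are hypothetical, chosen only for illustration.
Note that this parser does not implement wildcard paths such as
/wiki/*/edit/, so those rules are left out.)

```python
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The blanket version quoted in the original report.
BLANKET = """\
User-agent: *
Disallow: /
"""

# Abbreviated copy of the file fetched above (wildcard rules omitted).
PARTIAL = """\
User-agent: *
Disallow: /images/
Disallow: /member/
Disallow: /wiki/pdf/
Crawl-delay: 2

User-agent: Googlebot-Image
Allow: /images/
"""

blanket = make_parser(BLANKET)
partial = make_parser(PARTIAL)

# Hypothetical documentation URL and crawler name, for illustration only.
DOC_URL = "http://qt-project.org/doc/"
BOT = "ExampleBot"

print("blanket allows docs:", blanket.can_fetch(BOT, DOC_URL))
print("partial allows docs:", partial.can_fetch(BOT, DOC_URL))
print("partial allows /member/:",
      partial.can_fetch(BOT, "http://qt-project.org/member/"))
print("partial crawl delay:", partial.crawl_delay(BOT))
```

With the blanket rule every path is refused, while the longer file only
blocks the listed prefixes, which would fit the two different observations
in this thread if different files were being served.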