[Web] [Development] Search engine indexing of qt-project.org

Giuseppe D'Angelo dangelog at gmail.com
Tue Jun 11 17:30:28 CEST 2013


CCing web at qt-project.org...

On 11 June 2013 17:13, Robin Burchell <robin+qt at viroteck.net> wrote:
> Hi,
>
> I've noticed that the quality of search results for Qt documentation
> (and so forth) in search engines has been terrible. I just noticed
> that they now seem to be hosted directly on http://qt-project.org,
> which has a robots.txt with the following gem:
>
> User-agent: *
> Disallow: /

Really? I get a much more complete robots.txt, which disallows only certain parts...

> This seems ... very counterproductive. It certainly explains the bad
> results. Can someone please comment why this is so, if this is
> correct, or forward it to the right people to get it fixed please?

Among other things, this bug report is still open, together with its
suggestions to improve ranking (which can't be disproven unless they
are implemented): https://bugreports.qt-project.org/browse/QTWEBSITE-504

--
Giuseppe D'Angelo

(for reference)

$  GET http://qt-project.org/robots.txt
# Start Qt Developer Network file for metal spiders

# Skip the following:
User-agent: *
Disallow: /images/
Disallow: /themes/
Disallow: /forums/member_search/
Disallow: /forums/search_results/
Disallow: /forums/search/
Disallow: /member/
Disallow: /ignore_member/
Disallow: /forums/new_topic_search/
Disallow: /forums/view_pending_topics/
Disallow: /email/
Disallow: /wiki/*/edit/
Disallow: /wiki/*/revision/
Disallow: /wiki/diff/
Disallow: /wiki/pdf/
Disallow: /revision/
Cache-delay: 5
Crawl-delay: 2

# Allow Google to crawl our general images
User-agent: Googlebot-Image
Allow: /images/

# End robots.txt file
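
(a quick way to compare the two variants)

The two robots.txt versions quoted in this thread behave very differently for a crawler. The following is only a minimal sketch using Python's standard urllib.robotparser; the rule sets are abbreviated copies of the ones above, and the example URLs (the /doc/... page and the /images/logo.png file) are hypothetical paths chosen for illustration, not pages known to exist on the site. Note that this parser does simple prefix matching and does not expand the "*" wildcards used in the /wiki/ rules, so those lines are left out here.

```python
from urllib.robotparser import RobotFileParser

# Variant reported by Robin: everything is disallowed.
BLANKET = """\
User-agent: *
Disallow: /
"""

# Abbreviated copy of the variant fetched above (wildcard rules omitted).
PARTIAL = """\
User-agent: *
Disallow: /images/
Disallow: /themes/
Disallow: /member/
Disallow: /wiki/diff/
Crawl-delay: 2

User-agent: Googlebot-Image
Allow: /images/
"""


def allowed(robots_text, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs, used only to exercise the rules.
DOC_URL = "http://qt-project.org/doc/qt-5.0/qtcore/qstring.html"
IMG_URL = "http://qt-project.org/images/logo.png"

if __name__ == "__main__":
    for name, rules in (("blanket", BLANKET), ("partial", PARTIAL)):
        for agent in ("*", "Googlebot-Image"):
            for url in (DOC_URL, IMG_URL):
                print(name, agent, url, allowed(rules, agent, url))
```

If a crawler is served the blanket variant, the documentation page is off limits, which would be consistent with the poor search results; with the partial variant only the listed directories are excluded.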


