Sitemap vs. robots.txt - Post ID 125112

User 472422


Registered User
111 posts

Greetings

What is the relationship between the sitemap and robots.txt? If I excluded folders/pages from the sitemap but not from robots.txt, which would the search engine follow?

Tony
User 133269


Registered User
2,900 posts

The search engine should obey the robots.txt rules first.

The sitemap is a way for the bots to find and index unlinked pages, or pages linked only through JavaScript, and to get a list of your pages quickly.

If you put a page in your sitemap but then block it with robots.txt, the bots should still NOT crawl it... a bit of an odd thing to do, though. You'd usually leave a page out of the sitemap too if you don't want it indexed...
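
If you want to check your own site for that kind of clash, here's a rough Python sketch using the standard library's `urllib.robotparser` as a stand-in for a polite bot. The robots.txt rules, the `/members/` folder, the example.com URLs and the `crawlable()` helper are all made up for illustration; it just shows a well-behaved bot dropping any sitemap URL that robots.txt disallows.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /members/ folder is off-limits to every bot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""

# Hypothetical URLs taken from the same site's sitemap.
SITEMAP_URLS = [
    "https://www.example.com/",
    "https://www.example.com/about.html",
    "https://www.example.com/members/profile.html",  # listed, but blocked above
]


def crawlable(sitemap_urls, robots_txt, user_agent="ExampleBot"):
    """Return only the sitemap URLs that robots.txt lets `user_agent` fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    for url in crawlable(SITEMAP_URLS, ROBOTS_TXT):
        print(url)  # the /members/ page is skipped
```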

And just because a page is not in the sitemap doesn't mean it won't be indexed if the bots find it some other way (for example, through a link on another page)...
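
To illustrate the "some other way": crawlers mostly find pages by following links. This hypothetical sketch (made-up page, URLs and helper names) pulls the links out of one page with Python's `html.parser` and reports the ones that aren't in the sitemap, i.e. pages a bot could still reach.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags, the way a crawler discovers pages."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def discovered_outside_sitemap(base_url, html, sitemap_urls):
    """Return linked URLs a bot could reach even though they are missing from the sitemap."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    in_sitemap = set(sitemap_urls)
    return [url for url in collector.links if url not in in_sitemap]


# Hypothetical home page and sitemap.
BASE = "https://www.example.com/"
PAGE_HTML = '<p><a href="/about.html">About</a> <a href="old-news.html">Old news</a></p>'
SITEMAP_URLS = ["https://www.example.com/", "https://www.example.com/about.html"]

if __name__ == "__main__":
    print(discovered_outside_sitemap(BASE, PAGE_HTML, SITEMAP_URLS))
    # ['https://www.example.com/old-news.html']
```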

Have fun
~ Fe Pixie ~
User 472422


Registered User
111 posts

Thanks

The sitemap could become quite unruly with thousands of pages. Could I create a limited sitemap page for site visitors but have a robots.txt that allows a more comprehensive crawl?
User 133269


Registered User
2,900 posts

Hmm, they're kind of for different things. The robots.txt is for BLOCKING the bots, and the sitemap is for giving them links to follow. So you wouldn't really put a list for them to follow in the robots.txt, just a list of where they're NOT allowed to go, or a very general rule that lets them go anywhere (meaning "find the home page and find the rest yourself, Mr. Bot").
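
For example (a made-up site, parsed with Python's `urllib.robotparser` as a stand-in for a real bot), the first file below blocks nothing, which is the "find the rest yourself" case, while the second is just a list of no-go folders. The second file also has a `Sitemap:` line: the sitemaps.org protocol lets robots.txt point bots at the XML sitemap this way, and `site_maps()` (Python 3.8+) reads it back. Folder names and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# "Find the home page and find the rest yourself": an empty Disallow blocks nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# A block list: it only says where bots may NOT go, plus a pointer to the sitemap.
BLOCK_LIST = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /drafts/

Sitemap: https://www.example.com/sitemap.xml
"""


def load(robots_txt):
    """Parse robots.txt text the way a polite bot would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


allow_all = load(ALLOW_ALL)
block_list = load(BLOCK_LIST)

if __name__ == "__main__":
    print(allow_all.can_fetch("ExampleBot", "https://www.example.com/drafts/new.html"))   # True
    print(block_list.can_fetch("ExampleBot", "https://www.example.com/drafts/new.html"))  # False
    print(block_list.can_fetch("ExampleBot", "https://www.example.com/about.html"))       # True
    print(block_list.site_maps())  # ['https://www.example.com/sitemap.xml']
```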

Generally the sitemap is used more by the bots than by site visitors, so if it were me, I'd include as many pages as possible in it...

I guess it could still work OK if you submitted the XML map with all the URLs in it to the search engines and gave the HTML version a bit of a trim for site visitors...
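
A rough sketch of that split in Python, assuming you already have your page list: `xml_sitemap()` writes every URL in the sitemaps.org `<urlset>` format for the search engines, and `html_sitemap()` keeps only the first `limit` pages for visitors. Both function names and the sample pages are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def xml_sitemap(urls):
    """Full sitemaps.org <urlset> with every URL, meant for the search engines."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # ElementTree escapes & and < for us
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def html_sitemap(pages, limit):
    """Trimmed HTML list of the first `limit` (url, title) pairs, meant for visitors."""
    items = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(title)}</a></li>'
        for url, title in pages[:limit]
    )
    return f"<ul>\n{items}\n</ul>"


if __name__ == "__main__":
    # Hypothetical page list; a real site would build this from its CMS or file tree.
    pages = [
        ("https://www.example.com/", "Home"),
        ("https://www.example.com/products.html", "Products"),
        ("https://www.example.com/archive/2007/item-1.html", "Archive item 1"),
    ]
    print(xml_sitemap([url for url, _ in pages]))  # all three URLs
    print(html_sitemap(pages, limit=2))            # just Home and Products
```

If the full list gets really big, the sitemaps.org protocol caps a single sitemap file at 50,000 URLs, so a large site would split it into several files listed in a sitemap index.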
Have fun
~ Fe Pixie ~
User 126492


Ambassador
1,506 posts

If you use a robots.txt file to block spiders from certain pages on your site, there is no guarantee that every bot will follow the rules; in fact, it would be true to say that a lot of dodgy bots do not stick to them.

You can set different rules for different bots, but you would end up with an awfully big robots.txt file.
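
Here's roughly what per-bot rules look like, checked with Python's `urllib.robotparser`. The groups are invented for illustration (`BadBot` and `SomeOtherBot` are hypothetical names), and each extra bot you single out is another block in the file. As said above, a dodgy bot can simply ignore all of this; the parser only tells a polite bot what it may fetch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical per-bot groups; every bot you single out adds another block.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    url = "https://www.example.com/tmp/report.html"
    for agent in ("Googlebot", "BadBot", "SomeOtherBot"):
        # Each bot is checked against its own group, or the "*" group if it has none.
        print(agent, parser.can_fetch(agent, url))
```

Note that Googlebot may fetch `/tmp/` here: a bot with its own group follows only that group and ignores the `*` rules, which is one reason these files grow once you start targeting individual bots.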

One way to block a page is to add this meta tag to the page's <head>:

<meta name="robots" content="noindex,nofollow,nocache">
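
A polite crawler reads that tag after fetching the page. As a rough sketch (Python's `html.parser`; the `MetaRobotsReader` class and `robots_directives()` helper are hypothetical names), this pulls the directives out of the tag above. One catch worth knowing: the bot only sees the tag if it is allowed to fetch the page, so a page blocked in robots.txt never gets its `noindex` read.

```python
from html.parser import HTMLParser


class MetaRobotsReader(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # "noindex,nofollow,nocache" -> {"noindex", "nofollow", "nocache"}
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of meta robots directives found in an HTML page."""
    reader = MetaRobotsReader()
    reader.feed(html)
    reader.close()
    return reader.directives


PAGE = '<html><head><meta name="robots" content="noindex,nofollow,nocache"></head><body></body></html>'

if __name__ == "__main__":
    found = robots_directives(PAGE)
    print("noindex" in found, "nofollow" in found)  # True True
```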


A sitemap is a must for a Flash-based site. Without one, it is fair to say that only the first entry page will be found by a search engine: as yet, bots cannot properly read the information inside a .swf file (though they are learning), so by having a sitemap you are at least telling a bot that you have other pages you want it to index.
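
For a site like that, the sitemap is basically the bot's page list. This sketch (Python's `xml.etree.ElementTree`, with a made-up sitemap and a hypothetical `sitemap_urls()` helper) shows the `<loc>` entries a bot would pick up from a standard sitemaps.org file, including pages it couldn't have found inside the .swf.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org <urlset> format.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap for a Flash site whose inner pages aren't plain HTML links.
SITEMAP_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/portfolio.html</loc></url>
  <url><loc>https://www.example.com/contact.html</loc></url>
</urlset>
"""


def sitemap_urls(xml_bytes):
    """Return every <loc> URL listed in a sitemaps.org <urlset> document."""
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


if __name__ == "__main__":
    for url in sitemap_urls(SITEMAP_XML):
        print(url)
```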

I don't see why bots shouldn't index a site if it only had a sitemap.xml file, though you would still have to submit it. The only problem is that only the big search engines follow the Sitemaps XML format; having said that, you will only get visitors from the biggest search engines anyway.
Jim
---------------------------
