Sitemaps
Publish date 24/04/2008
Sitemaps are XML files that inform search engines about the pages on your site.
A sitemap can carry a lot of information about each page: when it was last updated, how often it changes and how important it is relative to the other pages on your site.
This allows search engines to crawl and index your site better.
The file
It is usually named sitemaps.xml and placed in the root directory. (You can give it another name, as long as you point to it in robots.txt.)
It is important to know that the file must be written in UTF-8 and cannot be bigger than 10MB (10 485 760 bytes) or contain more than 50 000 URLs. You can compress it with gzip to save bandwidth, but uncompressed it still cannot be bigger than 10MB.
If you really want to list more pages, you can split them over several sitemap files.
More on this in the Sitemap Index section below.
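The limits above are easy to check before you upload a file. The Python sketch below is one possible way to do it. The function name check_sitemap_limits and the constants are made up for this example, and the 10MB figure is simply the one quoted in this article.

```python
import gzip
import xml.etree.ElementTree as ET

# Limits as quoted in this article; adjust them if your target differs.
MAX_BYTES = 10 * 1024 * 1024  # 10 485 760 bytes, measured uncompressed
MAX_URLS = 50_000
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def check_sitemap_limits(data: bytes) -> list:
    """Return a list of problems; an empty list means no limit is exceeded."""
    # gzip files start with the magic bytes 1f 8b; the size limit
    # applies to the uncompressed content.
    if data[:2] == b"\x1f\x8b":
        data = gzip.decompress(data)

    problems = []
    if len(data) > MAX_BYTES:
        problems.append(f"file is {len(data)} bytes, limit is {MAX_BYTES}")

    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
        return problems  # parsing further makes little sense

    # Count the <url> entries that are direct children of <urlset>.
    url_count = len(ET.fromstring(data).findall(f"{NS}url"))
    if url_count > MAX_URLS:
        problems.append(f"file lists {url_count} URLs, limit is {MAX_URLS}")
    return problems
```

You could call it as check_sitemap_limits(open("sitemaps.xml", "rb").read()) and refuse to upload the file when the returned list is not empty.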
The format
Below is a basic example of a sitemap:
    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.example.com/</loc>
        <lastmod>2008-03-24</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
      <url>
        <loc>http://www.example.com/products.html</loc>
        <changefreq>monthly</changefreq>
      </url>
      <url>
        <loc>http://www.example.com/detail.php?category=2&amp;ln=54</loc>
        <lastmod>2008-03-10T12:00:00+00:00</lastmod>
        <priority>0.4</priority>
      </url>
    </urlset>

As with all XML files, any data must use entity escape codes for special characters:

Character          Escaped code
Ampersand     &    &amp;
Single quote  '    &apos;
Double quote  "    &quot;
Greater than  >    &gt;
Less than     <    &lt;

The available tags are described below:
Tag: Description
<urlset> (required): Encapsulates the file and references the protocol standard.
<url> (required): Parent tag of each URL entry. The tags below are children of this tag.
<loc> (required): URL of the page. Must begin with the protocol (such as http://) and end with a trailing slash if your web server requires it.
<lastmod> (optional): Date of the last modification of this page (or file). This should be in W3C Datetime format.
<changefreq> (optional): How frequently this page changes. Valid values are:
- always
- hourly
- daily
- weekly
- monthly
- yearly
- never
The value always should be used for documents that change each time they are accessed. The value never is for archived URLs.
<priority> (optional): The priority of this URL relative to other URLs on your site, between 0.0 and 1.0. 1.0 is then for your most important page.
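Writing the XML by hand gets tedious quickly, so a small script can help. The following Python sketch builds a sitemap with the tags described above using the standard library. The function name build_sitemap and the dictionary keys are assumptions made for this example. ElementTree escapes special characters such as & for you, so URLs go in unescaped.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}


def build_sitemap(pages) -> bytes:
    """Build a sitemap from dicts with 'loc' and optional
    'lastmod', 'changefreq' and 'priority' keys."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["loc"]  # required tag
        if "lastmod" in page:
            # Expected to be a W3C Datetime string already, e.g. 2008-03-24.
            ET.SubElement(url, "lastmod").text = page["lastmod"]
        if "changefreq" in page:
            if page["changefreq"] not in VALID_CHANGEFREQ:
                raise ValueError(f"invalid changefreq: {page['changefreq']}")
            ET.SubElement(url, "changefreq").text = page["changefreq"]
        if "priority" in page:
            priority = float(page["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError(f"priority out of range: {priority}")
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    # Returns UTF-8 encoded bytes including the XML declaration.
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2008-03-24",
     "changefreq": "weekly", "priority": 0.8},
    {"loc": "http://www.example.com/detail.php?category=2&ln=54",
     "priority": 0.4},
])
```

The result is a bytes object that you can write to sitemaps.xml in binary mode.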
Sitemap Index
There is a limit on how many pages you can list in one sitemap file: 50 000 URLs or 10MB (10 485 760 bytes).
But you can use multiple files. Of course, you then need to tell search engines where they can be found; this is where the sitemap index file comes in.
It must also be UTF-8 encoded, and cannot list more than 1000 sitemaps or be larger than 10MB.

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>http://www.example.com/sitemap1.xml.gz</loc>
        <lastmod>2008-02-14T18:31:17+00:00</lastmod>
      </sitemap>
      <sitemap>
        <loc>http://www.example.com/example/sitemap2.xml</loc>
        <lastmod>2008-03-20</lastmod>
      </sitemap>
    </sitemapindex>

You can only specify sitemaps on the same site, not on other sites.
Tag: Description
<sitemapindex> (required): Encapsulates information about all the sitemaps in the file.
<sitemap> (required): Encapsulates information about one individual sitemap in the file.
<loc> (required): Location of the sitemap file.
<lastmod> (optional): Identifies when the sitemap file was last modified.
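Splitting a long list of URLs and writing the matching index file can be sketched in Python as follows. The function names chunk_urls and build_sitemap_index are made up for this example, the sitemapN.xml file names are hypothetical, and the limits are the ones quoted in this article.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000   # figure quoted in this article
MAX_SITEMAPS_PER_INDEX = 1000   # figure quoted in this article


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locations, lastmod=None) -> bytes:
    """Build a sitemap index that points at the given sitemap URLs."""
    if len(sitemap_locations) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("too many sitemaps for one index file")
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locations:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod is not None:
            ET.SubElement(sitemap, "lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# 120 000 page URLs need three sitemap files: 50 000 + 50 000 + 20 000.
all_urls = [f"http://www.example.com/page{i}.html" for i in range(120_000)]
chunks = chunk_urls(all_urls)
locations = [f"http://www.example.com/sitemap{n}.xml"
             for n in range(1, len(chunks) + 1)]
index_xml = build_sitemap_index(locations, lastmod="2008-03-20")
```

Each chunk would then be written to its own sitemap file, and index_xml to the index file that you announce to the search engines.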
Let it be known
When you have uploaded the file to your server, you need to make sure that search engine crawlers can find it.
robots.txt
The easiest way is to add it to your robots.txt file.
Simply add the following line:

    Sitemap: sitemaplocation

Make sure you put the full URL of the sitemap (http://www.example.com/sitemaps.xml).
You can add multiple lines if you have multiple sitemap files.
Submitting
You can also submit it directly to the search engines.
- Yahoo: now merged with Bing (Need to register)
- Bing (Need to register)
- Google (Need to register)
Submitting via an HTTP request
This is for the more tech-savvy. You can submit the sitemap directly to a search engine using an HTTP request. This can be done with wget, curl or any other program. A successful request will return an HTTP 200 response code.
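As a rough illustration, the request can also be sent from Python. The sitemaps.org protocol describes the request as <searchengine_URL>/ping?sitemap=sitemap_url, with the sitemap URL URL-encoded. The host searchengine.example.com below is a placeholder and the function names are made up for this sketch. Whether a particular search engine still accepts such requests is something to check in its own documentation.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_ping_url(search_engine_base: str, sitemap_url: str) -> str:
    """Build the ping URL; everything after 'sitemap=' is URL-encoded."""
    encoded = quote(sitemap_url, safe="")
    return f"{search_engine_base.rstrip('/')}/ping?sitemap={encoded}"


def submit_sitemap(search_engine_base: str, sitemap_url: str) -> bool:
    """Send the GET request and report whether the answer was HTTP 200.

    urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    """
    ping_url = build_ping_url(search_engine_base, sitemap_url)
    with urlopen(ping_url, timeout=10) as response:
        return response.status == 200


# Placeholder host; replace it with the real search engine address.
ping_url = build_ping_url("http://searchengine.example.com",
                          "http://www.example.com/sitemaps.xml")
print(ping_url)
```

Only build_ping_url is called here, so running the sketch prints the URL without sending anything. Pass the same two arguments to submit_sitemap to send the request.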
A complete reference can be found at www.sitemaps.org