How to Create a Google Sitemap Tutorial

This page serves as a tutorial explaining what a Sitemap is and how to create Google-friendly Sitemaps. This is a brief tutorial intended to teach beginners to create a functional Sitemap and does not go into some of the more complex issues.

Many websites have links to pages labeled "Sitemap" or "sitemap". In most of these cases, the page referenced is not the kind of Sitemap that Google and other search engines are looking for. You may think you've created a Sitemap when what you've actually created is an index of the pages on your site, which is still useful but not quite the same thing.

What is a Sitemap?

A sitemap (with a lowercase "s") is a page on a website which contains links to other pages on the site—usually organized in such a manner that humans will find it useful. A Sitemap (with a capital "S") is an XML file which lists URLs on a given site which are available for crawling by search engines, organized in such a way as to be useful to the search engines. A Sitemap contains metadata for each URL which provides information to the search engine regarding each page's relative importance, frequency of update, last update, etc.

While any page containing links to every page on your site will help to some degree, a Sitemap must use XML and the code must follow a specific pattern in order to be most efficient in terms of providing accurate information to search engines. At the time of this writing (summer 2010), Google utilizes the Sitemap Protocol 0.9 as defined by sitemaps.org.1

A Sitemap is useful because it provides a single location where a search engine may access all of the pages in a given site without having to crawl through the entire site, page-by-page, through what may be many layers of links.

Basic Format

Every Sitemap must be enclosed by the tags <urlset xmlns="[namespace]"> and </urlset> and each page within the Sitemap must include, at a minimum, the <url> and <loc> tags (both of which must be closed). Optional tags include <lastmod>, <changefreq>, and <priority>.

Following is an example of a sample Sitemap with two entries using all required and optional tags:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.erikastokes.com/</loc>
      <lastmod>2010-07-29</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.erikastokes.com/about-me.php</loc>
      <lastmod>2010-07-05</lastmod>
      <changefreq>yearly</changefreq>
      <priority>0.4</priority>
   </url>
</urlset> 

In the above example I've set up two pages in the Sitemap: my main page and my "About Me" page. I've set the last modification dates and indicated their change frequency and relative priority. The change frequency doesn't have to be exact, and the priority is completely subjective: it's just about how important you, as a site owner, feel the page is relative to the other pages on your site. I'll discuss that in further depth later.
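
If you'd rather not type the XML by hand, the same structure can be generated with a short script. The following is only a minimal sketch using Python's standard library; the function name build_sitemap and the dictionary layout are my own assumptions for illustration, not part of the Sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample Sitemap above.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return Sitemap XML as bytes for a list of page dictionaries.

    Each dictionary needs a "loc" key; "lastmod", "changefreq", and
    "priority" are optional. (Hypothetical helper, for illustration.)
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                # ElementTree escapes &, <, and > in the text for us.
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)

sitemap_xml = build_sitemap([
    {"loc": "http://www.erikastokes.com/", "lastmod": "2010-07-29",
     "changefreq": "monthly", "priority": 0.8},
    {"loc": "http://www.erikastokes.com/about-me.php", "lastmod": "2010-07-05",
     "changefreq": "yearly", "priority": 0.4},
])
print(sitemap_xml.decode("utf-8"))
```

The output is not indented the way the hand-written sample is, but search engines read the tags rather than the whitespace.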

What do all those Sitemap tags mean?

The Required Tags

  • <urlset> and </urlset> — These tags tell search engines like Google where the Sitemap begins and ends. Start the list of pages after the <urlset> tag and end it just before the closing </urlset> tag.
  • <url> and </url> — These tags tell the search engine where the information for each individual page listed in the Sitemap begins and ends. All the other tags for each page come between these tags.
  • <loc> and </loc> — "Location" — These tags are the most important; they tell the search engine where to find the page. Without the location, all the other information is useless.

The Optional Tags

  • <lastmod> and </lastmod> — "Last Modified Date" — This tells the search engine when the page was last modified. The format for this is YYYY-MM-DD. Remember to include a leading zero for any numbers less than 10. For example, 2010-07-29 is correct, but 2010-7-29 is not. You can also include the time of day, but that seems unnecessarily complex. You should only use this tag if you are committed to updating your Sitemap every time you update your pages, otherwise it will quickly become out-of-date and may do more harm than good.
  • <changefreq> and </changefreq> — "Change Frequency" — This tells the search engine how often the page is likely to change. It doesn't have to be exact and the search engine may choose to visit pages more or less frequently than indicated. For example, if you set the change frequency to hourly and your page does not have a particularly high page ranking, Google may choose to visit only every few days... or less often. Or, if you set the change frequency to monthly, Google may still decide to visit every few days... or more often. Here are the possible values for this tag. The first is explained, the others are self-explanatory:
    • always — This indicates a dynamic page that changes every time it is accessed, for example, a search results page or a weather monitoring site.
    • hourly
    • daily
    • weekly
    • monthly
    • yearly
    • never
  • <priority> and </priority> — This tells search engines how important you think a page is compared to other pages on your site. This is perhaps the least useful tag for most sites because search engines, for the most part, do a pretty good job figuring out which pages are the most relevant to a given search. However, if you have two similar pages or several pages that are of low importance, this may be useful. The default priority for all pages is 0.5. The possible values range from 0.0 (least important) to 1.0 (most important). So if a search engine needs to decide between displaying one of two results on your site (not between two pages on different sites) to potential visitors and one has a priority ranking higher than the other, it might decide to display the one you ranked higher.
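
The rules for the optional tags described above can be checked mechanically before you publish a Sitemap. Here is a rough sketch in Python; the function name check_entry and the wording of the messages are hypothetical, and it only covers the plain YYYY-MM-DD date format, not the variant with a time of day.

```python
import re
from datetime import date

# The seven change frequency values listed above.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly",
                     "monthly", "yearly", "never"}

def check_entry(lastmod=None, changefreq=None, priority=None):
    """Return a list of problems found in one entry's optional values."""
    problems = []
    if lastmod is not None:
        # Require leading zeros: 2010-07-29 passes, 2010-7-29 does not.
        match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", lastmod)
        if match is None:
            problems.append("lastmod is not in YYYY-MM-DD format")
        else:
            try:
                date(*map(int, match.groups()))
            except ValueError:
                problems.append("lastmod is not a real calendar date")
    if changefreq is not None and changefreq not in CHANGEFREQ_VALUES:
        problems.append("changefreq is not one of the allowed values")
    if priority is not None:
        try:
            value = float(priority)
        except ValueError:
            problems.append("priority is not a number")
        else:
            if not 0.0 <= value <= 1.0:
                problems.append("priority is outside 0.0 to 1.0")
    return problems

print(check_entry(lastmod="2010-7-29", changefreq="monthly", priority="0.8"))
```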

Non-Alphanumeric Characters in Sitemaps

A proper Sitemap contains only ASCII letters, numbers, and certain symbols, and all entities are escaped. This might sound scary, but it's pretty simple when you break it down: a typical site that is not database-driven, or that has SEO-friendly URLs, won't even have a problem with this. Entities are characters which have a special meaning in a URL or in markup such as HTML and XML. The ones you need to escape are:

  • & — The ampersand character must always be written as &amp;
  • ' — The single quote character must always be written as &apos;
  • " — The double quote character must always be written as &quot;
  • < — The less than symbol must always be written as &lt;
  • > — The greater than symbol must always be written as &gt;

If your URLs contain letters which do not appear in the English alphabet, such as ç, ñ, or ü (among others), then you'll also need to use the code for those characters. You can find a list of these codes at T'N'T Luoma's website.

Example:

Suppose you have a car dealership website and you want a specific search results page, for red sedans from 2008 or newer, to show up in the search engines. The URL for that page normally looks like this:

http://yoursite.com/inventory?year=>2008&body=sedan&color=red

Then your location line in your Sitemap will look like this:

<loc>http://yoursite.com/inventory?year=&gt;2008&amp;body=sedan&amp;color=red</loc>
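
The escaping can also be done by a script instead of by hand. This is a minimal sketch in Python, not a definitive implementation. The function name loc_element is made up for illustration, and it assumes that the "codes" for non-English letters are percent-encoded UTF-8 bytes, which is my reading of the URL-escaping rule on sitemaps.org.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Characters left untouched by the percent-encoding step, so that the
# entity-escaping step can handle them as in the example above.
# (This choice of "safe" characters is an assumption for illustration.)
URL_PUNCTUATION = ":/?#[]@!$&'()*+,;=%<>\""

def loc_element(url):
    """Return a <loc> line with the URL escaped for use in a Sitemap."""
    # Step 1: percent-encode anything outside ASCII (for example ñ).
    ascii_url = quote(url, safe=URL_PUNCTUATION)
    # Step 2: replace the five special characters with their entities.
    escaped = escape(ascii_url, {"'": "&apos;", '"': "&quot;"})
    return "<loc>" + escaped + "</loc>"

print(loc_element("http://yoursite.com/inventory?year=>2008&body=sedan&color=red"))
```

Run on the car dealership URL, this prints the same location line shown above.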

Where to Put a Sitemap File

A Sitemap can, under normal circumstances, only catalog URLs which occur in or under the directory in which it is located. So if you have an online store in a directory named store and wish to catalog both the store pages and your regular website pages, you'll need to place your Sitemap file in the top-most directory you wish to catalog. This is usually your "root web directory". Depending on your server, this directory may be called httpdocs, public_html, wwwroot, www, or something else entirely. Your web hosting provider can tell you what the correct directory name is.

Example:

Instead of placing your sitemap here: http://yoursite.com/store/sitemap.xml

Place it here: http://yoursite.com/sitemap.xml
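
The directory rule can be expressed as a simple check. The sketch below is in Python; the function name in_sitemap_scope is hypothetical, and it assumes that "in or under the directory" means the same scheme, the same host, and a path that starts with the Sitemap's directory.

```python
from urllib.parse import urlsplit

def in_sitemap_scope(sitemap_url, page_url):
    """Return True if page_url sits in or under the Sitemap's directory."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    if (sitemap.scheme, sitemap.netloc) != (page.scheme, page.netloc):
        return False
    # "/store/sitemap.xml" -> "/store/", "/sitemap.xml" -> "/"
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    page_path = page.path or "/"
    return page_path.startswith(base_dir)

# A Sitemap inside /store/ cannot list the regular website pages...
print(in_sitemap_scope("http://yoursite.com/store/sitemap.xml",
                       "http://yoursite.com/about.html"))
# ...but a Sitemap in the root web directory can list everything.
print(in_sitemap_scope("http://yoursite.com/sitemap.xml",
                       "http://yoursite.com/store/item.html"))
```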

Before Submitting Your Sitemap—Validation

Before you submit your Sitemap you should validate it to ensure that you haven't made any errors. This will save you trouble in the long run! Various online Sitemap validators are available for this.
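
You can also catch the most common mistakes yourself before using a validator. The following Python sketch is a rough pre-check, not a replacement for full validation: it only confirms that the file is well-formed XML, that the outer tag is <urlset> with the 0.9 namespace, and that every <url> has a <loc>. The function name find_sitemap_errors is my own.

```python
import xml.etree.ElementTree as ET

# Parsed tag names carry the namespace in curly braces.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def find_sitemap_errors(xml_text):
    """Return a list of basic problems found in a Sitemap document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Typical cause: an unescaped & or < inside a <loc> tag.
        return ["not well-formed XML: " + str(exc)]
    errors = []
    if root.tag != NS + "urlset":
        errors.append("outer tag is not <urlset> in the Sitemap 0.9 namespace")
    for index, url in enumerate(root.findall(NS + "url"), start=1):
        loc = url.find(NS + "loc")
        if loc is None or not (loc.text or "").strip():
            errors.append("<url> entry %d has no <loc>" % index)
    return errors

sample = ('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
          '<url><loc>http://yoursite.com/</loc></url>'
          '<url><lastmod>2010-07-29</lastmod></url>'
          '</urlset>')
print(find_sitemap_errors(sample))
```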

Submitting Your Sitemap

There are three methods of submitting your Sitemap to the various search engines. You may choose to use one, two, or all three methods.

Robots.txt File

This is the simplest method for submitting your Sitemap, though you'll need to ensure that search engines at least occasionally visit your site for this to work. If your site has never been visited by a search engine either because it is new or you're just having trouble getting the word out, then you should use the Direct Search Engine Submission method, below.

Add the following line to your robots.txt file, if you already have one, or create a new one with this line.

Sitemap: http://yoursite.com/sitemap.xml

The next time the search engine comes crawling through your site, it will automatically pick up the Sitemap!
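
If you maintain your site with scripts, adding the line can be automated. This is a small Python sketch under the assumption that robots.txt is a plain UTF-8 text file you can write to; the function name add_sitemap_line is hypothetical.

```python
from pathlib import Path

def add_sitemap_line(robots_path, sitemap_url):
    """Append a Sitemap line to robots.txt, creating the file if needed.

    Returns True if the line was added, False if it was already there.
    """
    path = Path(robots_path)
    line = "Sitemap: " + sitemap_url
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if line in existing.splitlines():
        return False
    # Start on a fresh line if the file does not end with a newline.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    path.write_text(existing + separator + line + "\n", encoding="utf-8")
    return True
```

Calling add_sitemap_line("robots.txt", "http://yoursite.com/sitemap.xml") twice adds the line only once.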

HTTP Request

You can submit a Sitemap or Sitemap update by typing one of the following URLs directly into your browser's address bar. Note that some sites may require you to already have an account with them in order to accept input. You should, of course, replace "yoursite.com" with your domain name.

  • Bing: http://www.bing.com/webmaster/ping.aspx?siteMap=yoursite.com/sitemap.xml
  • Ask.com: http://submissions.ask.com/ping?sitemap=http://yoursite.com/sitemap.xml
  • Google: http://www.google.com/webmasters/tools/ping?sitemap=http://yoursite.com/sitemap.xml
  • Yahoo: http://search.yahooapis.com/SiteExplorerService/V1/updateNotification?appid=YahooDemo&url=http://yoursite.com/sitemap.xml (change YahooDemo to your App ID)
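
A request URL like the ones above can be assembled in a script. The Python sketch below only builds the URL and does not send anything. The endpoint is copied from the Google entry in the list, and whether it still accepts requests is not verified here. Percent-encoding the Sitemap address is a precaution I'm assuming is acceptable to the receiving site, and the function name ping_url is made up.

```python
from urllib.parse import quote

# Endpoint copied from the list above, up to and including "sitemap=".
GOOGLE_PING = "http://www.google.com/webmasters/tools/ping?sitemap="

def ping_url(endpoint, sitemap_url):
    """Return the submission URL with the Sitemap address percent-encoded."""
    # safe="" encodes ":" and "/" too, so the address is a single value.
    return endpoint + quote(sitemap_url, safe="")

print(ping_url(GOOGLE_PING, "http://yoursite.com/sitemap.xml"))
```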

Direct Search Engine Submission

This is the most time-intensive method of submission as you must go to each search engine individually. You don't necessarily need to do this! Most major search engines will automatically read your robots.txt file to find your Sitemap file.2

  • Google: If you haven't already, create a Google Webmaster Tools Account. Submission through this interface is simple.
  • Yahoo!: This is pretty much the same as submitting a site through Google; create an account with the Yahoo Site Explorer.
  • Bing: To submit through Bing you'll need to create an MSN account, if you don't have one, and then submit through the Bing Webmaster Center. Make sure to add the MSN verification meta tag to your site.
  • Ask.com: You don't need an account to submit to Ask.com; just type the following in the address bar: http://submissions.ask.com/ping?sitemap=http%3A//www.yoursite.com/sitemap.xml (where yoursite.com is your domain name).

Sitemap Limits

An individual Sitemap file may contain, at most, 50,000 URLs and must be 10MB or less in size. If your site contains more than this, you'll need to use multiple Sitemap files. As this tutorial is for beginners, I am not covering this concept.
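
To find out whether a Sitemap is close to these limits, you can count its entries and measure its size. This Python sketch is deliberately crude: it counts occurrences of the <url> tag in the raw bytes rather than parsing the XML, and it interprets 10MB as 10,485,760 bytes. The function name check_limits is hypothetical.

```python
MAX_URLS = 50_000
MAX_BYTES = 10 * 1024 * 1024  # 10MB read as 10,485,760 bytes (assumption)

def check_limits(xml_bytes):
    """Return (url_count, size_in_bytes, within_limits) for a Sitemap."""
    # b"<url>" does not match "<urlset ...>" or "</url>", so this counts
    # only the opening tag of each entry.
    url_count = xml_bytes.count(b"<url>")
    size = len(xml_bytes)
    return url_count, size, url_count <= MAX_URLS and size <= MAX_BYTES

sample = (b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
          + b"<url><loc>http://yoursite.com/</loc></url>" * 3
          + b"</urlset>")
print(check_limits(sample))
```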

Concepts Not Covered

I have not covered the following concepts in this tutorial. If you are interested in pursuing your study further, you may wish to visit the Sitemaps.org Protocol Page.

  • Using Sitemap Index Files to Group Multiple Sitemap Files
  • Other Sitemap Formats — For example, RSS or Atom feeds and text files
  • Using Sitemap files to index multiple sites with different domain names
  • Using special tags to provide Google and other search engines with information about video, images, mobile-specific pages, news pages (for Google News), code, and geographical information
  • Using Sitemap files to specifically exclude pages

1 Google, About Sitemaps - Webmaster Tools Help, http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156184 (retrieved 29 July 2010)

2 Vivek Pathak, Sitemaps Autodiscovery, http://blog.ask.com/2007/04/sitemaps_autodi.html (11 April 2007)