
Next-Level Sitemaps: Advanced Strategies and Creative Uses for SEO

Author: Eleni Tarantou

Last updated: 13/10/2025

A sitemap is like a roadmap for your website.

It helps search engines discover and index your most important pages quickly and efficiently.

In this article I’ll be sharing real-world tips from my experience managing sites with millions of pages – including how to keep sitemaps fresh, the impact of breaking them down into more manageable chunks, using them to strengthen canonical signals, and even how to use them to spy on competitors’ strategies.

What is a sitemap?

A sitemap is a file or webpage that provides a blueprint of your website's structure by listing its pages and content.

Its primary function is to help search engines like Google discover, crawl, and index your site content more efficiently. By outlining the key URLs and their relationships, sitemaps can help to improve both the visibility of a website in search engine results and overall site organization.

Sitemap types

There are two main types of sitemaps: XML sitemaps and HTML sitemaps.

XML sitemaps are designed specifically for search engines and contain detailed metadata such as page update frequency and priority. These can be further specialized into types like image sitemaps, video sitemaps, news sitemaps, and mobile sitemaps, which cater to different content types for better indexing.

HTML sitemaps, on the other hand, are typically designed for humans. They help people navigate a website easily by providing a structured, clickable list of important pages.

Why XML Sitemaps Still Matter

XML sitemaps still matter (big time!), especially for large sites with tens of thousands, hundreds of thousands, or even millions of pages.

They act like a GPS for search engines, making sure no important or hard-to-find pages get lost in the shuffle.

Without XML sitemaps, crawlers might miss newer or deeper pages buried in your site’s maze. Plus, these sitemaps give search engines up-to-date information on which pages have changed and how often, which can help speed up indexing and boost your organic search visibility.

For huge websites (think e-commerce or news portals with millions of pages), well-organized XML sitemaps are essential. They not only help manage massive numbers of URLs by splitting them into smaller chunks but also ensure better crawl efficiency, keep servers from overloading, and help both discovery and indexation.

Google’s Gary Illyes has said: “Decreasing crawling without sacrificing crawl-quality would benefit everyone”, so it’s clear that Google is working towards crawling less frequently, but more efficiently.

As such, if your site is big and busy, XML sitemaps aren't just useful, they are a must-have for staying visible and competitive in search results.

In this article, I'm sharing practical ways to optimize XML sitemaps, especially for websites with a huge number of pages (think anything over 50,000 URLs).

From my experience managing a site with 14 million indexed pages, and through a lot of trial and error, I’ve found which XML sitemap best practices truly make a difference. I also believe XML sitemaps are incredibly versatile tools, perfect for creative use cases, which I’ll elaborate on later in the article.

No-BS XML Sitemap Best Practices

  • Keep your sitemap up-to-date by regularly adding new pages and removing old or deleted ones. Massive website? Creating dynamic sitemaps that automatically update to reflect the latest content changes is crucial. For example, in a Next.js website, you can use the getServerSideProps function to fetch your latest URLs from a database or API and dynamically generate up-to-date sitemap XML on every request. Alternatively, if you use a CMS like WordPress, there are some great plugins that save time and ensure accuracy without manual intervention.
  • Update the last modification date accurately for changed pages.
  • Group URLs logically to keep your sitemaps organized and efficient. For example, an ecommerce site might elect to separate its sitemaps into categories like ‘Products/PDPs,’ ‘Collections/PLPs,’ and ‘Blog posts,’ with each category having its own sitemap file. These individual sitemaps are then linked from a sitemap index. For larger websites, other logical segments often used include grouping sitemaps by: language versions (e.g., English, Spanish, etc.), geographic regions or markets (e.g., US, EU), or even update frequency or content freshness (e.g., daily updates vs. static pages). This approach helps keep each sitemap within the size limits and can improve how efficiently search engines crawl your site.
  • Be sure that Google can access your sitemap by hosting it in a publicly accessible location without any login requirements or robots.txt restrictions, and by submitting its URL through Google Search Console to confirm it’s reachable and can be crawled successfully. Extra overlooked step: Also submit in Bing Webmaster Tools!
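The dynamic-sitemap idea from the first bullet can be sketched as a plain function that a Next.js route (e.g. a pages/sitemap.xml.tsx file using getServerSideProps) could call before writing the result to the response. This is a minimal sketch: the SitemapEntry shape is an illustrative assumption, and in practice the entries would come from your database or API rather than being hard-coded.

```typescript
// Minimal sketch: build sitemap XML from a list of URL entries.
// In a Next.js pages router, a route like pages/sitemap.xml.tsx could
// fetch entries inside getServerSideProps and write this string to res,
// so the sitemap is regenerated on every request.

interface SitemapEntry {
  loc: string;       // absolute URL of the page
  lastmod?: string;  // W3C datetime, e.g. "2025-10-13"
}

function buildSitemapXml(entries: SitemapEntry[]): string {
  const urls = entries
    .map((e) => {
      // Only emit <lastmod> when a date is actually known.
      const lastmod = e.lastmod ? `<lastmod>${e.lastmod}</lastmod>` : "";
      return `<url><loc>${e.loc}</loc>${lastmod}</url>`;
    })
    .join("");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${urls}</urlset>`
  );
}

const xml = buildSitemapXml([
  { loc: "https://example.com/a", lastmod: "2025-10-13" },
  { loc: "https://example.com/b" },
]);
```

Because the XML is generated per request, the sitemap can never drift out of sync with the content, which is the whole point of the dynamic approach.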

Add a sitemap link in your robots.txt file by including a line like the following with the full URL of your sitemap:

Screenshot of a robots.txt file in a code editor window, showing user-agent rules and including a sitemap link to https://www.domain.com/sitemap.xml.

This line can be placed anywhere in the robots.txt file, which is located at the root of your website, to help search engines easily find your sitemap and crawl your site efficiently.
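As a minimal illustration (the domain is a placeholder), a robots.txt file with a sitemap reference might look like this:

```
User-agent: *
Disallow:

Sitemap: https://www.domain.com/sitemap.xml
```

The Sitemap directive is independent of any user-agent block, which is why it can sit anywhere in the file.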

  • Nice-to-have: Use sitemap monitoring tools that regularly check your XML sitemaps for added, removed, or updated URLs and send you email notifications whenever changes are detected. These tools help ensure you stay informed about important website updates affecting SEO and site structure without the need for manual checks. If you're on a tight budget and can’t afford a specialized monitoring tool, you can also use something like Screaming Frog's scheduled crawls.


Creative Sitemap Practices & Use Cases

Beyond serving as a basic directory of pages, I’ve discovered that sitemaps can be powerful tools for testing SEO theories or drawing the search engine bots’ attention to specific URLs.

Here are some things I’ve tried that aren’t often discussed, but proved really useful.

Improve discovery: break the index down into smaller sitemaps

Will breaking up a sitemap into smaller ‘digestible’ chunks help with sitemap processing and speed up page discovery?

This is an idea I was first introduced to on Barry Adams’ blog and decided to test for myself. The official guidelines on Google Search Central state: “All formats limit a single sitemap to 50MB (uncompressed) or 50,000 URLs”, so naturally we had previously used the 50K limit.

But three years ago, I decided to make a change and create sitemaps with a smaller limit of 30,000 URLs. This took the sitemap index from ~80 XML files to ~130.

This was a pretty successful experiment: to this day, Google is able to process all files within a couple of days (covering 3-4M pages), which is remarkable. In the past, some sitemaps used to remain unprocessed for weeks. So, for bigger websites, I have found through experience that breaking your sitemaps down can ‘motivate’ the bot to crawl and discover more pages, more often.

Screenshot of a sitemap index, where all sitemap files were successfully processed between the 3rd and 4th of September.

Why do smaller sitemaps work better? My guess is:

  • Sitemaps with fewer entries are much easier and faster for search engines to process and download.
  • Lighter files reduce server load and help bots crawl your site more efficiently. Google won’t wait around for slow servers to deliver huge sitemaps, and parsing 50K URLs in one go is a significant resource drain.
  • Empirically, frequently updated sitemaps get crawled more often, so experimenting with rotating the order of URLs can add a bit of freshness and ‘novelty’ that catches the crawler’s attention.
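The 30K-chunk approach described above can be sketched as a simple splitter plus a sitemap index builder. The file-naming scheme and base URL here are illustrative assumptions; the index format itself follows the sitemaps.org protocol.

```typescript
// Sketch: split a large URL list into 30,000-entry sitemap files and
// build a sitemap index referencing them. With 100,000 URLs this yields
// four child sitemaps (30K + 30K + 30K + 10K).

const CHUNK_SIZE = 30_000;

function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

function buildSitemapIndex(baseUrl: string, fileCount: number): string {
  // One <sitemap> entry per child file, e.g. /sitemap-1.xml, /sitemap-2.xml...
  const entries = Array.from({ length: fileCount }, (_, i) =>
    `<sitemap><loc>${baseUrl}/sitemap-${i + 1}.xml</loc></sitemap>`
  ).join("");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">${entries}</sitemapindex>`
  );
}

// Example: 100,000 placeholder URLs split into 30K chunks.
const urls = Array.from({ length: 100_000 }, (_, i) => `https://example.com/p/${i}`);
const chunks = chunk(urls, CHUNK_SIZE);
const index = buildSitemapIndex("https://example.com", chunks.length);
```

Each chunk would then be serialized into its own sitemap file, with the index submitted to Search Console in place of the individual files.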

Strengthen Canonical Signals

Are you seeing index bloat thanks to canonicalized pages? An often overlooked fact is that Google treats your sitemap.xml as a canonical signal.


To be clear, it’s a weak signal; however, if Google doesn’t have any strong signals like redirects or rel=canonical tags to fall back on, it will often treat the URLs you list in your XML sitemaps as the canonical versions.

Why is this important? If you can’t influence other canonical signals like redirects or rel=canonical tags, using your XML sitemaps is a good way to at least guide Google toward the canonical version of your pages.

However, it’s important to note that your XML sitemaps can also send conflicting signals.

For example, if you list URLs in your sitemap, but those URLs redirect or canonicalize to different pages, it can confuse search engines by providing mixed instructions.

Your sitemaps not only guide crawling but also influence which URLs Google shows in search results. Keeping your sitemaps accurate and aligned with your canonicalization strategy is key to avoiding indexing issues and duplicate content problems.

Conduct Effortless Competitor Analysis

Do you want to quickly find a list of your competitor pages, without having to perform a massive crawl?

Using sitemaps for competitor analysis is a valuable strategy for uncovering how rival websites are structured and optimized for search engines.

Since an XML sitemap essentially provides a roadmap of a website’s content, analyzing it can reveal insights into a competitor’s content priorities, hierarchy, and SEO strategy.

By reviewing your competitors’ sitemaps, you can identify which pages they consider most important, how frequently they update their content, and whether they segment their site into specific categories or topic clusters. This information can help you understand what type of content resonates in your industry, what topics competitors are targeting for organic traffic, and where there may be gaps you can exploit.

For example, using this method allowed me to discover that a competitor’s marketplace indexes specific internal search pages, a tactic I later applied for my own client.

Additionally, sitemaps often include details on last modification dates, which helps you track how active competitors are in publishing or refreshing their pages. For example, for e-commerce businesses, a sitemap can reveal insights about specific products, seasonal categories, and landing pages crafted for high-value keywords. By comparing multiple competitors’ sitemaps, businesses can benchmark their own site structure, discover missed keyword opportunities, and better align their publishing schedules with industry trends.

Ultimately, analyzing competitors’ sitemaps provides a behind-the-scenes glimpse of their SEO blueprint, empowering you to build a more informed and competitive digital strategy.


💡How do you quickly find a website's sitemap, I hear you asking? Simply check the robots.txt file (domain.com/robots.txt), or even try appending /sitemap.xml on the home URL (domain.com/sitemap.xml).
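Once you have a competitor’s sitemap URL, extracting its pages and last-modification dates is straightforward. The sketch below parses a sitemap with regexes so it runs offline on an inline sample; in practice you would fetch the live file first (e.g. with fetch), and a real XML parser would be more robust than regex matching.

```typescript
// Sketch: pull <loc> and <lastmod> pairs out of a competitor's sitemap.
// The sample XML and competitor.com URLs are placeholders.

interface SitemapRecord {
  loc: string;
  lastmod?: string;
}

function parseSitemap(xml: string): SitemapRecord[] {
  const records: SitemapRecord[] = [];
  // Grab each <url>...</url> block, then read its children.
  const urlBlocks = xml.match(/<url>[\s\S]*?<\/url>/g) ?? [];
  for (const block of urlBlocks) {
    const loc = block.match(/<loc>(.*?)<\/loc>/)?.[1];
    const lastmod = block.match(/<lastmod>(.*?)<\/lastmod>/)?.[1];
    if (loc) records.push({ loc, lastmod });
  }
  return records;
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://competitor.com/product/1</loc><lastmod>2025-08-01</lastmod></url>
  <url><loc>https://competitor.com/search/red-shoes</loc></url>
</urlset>`;

const records = parseSitemap(sample);
```

Grouping the extracted URLs by path segment (e.g. /product/ vs. /search/) is a quick way to see which sections a competitor exposes to crawlers, and the lastmod values hint at how actively each section is refreshed.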

Unearth important orphan pages

No time to implement an internal linking strategy for orphan pages with traffic potential? A sitemap can play a crucial role in helping search engines discover and index important orphan pages (i.e., web pages that lack any internal links).

Normally, search bots rely on links to navigate a site, so these pages often remain undiscovered, meaning they don’t get indexed and won’t appear in search results, no matter how valuable their content is. By adding them in an XML sitemap, you can directly signal their existence to search engines, which then have a defined path to crawl and potentially index these otherwise hidden assets.

One real-world example from my experience involved translated pages on an international version of a domain, where their equivalent versions were already seeing good traffic. When developer resources were limited and implementing hreflang tags on time was a challenge, a quick and practical workaround was to include these pages in a static sitemap as a temporary solution.

The result? Daily clicks grew 14x and impressions 5x in a couple of months!

Screenshot of 2 graphs, showing the increase in Clicks and Impressions after Sitemap submission.

Of course, listing orphan pages in the sitemap does not guarantee they’ll be indexed, as search engines may still prioritize well-linked pages, but it significantly increases their visibility compared to leaving them unlisted.

For sites with high-value orphan content, like cornerstone articles or internal search pages, making sure these pages are in the sitemap is an essential step to mitigate crawl and indexing challenges and maximize organic reach. 

Post-migration Temporary Redirect Sitemap

Using a temporary sitemap after a site migration is a powerful way to ensure Google quickly discovers new URLs and recrawls old URLs that now redirect with a 301 status.

This temporary sitemap should list both newly launched pages and, most importantly, the old URLs that redirect, serving two key purposes during migration.

First, it gives Google a direct roadmap to all new content, speeding up inclusion in the index and minimizing organic visibility downtime.

Second, by including deprecated URLs that now 301 redirect, the sitemap encourages Googlebot to revisit and process those redirects quickly, transferring ‘equity’ to the new destinations. After Google detects and follows these redirects through the temporary sitemap, indexation updates and ranking signals are moved, helping avoid lingering old URLs and expediting the de-indexing of replaced content.

And don’t forget: Once the migration is complete and the search index stabilizes, remove the temporary sitemap and revert to the standard version listing only current, canonical URLs to maintain best practices for future crawling and indexing. This method, especially when combined with Google Search Console submissions, is an effective way to minimize post-migration issues and accelerate the update of Google’s index for both redirected and newly introduced pages.

An alternative to the (often overwhelming) HTML hreflang

Managing hreflang annotations directly in the HTML of large, complex websites can often feel overwhelming and error-prone (hard relate). Switching to sitemaps for hreflang can offer a more robust solution for internationalization. Large sites often face issues like broken or invalid hreflang tags due to manual errors, template problems, or inconsistent updates across thousands of pages. These problems can negatively impact search engines’ ability to serve the correct language and regional versions to users.

By implementing hreflang annotations in XML sitemaps, each language or country version of a page is clearly mapped out, making it easier to maintain, audit, and automate hreflang tags at scale. Search engines fully support this approach and it is less prone to breaking due to simple coding mistakes. Additionally, all page variants are grouped in one place, streamlining international SEO efforts while drastically reducing risk and management headaches for site owners.

For enterprise multilingual sites, migrating hreflang annotations to sitemaps simplifies workflows, improves accuracy, and helps ensure global audiences receive the right content in search, without relying on fragile or repetitive HTML markup.
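In the sitemap-based approach, each <url> entry lists every language alternate (including itself) via xhtml:link elements, using Google’s documented format. A minimal two-language example, with placeholder URLs, might look like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.example.com/en/page</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/en/page"/>
    <xhtml:link rel="alternate" hreflang="de" href="https://www.example.com/de/page"/>
  </url>
  <url>
    <loc>https://www.example.com/de/page</loc>
    <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/en/page"/>
    <xhtml:link rel="alternate" hreflang="de" href="https://www.example.com/de/page"/>
  </url>
</urlset>
```

Note that the annotations must be reciprocal: every variant repeats the full set of alternates, which is exactly the kind of repetition that is easier to automate in a generated sitemap than to maintain by hand in page templates.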


💡I recently saw a great brightonSEO talk on using this method by Dentsu’s Nick Samuel.


Don’t forget your videos!

Do you have original content in the form of videos on your domain? Having a video sitemap is increasingly important now, especially as LLMs and AI-driven search technologies evolve to better understand and index video content.

Unlike traditional search engines that mainly interpret text, LLMs leverage structured data like video sitemaps to discover, crawl, and analyze video metadata (such as titles, descriptions, thumbnails, and URLs) allowing for more precise indexing and retrieval of video materials. This enhancement enables videos to be converted into text-based formats that LLMs can process, fuelling natural language queries that include video content and improving video discoverability and engagement.

With rising AI adoption, videos are no longer isolated pieces of content, but part of the richer datasets that LLMs use to generate responsive answers. A dedicated video sitemap ensures your video content is represented clearly to these types of crawlers, increasing the likelihood your videos appear in search results and AI-generated responses.
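For reference, a video sitemap entry uses Google’s video namespace and, per their documentation, requires a thumbnail, title, description, and either a content or player URL. A minimal example with placeholder URLs:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://www.example.com/videos/product-demo</loc>
    <video:video>
      <video:thumbnail_loc>https://www.example.com/thumbs/demo.jpg</video:thumbnail_loc>
      <video:title>Product demo</video:title>
      <video:description>A two-minute walkthrough of the product.</video:description>
      <video:content_loc>https://www.example.com/media/demo.mp4</video:content_loc>
    </video:video>
  </url>
</urlset>
```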

For businesses and content creators, this means using video sitemaps can significantly boost visibility in the era of AI and LLM-powered search experiences, making it a crucial SEO tactic moving forward.

Common Sitemap Mistakes

Pointing to too many irrelevant or invalid pages

Having a large number of non-indexable or non-canonical pages in your sitemap can cause Google to lose trust in that sitemap’s quality and reliability.

When a sitemap is filled with URLs that are blocked by robots.txt, marked with noindex tags, redirected (non-canonical), or even soft-404s, it signals to Google that the sitemap is not well maintained or reflective of the true site structure. This reduces Google’s confidence in the sitemap as a trustworthy source to guide crawling and indexing.

If Google detects that many URLs in a sitemap are irrelevant or invalid, it may reduce the crawl budget allocated to the site, crawl it less frequently, or ignore the sitemap altogether, thereby diminishing its effectiveness.

Maintaining a clean sitemap with mostly canonical, accessible, and indexable URLs is important to build and maintain Google's trustworthiness assessment of your site, which aligns with their broader E-E-A-T principles emphasizing trust and reliability for better ranking performance.

Ultimately, keeping your sitemap accurate and relevant is critical to ensuring search engines can efficiently crawl and index your site while fostering trust in your site’s quality signals.

💡It's perfectly acceptable to deviate from these strict sitemap cleanliness standards temporarily for testing purposes (per previous examples), as long as you monitor the impact closely and revert to a clean, canonical URL-focused sitemap once testing is complete.

Using <lastmod> too freely

Using the <lastmod> tag in your sitemap too freely or inaccurately, especially on pages that don’t get updated often, can lead to inefficiencies and confusion for search engines. The <lastmod> element is intended to signal when a page was last substantially modified, helping search engines prioritize crawling.

However, if <lastmod> dates are frequently changed without actual content updates, or if the tag is used on mostly static pages, it can mislead search engines into revisiting unchanged pages unnecessarily, which might waste crawl budget and reduce overall crawl efficiency.
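Used correctly, <lastmod> is a W3C datetime (a date alone is valid) that is bumped only when the page content genuinely changes, as in this placeholder entry:

```xml
<url>
  <loc>https://www.example.com/pricing</loc>
  <!-- Updated only because the page content actually changed on this date -->
  <lastmod>2025-09-01T12:00:00+00:00</lastmod>
</url>
```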

Using the deprecated <changefreq> and <priority> tags

Traditional optional sitemap tags like <changefreq> and <priority> are generally ignored by Google today (although still supported in the protocol), making these tags effectively useless for influencing crawl behavior or rankings on Google.

Final thoughts

I hope that, having read this, you’re convinced that XML sitemaps are not only important but can also be used in a variety of ways: from competitor analysis to running your own SEO tests. The key for me has been to keep experimenting, observe how search engines respond, and view my sitemaps as tools rather than just files.

Eleni Tarantou - SEO Specialist

Eleni is an SEO Specialist for skroutz.gr, the leading marketplace in Greece. She has worked both in-house and agency side and has a special place in her heart for all things technical and e-commerce SEO.
