Duplicate content is when the same, or very similar, content is accessible via multiple URLs. These duplicate pages are inadvertently created in a number of ways: for example, having both HTTP and HTTPS versions of a page; www and non-www page versions; UTM parameters; pagination series; and more.
Duplicate pages can cause problems because search engines often struggle to determine which version of the page ought to be indexed and shown to users in the SERPs.
How do we navigate these challenges and signal to search engines which pages to prioritize?
One technique we’ve relied on over the years for this purpose is canonicalization.
Image of duplicate dog content
Duplicate content is when the same (or very similar) content is accessible via more than one indexable URL. Below you’ll find a list of common issues which cause these duplicate pages to be created:
HTTP and HTTPS e.g.: http://www.example.com/ and https://www.example.com/
www and non-www e.g.: https://www.example.com/ and https://example.com/
Trailing slash vs non-trailing slash e.g.: https://www.example.com/page/ and https://www.example.com/page
A mobile version and a desktop version e.g.: m.example.com and example.com
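To see how quickly these variants pile up, here is a minimal Python sketch that groups URLs which differ only by protocol, subdomain prefix, or trailing slash. The normalization policy (prefer https, strip "www." and "m.", drop the trailing slash) and the function names are assumptions made for illustration, not rules that search engines apply.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Collapse common duplicate variants into one comparison key.

    Assumed policy (illustrative only): prefer https, strip a leading
    "www." or "m." label, and drop a trailing slash from the path.
    """
    if "://" not in url:
        url = "https://" + url  # bare hosts such as "m.example.com"
    parts = urlsplit(url)
    host = parts.netloc.lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
            break
    path = parts.path.rstrip("/")
    return urlunsplit(("https", host, path, parts.query, ""))


def group_duplicates(urls):
    """Map each normalized key to the list of original URLs that share it."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return groups


variants = [
    "http://www.example.com/",
    "https://www.example.com/",
    "https://example.com/",
    "https://www.example.com",
    "m.example.com",
    "example.com",
]
for key, members in group_duplicates(variants).items():
    print(key, "<-", len(members), "variants")
```

Any group with more than one member is a candidate for a canonical tag or a redirect.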
On many websites, duplicate versions of a single web page may exist, and be indexable. In the context of SEO, canonicalization is the process of signaling your preferred version of the page - i.e. the one that you want search engines to show to users.
If you’re seeing different page versions ranking better than your desired page, implementing the canonical link element will help resolve the situation.
The canonical link element, or the canonical tag, is a code snippet placed in the head section of an HTML page to indicate to Googlebot your preferred version of the page.
Here’s what the code looks like on a website:
Search engines have the ability to detect and ignore duplicates, so why do we need to worry about this? Well, the simple answer here is one of control: sometimes the search engines get this right, but often they don’t!
Imagine you work on an e-commerce website that sells dog harnesses in various colors. Individual pages have been created for each color variant, resulting in 10 different URLs for dog harnesses. The problem is, when someone searches for "dog harness" or related queries, all 10 URLs are competing to appear for those queries in the search results.
I love to compare the concept of canonicalization to a scenario where you bring a large cake to a school with a group of children eagerly awaiting its arrival. As soon as they see you, they all rush towards the cake, trying to grab a piece. In the chaos, some portions of the cake fall off, some children end up with more cake than others, and unfortunately, some children don't get any cake at all.
Now, imagine you’ve appointed someone to handle the cake distribution. This person accepts the cake on behalf of the group and carefully ensures that each child receives a fair and equal portion. By organizing and distributing the cake in a controlled manner, the chaos and unfairness can be avoided.
In the context of your website, canonicalization plays a similar role. When you have duplicate pages, it can lead to confusion for the search engines: some pages might be ignored, others might be given more prominence, and some might not appear in the search results at all.
Canonicalization steps in to bring order to this situation. It ensures that any duplicate pages are properly identified, and guides search engines to recognize and display a single preferred page in the search results. By designating the preferred page through canonical tags, you establish clarity and provide search engines with clear guidance on which version of the content to prioritize.
When Google encounters multiple pages that seem similar during indexing, it chooses one page as the canonical. But how does Google determine this? According to Google's documentation, these are some of the factors it considers:
Optimizing our preferred pages to align with Google's preferences will increase the likelihood that our desired canonical pages are chosen.
The canonical element is the easiest way to tell Google that there is a version of this page that you’d like to be indexed. Including <link rel="canonical" href="https://www.example.com/"> in the HTML head of a page acts as a signal, but because the canonical tag is a hint, not a directive, it is sometimes ignored (as illustrated in the screenshot below):
Screenshot of GSC, Canonical Report, June 2023
How do you resolve issues like this? Follow the steps below to send a stronger signal to Google:
All these canonicalization factors come together to tell Google which URL you prefer and help demonstrate that the canonical link elements on your pages are deliberate and should be honored.
How do you know if your site has canonicalization issues?
Follow the steps below:
Screenshot of Screaming Frog Canonical Report, June 2022
Screenshot of Google Sheet Canonical Data, June 2022
Screenshot of Google Sheet Canonical Data, June 2022
Screenshot of GSC, Page Indexing Report, June 2022
Screenshot of GSC URL Inspection Report, June 2022
Image of a Canonicalization Audit Flowchart
You can implement rel="canonical" on your site using one of these two methods:
Include a canonical link element in the head section of your duplicate HTML pages, with the URL pointing to the preferred version. Here is an example:
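<html>
<head>
<title>Red Dog Harnesses</title>
<!-- placeholder URL: point href at your preferred version of the page -->
<link rel="canonical" href="https://www.example.com/dog-harness/" />
<!-- other elements -->
</head>
<!-- rest of the HTML -->
</html>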
Use absolute paths when implementing the canonical tag, rather than relative paths. For example, use: https://www.dogstore/dog-harness/red-harness/ not dogstore/dog-harness/red-harness/.
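As a rough way to check this during an audit, the Python sketch below flags canonical values that are not absolute, and shows what a relative value would resolve to against the page it sits on. The helper names are hypothetical, and the check (an http or https scheme plus a hostname) is an assumption about what counts as "absolute" here.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute_url(href: str) -> bool:
    """True when href carries both an http(s) scheme and a hostname."""
    parts = urlsplit(href)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def resolved_target(page_url: str, href: str) -> str:
    """What a relative href resolves to when read on page_url."""
    return urljoin(page_url, href)


page = "https://www.dogstore/dog-harness/red-harness/"
for href in (
    "https://www.dogstore/dog-harness/red-harness/",
    "dogstore/dog-harness/red-harness/",
):
    status = "ok" if is_absolute_url(href) else "relative"
    print(status, "->", resolved_target(page, href))
```

The relative value resolves to a nested URL that almost certainly does not exist, which is why absolute values are the safer choice.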
While it's generally not advisable to implement a canonical link element in JavaScript, you can do this if you have limited options. But be cautious here: incorrect implementation may lead to site-wide canonical issues and unexpected results. If you choose to implement JavaScript-based canonicalization, follow this guide to ensure accurate implementation.
For non-HTML files such as PDFs, Excel spreadsheets, Word documents, images, or videos, you’ll need to set the canonical (or the X-Robots-Tag) in the HTTP response header rather than in the HTML of a page. This method requires access to your server configuration file. Here is an example:
HTTP/1.1 200 OK
Content-Length: 19
...
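The canonical itself travels in a Link response header with rel="canonical". The short Python sketch below only formats that header line; the function name and the PDF URL are hypothetical, and how you attach the header depends on your server or framework.

```python
def canonical_link_header(canonical_url: str) -> str:
    """Format an HTTP Link header that declares canonical_url as canonical."""
    return f'Link: <{canonical_url}>; rel="canonical"'


# Hypothetical file URL, used purely for illustration.
print(canonical_link_header("https://www.example.com/downloads/white-paper.pdf"))
```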
When a canonical tag points to a URL that has another canonical or a redirect, it creates a canonical chain. For example, if the “dog-harness” page is set as the canonical for the “red-harness” page, but the “dog-harness” page is redirected or has its own canonical pointing to the “dog-products” page, it creates a conflicting signal and Google may abandon the hint altogether. This issue can occur due to improper injection of canonical tags with JavaScript or errors in CMS plugins. Periodic canonical audits can help you catch and fix issues like this.
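If you have exported each URL's declared canonical (or redirect target) from a crawl, a chain can be spotted by following those declarations hop by hop. This is a minimal Python sketch under that assumption; the mapping, the URLs, and the function names are made up for illustration.

```python
def follow_canonicals(start_url, canonical_of, max_hops=10):
    """Follow canonical/redirect declarations from start_url.

    canonical_of maps a URL to the URL it declares as canonical
    (or redirects to). Returns the list of URLs visited, in order.
    """
    path = [start_url]
    current = start_url
    while len(path) <= max_hops:
        target = canonical_of.get(current)
        if target is None or target == current:
            break  # nothing declared, or self-referencing: end of the path
        path.append(target)
        if path.count(target) > 1:
            break  # we have been here before: the declarations loop
        current = target
    return path


def is_chain(path):
    """More than one hop means the first canonical does not settle things."""
    return len(path) > 2


# Hypothetical crawl data mirroring the example in the text.
crawl = {
    "https://www.example.com/red-harness/": "https://www.example.com/dog-harness/",
    "https://www.example.com/dog-harness/": "https://www.example.com/dog-products/",
    "https://www.example.com/dog-products/": "https://www.example.com/dog-products/",
}
hops = follow_canonicals("https://www.example.com/red-harness/", crawl)
print(" -> ".join(hops), "| chain:", is_chain(hops))
```

The fix is usually to point every duplicate straight at the final URL in the path.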
Information about your page, including canonical tags, should be added to the head section. Canonical tags in the body section will be ignored by Google. Checking where your canonical tags appear will help you identify whether any elements, such as iframes or unclosed tags, have accidentally pushed the canonical link element into the body section of your pages.
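One way to check placement is to parse the page source and note where each canonical link element appears. The Python sketch below uses the standard library's html.parser and treats any element that is not normally allowed in the head as implicitly ending it, which is only an approximation of how browsers behave. The class name, the allowed-tag list, and the sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser

# Elements that may appear inside <head>; anything else is assumed to end it.
HEAD_ALLOWED = {"base", "link", "meta", "title", "noscript", "style", "script", "template"}


class CanonicalLocator(HTMLParser):
    """Record each canonical link element and whether it sits in the head."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = []  # list of (href, "head" or "outside head")

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
            return
        if self.in_head and tag not in HEAD_ALLOWED:
            self.in_head = False  # e.g. a stray <iframe> closes the head early
        if tag == "link":
            attr = dict(attrs)
            rels = (attr.get("rel") or "").lower().split()
            if "canonical" in rels:
                place = "head" if self.in_head else "outside head"
                self.found.append((attr.get("href"), place))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def locate_canonicals(html: str):
    parser = CanonicalLocator()
    parser.feed(html)
    parser.close()
    return parser.found


broken = (
    "<html><head><title>Red Dog Harnesses</title>"
    '<iframe src="widget.html"></iframe>'
    '<link rel="canonical" href="https://www.example.com/dog-harness/">'
    "</head><body></body></html>"
)
print(locate_canonicals(broken))
```

Because JavaScript can also move or inject elements, it is worth running a check like this against the rendered HTML as well as the raw source.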
Many people elect to canonicalize paginated pages in order to avoid competition, as they often have similar content. However, doing so can hinder the discovery and indexing of the individual pages which are linked to from the pagination series. If this is a concern, it’s advisable to implement self-referencing canonical tags on paginated pages instead.
If you set a canonical tag for a particular page, but then include a different variant of that page in the sitemap, or link internally to this variant, it sends a mixed signal and could cause search engines to think: “Oh, they’re definitely confused, I’ll ignore the canonical tag and select the canonical version myself”. Maintaining consistency emphasizes your intention to the search engines.
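A consistency check like this is easy to script once you have crawl data. The Python sketch below assumes three inputs you would export yourself (a URL-to-canonical mapping, the sitemap URLs, and the internal link targets) and reports any URL that is promoted in the sitemap or in internal links while being canonicalized elsewhere. All names and URLs are hypothetical.

```python
def find_mixed_signals(canonical_of, sitemap_urls, internal_link_targets):
    """Return (url, canonical, sources) for URLs that are promoted
    in the sitemap or by internal links but canonicalized elsewhere."""
    sitemap = set(sitemap_urls)
    linked = set(internal_link_targets)
    issues = []
    for url in sorted(sitemap | linked):
        canonical = canonical_of.get(url)
        if canonical and canonical != url:
            sources = []
            if url in sitemap:
                sources.append("sitemap")
            if url in linked:
                sources.append("internal link")
            issues.append((url, canonical, sources))
    return issues


# Hypothetical audit data.
canonicals = {
    "https://www.example.com/dog-harness/": "https://www.example.com/dog-harness/",
    "https://www.example.com/dog-harness/red/": "https://www.example.com/dog-harness/",
}
sitemap = ["https://www.example.com/dog-harness/", "https://www.example.com/dog-harness/red/"]
links = ["https://www.example.com/dog-harness/red/"]

for url, canonical, sources in find_mixed_signals(canonicals, sitemap, links):
    print(url, "is canonicalized to", canonical, "but appears in:", ", ".join(sources))
```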
While redirects and canonical tags are both canonicalization signals, they serve different purposes and shouldn’t be used interchangeably. When you want to permanently merge two pages into one and make them accessible via a single URL, a 301 redirect is the appropriate choice. On the other hand, if you want to consolidate similar pages but keep them individually accessible while indicating a preferred version for search results, using the canonical tag is the better option.
If Google obeys the canonical hint, it will consolidate the pages, including things like link equity, and index the canonical page specified. However, this doesn’t mean that the canonicalized page isn’t indexable. The only way to ensure a page does not appear in the search results is to implement a noindex tag.
Hreflang helps signal to search engines which URL you want to show users in different locations or searching in different languages.
Imagine you have an
While commenting on a canonical vs noindex question during a Google SEO office-hours hangout, Google’s John Mueller said:
“…you can also do both of them. [And it’s something…] if external links, for example, are pointing at this page then having both of them there kind of helps us to figure out well, you don’t want this page indexed but you also specified another one. So maybe some of the signals we can just forward along.”
However, it might be preferable not to risk sending conflicting signals to Google. You can watch the full video of John Mueller’s response here.
Audit the page as outlined earlier in this article in order to identify any possible errors that could be causing Google to ignore the hint, and consider implementing one or more of the tips above to help send a stronger signal.
Ebere is a Freelance Technical SEO Specialist who enjoys spending time finding technical opportunities that improve website rankings and traffic. She has worked agency-side and audited websites in different sectors including e-commerce, healthcare and travel.