Last Updated: September 7th, 2022
Duplicate Content Best Practices

Google DOES NOT have a duplicate content penalty. Google rewards unique content and the signals associated with added value. Google filters duplicate content in SERPs. Google DEMOTES copied content in SERPs. Google DEMOTES manipulative duplicated content in SERPs. Google PENALISES low-quality, verbose, spun content copied from other web pages. Do NOT expect to rank high in Google with content found on other, more trusted sites.
Sign up for our Free SEO training course to find out more.
Copied Content

TLDR: ‘Duplicate content’ is NOT mentioned **once** in the recently published Search Quality Raters Guidelines. ‘Copied content’ is. Semantics aside, duplicate content is evidently treated differently by Google than copied content, the difference being the INTENT and nature of the duplicated text. Duplicated content is commonplace on the web, often not manipulative and often free from malicious intent. It is not penalised, but it is not optimal. Copied content can often be penalised algorithmically or manually. Do NOT ‘spin’ copied text to make it unique! Google clearly says that the practice of making your text more ‘unique’, using low-quality techniques like adding synonyms and related words, is:
Google’s Andrey Lipattsev is adamant: Google DOES NOT have a duplicate content penalty. He clearly wants people to understand it is NOT a penalty if Google discovers your content is not unique and doesn’t rank your page above a competitor’s page.
Also, as John Mueller points out, Google picks the best option to show users depending on who they are and where they are. So sometimes, your duplicate content will appear to users where relevant. This latest advice from Google is useful in that it clarifies Google’s position, which I quickly paraphrase below:
A sensible SEO strategy would still appear to be to reduce Googlebot crawl expectations and consolidate ranking equity and potential in high-quality canonical pages, and you do that by minimising duplicate or near-duplicate content. A self-defeating strategy would be to ‘optimise’ low-quality or non-unique pages, or to present low-quality pages to users.

Webmasters are confused about ‘penalties’ for duplicate content, which is a natural part of the web landscape, because Google claims there is NO duplicate content penalty, yet rankings can apparently be impacted negatively by what looks like ‘duplicate content’ problems.

The reality is that if Google classifies your duplicate content as THIN content, MANIPULATIVE BOILERPLATE or NEAR-DUPLICATE ‘SPUN’ content, then you probably DO have a severe problem that violates Google’s recommendations, and this ‘violation’ will need to be cleaned up – if, of course, you intend to rank high in Google.

Google wants us to understand that MANIPULATIVE BOILERPLATE or NEAR-DUPLICATE ‘SPUN’ content is NOT ‘duplicate content’. Duplicate content is not necessarily ‘spammy’ to Google. The rest of it is, e.g.:
At the ten-minute mark in a recent video, John Mueller of Google also clarified, with examples, that there is:
What Is Duplicate Content?

Here is a definition from Google:
It’s crucial to understand that if, as a webmaster, you republish posts, press releases, news stories or product descriptions found on ***other*** sites, then your pages are very definitely going to struggle to gain traction in Google’s SERPs (search engine results pages).

Google doesn’t like using the word ‘penalty’, but if your entire site is made entirely of republished content, Google does not want to rank it above others who provide more of a ‘value add’ – and that can be in many areas.

If you have a multiple-site strategy selling the same products, you are probably going to cannibalise your traffic in the long run, rather than dominate a niche, as you used to be able to do. This is all down to how a search engine filters duplicate content found on other sites – and the experience Google aims to deliver for its users – and its competitors.

Mess up with duplicate content on a website, and it might look like a penalty, as the end result is the same – important pages that once ranked might not rank again, and new content might not get crawled as fast as a result. Your website might even get a ‘manual action’ for thin content.

A good rule of thumb is: do NOT expect to rank high in Google with content found on other, more trusted sites, and don’t expect to rank at all if all you are using is automatically generated pages with no ‘value add’.

While there are exceptions to the rule (and Google certainly treats your OWN duplicate content on your OWN site differently), your best bet in ranking is to have one single (canonical) version of content on your site, with rich, unique text content written specifically for that page.

Google wants to reward RICH, UNIQUE, RELEVANT, INFORMATIVE and REMARKABLE content in its organic listings – and it has raised the quality bar over the last few years. If you want to rank high in Google for valuable key phrases, and for a long time, you better have good, original content for a start – and lots of it.
A very interesting statement in a recent webmaster hangout was “how much quality content do you have compared to low-quality content”. That indicates Google is looking at this ratio. John says to identify “which pages are high-quality, which pages are lower quality, so that the pages that do get indexed are really the high-quality ones.” Gary Illyes chipped in recently –
Google is giving us a lot more specific information these days in particular areas.
A question was asked in a webmaster hangout and John replied:
And
And, here is where it gets trickier:
And when it gets spammy:
And finally:
Is There A Penalty For Duplicate Content On A Website?

Google has given us some explicit guidelines when it comes to managing duplication of content. John Mueller clearly states in the video where I grabbed the above image:
and
…in which he was talking about very similar pages. John says to “provide… real unique value” on your pages. I think that could be understood as: Google is not compelled to rank your duplicate content. If it ignores it, that is different from a penalty. Your original content can still rank, for instance. If “essentially, they’re the same, and just variations of keywords”, that should be OK, but if you have ‘millions’ of them, Googlebot might think you are building doorway pages, and that IS risky. Generally speaking, Google will identify the best pages on your site if you have a decent on-site architecture and unique content. The advice is to avoid duplicate content issues if you can, and this should be common sense. Google wants (and rewards) original content – it’s a great way to push up the cost of SEO and create a better user experience at the same time. Google doesn’t like it when ANY TACTIC is used to manipulate its results, and republishing content found on other websites is a common practice of a lot of spam sites.
You don’t want to look anything like a spam site; that’s for sure – and Google WILL classify your site… as something. The more you can make every page look human-made, on a page-by-page basis, with content that doesn’t appear exactly in other areas of the site, the more Google will ‘like’ it. Google does not like automation when it comes to building the main content of a text-heavy page; that’s clear.

I don’t mind multiple copies of articles on the same site – as you find with WordPress categories or tags – but I wouldn’t have tags and categories, for instance, and expect them to rank well on a small site with a lot of higher-quality competition, and especially not targeting the same keyword phrases in a way that can cannibalise your rankings.

I prefer to avoid repeated unnecessary content on my site, and when I do have 100% automatically generated or syndicated content on a site, I tell Google NOT to index it with a noindex in meta tags or X-Robots-Tag headers, or I block it in robots.txt completely. That is probably the safest thing to do, as it could be seen as manipulative if I intended to get it indexed.

Google won’t thank you, either, for spidering a calendar folder with 10,000 blank pages on it, or a blog with more categories than original content – potentially a lot of thin pages – why would they?

Can You Duplicate Your Own Content Within Your Own Site?

Yes, within reason, but when you duplicate text on more than one page of your site, you give Google the chance to rank multiple pages for the same query, and some pages rank better than others, thus negatively impacting your organic traffic levels over the longer term. Some much larger sites reuse unique content on product pages and multiple category pages, for instance. Google will not penalise this practice but:
How Does Google Work Out The Primary Version Of Duplicate Content?

The following statement from a fellow SEO rings true on some levels, evidently:
There is an interesting comment on that page too:
It may not be an entirely accurate statement, nor the complete picture, but it is very interesting. Some Google patents at least indicate some amount of thought has been put into determining the primary version of duplicated content across many sites, and it appears it is not limited to link authority.
I have long thought it sensible to publish to your own site first, making your own site the canonical or primary source of content you publish. Publish duplicate content to other sites to get it noticed, for sure, but where allowed, link back to the original article – and even better, use a canonical link element to point back to your original article on your own site and help Google forward legitimate positive signals along to you (if Google wants to, that is).

What is Boilerplate Content?

Wikipedia says of ‘boilerplate’ content:
…and Google says to:
Google is very probably looking to see if your pages ‘stand on their own’ – as John Mueller is fond of saying. How would they do that algorithmically? Well, they could look to see if text blocks on your pages were unique to the page, or were very similar to blocks of content on other pages on your site. If this ‘boilerplate’ content makes up the PRIMARY content of multiple pages, Google can easily filter to ignore – or penalise – this practice. The sensible move would be to listen to Google and minimise – or at least diffuse – the instances of boilerplate text, page to page, on your website.

Note that THIN CONTENT exacerbates SPUN BOILERPLATE TEXT problems on a site, as THIN CONTENT just creates more pages that can only be created with boilerplate text – itself a problem. E.g. if a product has 10 URLs – one URL for each colour of the product, for instance – then the TITLE, META DESCRIPTION and PRODUCT DESCRIPTION (and other elements on the page) for these extra pages will probably rely on BOILERPLATE techniques to create them, and in doing so, you create 10 URLs on the site that do ‘not stand on their own’ and essentially duplicate text across pages.

It’s worth listening to John Mueller’s recent advice on this point. He clearly says that the practice of making your text more ‘unique’ using low-quality techniques is:
If you have many pages of similar content on your site, Google might have trouble choosing the page you want to rank, and it might dilute your ability to rank for what you do want to rank for.

How Does Google Deal With Duplicate Product Descriptions Across Multiple Retailer Sites?
John Mueller responded:
Should I Rewrite Product Descriptions To Make The Text Unique?

Probably. Whatever you do, beware of ‘spinning’ the text – Google might have an algorithm or two focused on that sort of thing. John has also clarified:
Is There A Google Penalty For Spun Content?

John Mueller confirmed in September 2019 that Google will indeed penalise spun content if it generates “textual pages”. I would think that means MC (the Main Content of a page).
Does Having Different Urls For A Product Attribute (Like Size, Amount, Weight, Colour, Volume) Cause Duplicate Content Problems?
In the recent past, a product with 10 colours would end up having 10 pages on the site, one for each colour variant attribute. Looking at the pages, everything would be duplicated page to page, apart from the colour, size or another variant attribute of the product. Webmasters went a few steps further in an attempt to make each page “unique”… manually or automatically keyword-spinning product variations and entire descriptions to rank in Google, but adding no real value to the page. For many sites, Google prefers not to rank pages like that; instead, it prefers one canonical, detailed product page, with all variant attributes mentioned on that page, UNLESS you have unique content for the variant page that is not just low-quality spun text.
John says:
It’s not so much a duplicate content penalty, it’s that multiple variant pages of the same product dilute ranking signals in a way that one canonical product page would not, and that variant product pages are often not the optimal set-up to rank in Google. A good rule of thumb is that if you have a lot of pages that look duplicate to you apart from one or two items on the page, then those pages should probably be “folded together” into one strong page.
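One common way to ‘fold’ variant pages together is a canonical link element on each variant URL pointing at the one strong product page – a sketch, with hypothetical example.com URLs:

```html
<!-- In the <head> of a variant URL such as
     https://www.example.com/widget?colour=red -->
<!-- Point Google at the single canonical product page: -->
<link rel="canonical" href="https://www.example.com/widget" />
```

Note this is a hint, not a directive – if the variant page has substantial unique content of its own, Google may choose to index it anyway.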
John continues:
and
and
Ecommerce SEO Tip: Create and publish strong, detailed product pages, and do not split up the product content into multiple alternative, indexable product variant URLs to facilitate ranking for every individual variant attribute of the product, be it colour, size, weight or volume. One page should rule them all, so to speak.

How Does Google Rate ‘Copied’ Main Content?
How To Manage Content Spread Across Multiple Domains

This is a good video (note it has somewhat outdated information about cross-domain rel canonical). Matt Cutts updates his advice on using cross-domain rel canonical in the following video. If you have content spread amongst multiple domains, do not expect to get all the versions appearing in Google SERPs at the same time. This sort of duplicate content is not going to improve quality scores, either.
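The cross-domain canonical itself is just a canonical link element pointing at a URL on a different domain – a sketch with placeholder example.org/example.com URLs:

```html
<!-- In the <head> of the republished copy at
     https://www.example.org/republished-article -->
<!-- Tell Google the primary version lives on the other domain: -->
<link rel="canonical" href="https://www.example.com/original-article" />
```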
If you are following the rules whilst duplicating content across multiple domains, I would pick one canonical URL on one website (the primary website) and use cross-domain canonical link elements to tell Google which is the primary URL. This way you meet Google’s guidelines, site quality scores should not be impacted negatively, and you consolidate all ranking signals in one of these URLs so it can rank as best it can against competing pages.

How To Deal With Content Spread Across Multiple TLDs?
Does Google Treat Translated Content as Duplicated Content?
No. Remember, though, to specify language alternatives using the hreflang attribute.
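For illustration, hreflang alternates are declared in the <head> of each language version, with each page listing all versions, including itself (URLs are placeholders):

```html
<!-- On BOTH the English and German versions of the page: -->
<link rel="alternate" hreflang="en" href="https://www.example.com/page/" />
<link rel="alternate" hreflang="de" href="https://www.example.com/de/page/" />
<!-- Optional fallback for users whose language has no dedicated version: -->
<link rel="alternate" hreflang="x-default" href="https://www.example.com/page/" />
```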
How To Manage Duplicate Content When Reporting News Stories
What is ‘Near-Duplicate’ Content, to Google?

When asked on Twitter, Googler Gary Illyes responded:
Based on research papers, it might be the case that once Google detects a page is a near duplicate of something else, it is going to find it hard to rank this page against the source.

Can Duplicate Content Rank in Google?

Yes. There are strategies where this will still work, in the short term. Opportunities are (in my experience) reserved for local and long-tail SERPs where the top ten results page is already crammed full of low-quality results, and the SERPs are shabby – certainly not a strategy for competitive terms.

There’s not a lot of traffic in long-tail results unless you do it en masse, and that could invite further site quality issues, but sometimes it’s worth exploring whether using very similar content with geographic modifiers (for instance), on a site with some “domain authority” (for want of a better word), presents an opportunity. Very similar content can be useful across TLDs too. A bit spammy, but if the top ten results are already a bit spammy… If low-quality pages are performing well in the top ten of an existing long-tail SERP, then it’s worth exploring – I’ve used it in the past. I always thought that if it improves user experience and is better than what’s there in those long-tail searches at present, who’s complaining?

It’s not exactly best-practice SEO, and I’d be nervous about creating any low-quality pages on your site these days. Too many low-quality pages might cause you site-wide issues in the future, not just page-level issues.

Original Content Is King, They Say

Stick to original content, found on only one page on your site, for best results – especially if you have a new/young site and are building it page by page over time… and you’ll get better rankings and more traffic to your site (affiliates too!). Yes, you can be creative and reuse and repackage content, but I always make sure that if I am asked to rank a page, I will require original content on the page.

Should I Block Google From Indexing My Duplicate Content?

No.
There is NO NEED to block your own duplicate content. There was a useful post in Google forums a while back with advice from Google on how to handle very similar or identical content:
John also goes on to give some good advice about how to handle duplicate content on your own site:
Webmaster guidelines on content duplication used to say:
but now Google is pretty clear they do NOT want us to block duplicate content, and that is reflected in the guidelines.
You want to minimise dupe content rather than block it; I find the best solution to handling a problem is on a case-by-case basis. Sometimes I will block Google when using OTHER people’s content on pages. I never block Google from working out my own content. Google says it needs to detect an INTENT to manipulate Google to incur a penalty, and you should be OK if your intent is innocent, BUT it’s easy to screw up and LOOK as if you are up to something fishy. It is also easy to fail to get the benefit of proper canonicalisation and consolidation of relevant primary content if you don’t do basic housekeeping, for want of a better turn of phrase.

Is A Mobile Site Counted As Duplicate Content?
How Does Google Pick A Canonical URL For Your Page?

In September 2019, in the video above, Google’s John Mueller aimed again to clarify how Google chooses a canonical URL from all the duplicate variant URLs available to it when it crawls your website, and offered some advice on how to help Google choose a canonical URL for your page:
How To Use Canonical Link Elements Properly

The canonical link element is extremely powerful and very important to include on your pages. Every page on your site should have a canonical link element, even if it is self-referencing. It’s an easy way to consolidate ranking signals from multiple versions of the same information. Note: Google will ignore misused canonicals, given time. Google recommends using the canonical link element to help minimise content duplication problems, and it is one of the most powerful tools at our disposal.
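A self-referencing canonical is simply the page declaring its own preferred URL – a sketch with a placeholder URL:

```html
<!-- In the <head> of https://www.example.com/page/ itself: -->
<link rel="canonical" href="https://www.example.com/page/" />
```

Parameterised, uppercase or otherwise mangled variants of the URL then all point back at this one preferred version.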
Google SEO – Matt Cutts from Google shared tips on the rel=”canonical” tag (more accurately – the canonical link element) that the 3 top search engines now support. Google, Yahoo!, and Microsoft have all agreed to work together in a:
Example canonical tag from the Google Webmaster Central blog:

<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />

The process is simple. You can put this link tag in the head section of the duplicate content URLs if you think you need it.

Should pages have self-referencing canonical link elements?
I add a self-referring canonical link element as standard these days – to ANY web page – to help Google work out exactly which is the canonical URL I am trying to rank. Google, in 2020, offered us some advice on properly using canonicals:

Is rel=”canonical” a hint or a directive?
Can I use a relative path to specify the canonical, such as <link rel=”canonical” href=”product.php?item=swedish-fish” />?
Is it okay if the canonical is not an exact duplicate of the content?
What if the rel=”canonical” returns a 404?
What if the rel=”canonical” hasn’t yet been indexed?
What if I have contradictory rel=”canonical” designations?
Can this link tag be used to suggest a canonical URL on a completely different domain?
Canonical Link Elements can be ignored by Google:
Can rel=”canonical” be a redirect?
Canonical link elements can be treated as redirects
Tip – Redirect old, out of date content to new, freshly updated articles on the subject, minimising low-quality pages and duplicate content while at the same time, improving the depth and quality of the page you want to rank.
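On Apache, a one-off redirect like that can be a single line in .htaccess (the paths below are hypothetical examples):

```apache
# Permanently redirect an out-of-date article to the fresh, updated one:
Redirect 301 /old-article/ https://www.example.com/new-article/
```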
Tips from Google

As with everything Google does, Google has had its own critics about its use of duplicate content on its own site for its own purposes. There are some steps you can take to proactively address duplicate content issues and ensure that visitors see the content you want them to:
I would also ensure your links are all the same case and avoid capitalised and lowercase variations of the same URL. This type of duplication can be quickly sorted by keeping internal linking consistent and through proper use of canonical link elements.
Google also tells Webmasters to choose a preferred domain to rank in Google:
…although you should ensure you handle such redirects server side, with 301 redirects redirecting all versions of a URL to one canonical URL (with a self-referring canonical link element).
Understand If Your CMS Produces Thin Content or Duplicate Pages

Google says:
WordPress, Magento, Joomla, Drupal – they all come with slightly different SEO, duplicate content (and crawl equity performance) challenges. For example, if you have ‘PRINT-ONLY’ versions of web pages (Joomla used to have major issues with this), they can end up displaying in Google instead of your web page if you’ve not handled it properly with canonicals. That’s probably going to have an impact on conversions and link building – for starters. Poorly implemented mobile sites can cause duplicate content problems, too. I would watch out for building what can look like ‘doorway pages’ to Google by creating too many keyword, tag or category pages.

Will Google Penalise You For Syndicated Content?

No. When it comes to publishing your content on other websites:
The problem with syndicating your content is you can never tell if this will ultimately cost you organic traffic. If it is on other websites, they might be getting ALL the positive signals from that content – not you.

It’s also worth noting that Google still clearly says that you CAN put links back to your original article in posts that are republished elsewhere. But you need to be careful with that too, as those links could be classified as unnatural links. The safest way to handle this is to ask the other site that republished your content to add a rel=canonical pointing to your original article on your site. Then your site gets the entire SEO benefit of the act of republishing your content, instead of the other site.

Links in duplicate articles do count, but are risky. A few years ago I made an observation: I think that links that feature on duplicate posts that have been stolen – duplicated and republished – STILL pass anchor text value (even if it is a slight boost). In this example, someone stripped all my links out of my ‘what is SEO’ post and published the article as his own. Well, he stripped out all the links apart from one link he missed: the link to http://www.duny*.com.pk/ was actually still pointing to my home page. This gave me an opportunity to look at something…

The article itself wasn’t 100% duplicate – there was a small intro text, as far as I can see. It was clear by looking at Copyscape just how much of the article was unique and how much was duplicate. So this was a three-year-old article republished on a low-quality site, with a link back to my site within a portion of the page that’s clearly duplicate text. I would have *thought* Google just ignored that link.
But no, Google did return my page for the following query (at the time).

The Google Cache notification (below) is now no longer available, but it was a good little tool to dig a little deeper into how Google works… which indicated that Google will count links (AT SOME LEVEL) even on duplicate articles republished on other sites – probably depending on the search query, and the quality of the SERP at that time (perhaps even taking into consideration the quality score of the site with the most trust?). I have no idea if this is the case even today.

Historically, syndicating your content via RSS and encouraging folk to republish your content got you links that counted, on some level (which might be useful for long-tail searches). Google is quite good at identifying the original article, especially if the site it’s published on has a measure of trust – I’ve never had a problem with syndication of my content via RSS, and I let others cross-post… but I do like at least a link back, nofollow or not.

The bigger problem with content syndication is unnatural links, and whether or not Google classifies your intent as manipulative. If Google does class your intent to rank high with unnatural links as manipulative, then you have a much more serious problem on your hands.

Does Google Penalise ‘Thin’ Content On A Website?

Yes. Google also says about ‘thin’ content:
and
The key things to understand about duplicate content on your web pages are:
They also have Google Panda, an algorithm specifically designed to weed out low-quality content on websites.

Minimise Any Series Of Paginated Pages On Your Site
Does Google Use Pagination Markup?

Some confusion arose during early 2019 as to whether using pagination markup was a worthwhile endeavour for webmasters to implement. It stemmed from Google removing its own help centre documentation that laid down best practices for pagination markup, and John Mueller’s illuminating comment on Twitter that:
Other Googlers were quick to clarify the comment:
and Bing also clarified they use it, but again, not in “the ranking model”, somewhat adding to what Google’s John Mueller indicated with his tweet.
I think the simple answer here is… continue using rel=next and rel=previous markup. That is what I do, anyway.
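For reference, the markup for page 2 of a hypothetical three-page series looks like this (the URLs are placeholders):

```html
<!-- In the <head> of https://www.example.com/articles?page=2 -->
<link rel="prev" href="https://www.example.com/articles?page=1" />
<link rel="next" href="https://www.example.com/articles?page=3" />
<!-- Plus a SELF-referencing canonical for this component page: -->
<link rel="canonical" href="https://www.example.com/articles?page=2" />
```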
How To Deal With Pagination Problems On Your Website

Paginated pages are not duplicate content, but often it would be more beneficial to the user to land on the first page of the sequence. Folding pages in a sequence and presenting a canonical URL for a group of pages has numerous benefits. If you think you have paginated content problems on your website, it can be a frightening prospect to try and fix. It is actually not that complicated.
While Google says you can ‘do nothing‘ with paginated content, that might be taking a risk in a number of areas, and part of SEO is to focus on ranking a canonical version of a URL at all times. What you do to handle paginated content will depend on your circumstances. A better recommendation on offer is to:
and
You can also use meta robots ‘noindex,follow’ directives on certain types of paginated content (I do); however, I would recommend you think twice before actually removing such content from Google’s index IF those URLs (or a portion of those URLs) generate a good amount of traffic from Google, and there is no explicit need for Google to follow the links to find content. If a page is getting traffic from Google but needs to come out of the index, then I would ordinarily rely on an implementation that included the canonical link element (or a redirect). Ultimately, this depends on the situation and the type of site you are dealing with.

How To Use Rel=Next & Rel=Previous Markup Properly

You do not need to implement rel=next/previous markup, but it is a W3C standard, so feel free to use it if you want. Pagination can be a tricky concept, and it is easy to mess up. Here are some notes to help you:
You ONLY use rel=“canonical” to point to a VIEW ALL Page if one is present, OTHERWISE, all pages SHOULD have a SELF-referencing canonical tag.
A common mistake web developers make is to add a rel=canonical to the first page in the series, or to add NOINDEX to pages in the series of a component set.
and
A Google spokesperson explains why this is not optimal here: https://youtu.be/njn8uXTWiGg?t=11m52s

RE: how to handle time-sequential series of pages (e.g. in a blog), Google offers this advice:

QUESTION:
ANSWER:
So, for internal pages that are ordered by date of publishing, it is probably better to just let Google crawl these.

How Does Google Rate Content ‘Deliberately Duplicated Across Domains’?

It can see it as manipulative:
If you are trying to compete in competitive niches, you need original content that’s not found on other pages in the same form on your site, and THIS IS EVEN MORE IMPORTANT WHEN THAT CONTENT IS FOUND ON OTHER PAGES ON OTHER WEBSITES. Google isn’t under any obligation to rank your version of content – in the end, it depends on whose site has the most domain authority or most links coming to the page. Well, historically at least – now it is often the page that satisfies users the most. If you want to avoid being filtered by duplicate content algorithms, produce unique content.

Should You Block Google from Crawling Internal Search Result Pages?

Yes, according to Google. Google wants you to use robots.txt to block internal search results. Google recommends “not allowing internal search pages to be indexed”. While there are ways around this guideline that do not produce ‘infinite search spaces’, letting Google index and rank your internal search pages is a VERY risky manoeuvre (over time) if you are in a competitive industry. These recommendations are actually in the webmaster guidelines.
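A minimal robots.txt sketch for blocking internal search results – assuming, hypothetically, that your search pages live under /search/ or use an ?s= query parameter:

```text
User-agent: *
# Block the internal search results directory:
Disallow: /search/
# Block search results generated via a query parameter:
Disallow: /*?s=
```

Adjust the paths to match how your own CMS builds search URLs before deploying anything like this.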
Letting Google crawl and index your internal search results pages is ‘inefficient’ from a crawling and indexing perspective. Such pages cause “problems in search” for Google, and Google has a history of ‘snapping back’ on companies who break such guidelines to their profit.

TIP: “noindex, follow” “is essentially kind of the same as a” “noindex, nofollow” – John Mueller

Many use NOINDEX,FOLLOW on such pages (blog sub-pages, date-based archives etc.) to remove them from the index, but a recent talk from John Mueller would indicate a change in how Google treats noindexed links and the ‘follow’ attribute in the meta robots instruction.
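For illustration, the meta robots instruction in question sits in the <head> of the page you want kept out of the index – a sketch, bearing in mind Mueller’s point above that the ‘follow’ hint may not persist long-term on noindexed pages:

```html
<!-- Keep this page out of Google's index,
     while (initially, at least) letting its links be followed: -->
<meta name="robots" content="noindex, follow">
```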
Don’t link too often from your own site’s internal links to pages that are noindexed. Further reading: https://webmasters.googleblog.com/2012/03/video-about-pagination-with-relnext-and.html

Are Uppercase and Lowercase URLs Counted as TWO Different Pages by Google?

Yes. Uppercase and lowercase versions of a URL are classed as TWO different pages by Google. Best practice has long been to force lowercase URLs on your server, be consistent when linking to internal pages on your website, and use only lowercase URLs when creating internal links. The video below offers recent (2017) confirmation of this challenge – with the advice being to use canonicals or redirects to fix this issue, whichever is more efficient from a crawling and indexing perspective (which I think to be 301 redirects in this instance, where necessary, and an overhaul of the internal linking structure). Matt Cutts, formerly of Google, was also asked about this recently (2017) on Twitter:
Do Trailing Slashes Cause Duplicate Content Challenges on a Website?

Sometimes. It depends on whether the trailing slashes are on internal pages of a site or on the root, and which protocol is being used. Google clarified whether or not forgetting to add trailing slashes to a website URL causes problems on your site:
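To enforce a consistent trailing-slash policy, one common Apache sketch is a 301 redirect that appends the slash to directory-style URLs – hedged as an illustration only, since the right conditions depend on how your site serves files:

```apache
RewriteEngine On
# Skip real files (e.g. /style.css) and anything already ending in a slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} !/$
RewriteCond %{REQUEST_URI} !\.
# Append the trailing slash with a permanent redirect
RewriteRule ^(.*)$ /$1/ [R=301,L]
```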
He offered a guide to help fix the common issues. Note – I aim to use a trailing slash in almost all cases to help ensure consistency of canonical URLs across a site when it comes to internal linking or external link building.

Redirect Non-WWW To WWW (or Vice Versa)
Your site probably has canonicalisation issues (especially if you have an e-commerce website). These may start at the domain level, and they can exacerbate duplicate content problems on your website. Simply put, https://www.hobo-web.co.uk/ can be treated by Google as a different URL from http://hobo-web.co.uk/ even though it’s the same page, and it can get even more complicated. It’s thought REAL PageRank can be diluted if Google gets confused about your URLs and, put simply, you don’t want this PR diluted (in theory). That’s why many, including myself, redirect non-www to www (or vice versa) if the site is on a Linux/Apache server, in the .htaccess file:

Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} ^hobo-web\.co\.uk [NC]
RewriteRule ^(.*)$ https://www.hobo-web.co.uk/$1 [L,R=301]

Basically, you are redirecting all the Google juice to one canonical version of a URL. This is a MUST-HAVE best practice. It keeps it simple when optimising for Google. It should be noted: it’s incredibly important not to mix the two types of www/non-www on site when linking your internal pages! Note that Google asks you which domain you prefer to set as your canonical domain in Google Webmaster Tools.
Sign up for our Free SEO training course to find out more. Does The Google Panda Algorithm Penalise Duplicate Content?
Google Panda (a somewhat deprecated SEO term) was the name of a series of major Google search results changes starting back in 2011. The simple answer to the question is: no – but if you have “copied content” on your site, then you probably will be impacted negatively to varying degrees. Part of the Google Panda algorithm focused on thin pages and (many think) the ratio of good-quality content to low-quality content on a site, using user feedback to Google as a proxy for satisfaction levels. In the original announcement about Google Panda, we were specifically told that the following was a ‘bad’ thing:
If Google is rating your pages on content quality, or lack of it, as we are told, and user satisfaction – on some level – and a lot of your site is duplicate content that provides no positive user satisfaction feedback to Google – then that may be a problem too. Google offers some advice on thin pages (emphasis mine):
Everything I’ve bolded in the last two quotes is essentially about what many SEOs have traditionally labelled (incorrectly) as ‘duplicate content’. This might be ‘semantics’, but Google calls that type of duplicate content ‘spam’. Google is even more explicit when it tells you how to clean up this ‘violation’:
So beware. Google says there is NO duplicate content penalty, but if Google classifies your “duplicate content” as “copied content”, “thin content” or “boilerplate”, or hastily rewritten or, worse, “synonymised” or “spun” text, then you MAY WELL have a problem – a serious challenge, if your entire site is built like that. And how Google rates thin pages changes over time, with a quality bar that will always rise and that your pages need to keep up with – especially if rehashing content is what you do. Google Panda does not penalise a site for duplicate content, but it does measure site and content ‘quality’, and it actually DEMOTES a site where it determines an intent to manipulate the algorithms. Google Panda:
TIP – Look out for soft 404 errors in Google Webmaster Tools (now called Google Search Console) as examples of pages Google is classing as low-quality, user-unfriendly thin pages. Sign up for our Free SEO training course to find out more.

Using Google Search Console To Identify Duplicates

Use Google Search Console to fix duplicate content issues on your site. Note that Google says:
The Excluded report essentially lists:
Tips Google provides in Search Console include an “Everything is OK” message:
and two error messages, one being “Duplicate without user-selected canonical”:
and another, “Duplicate, Google chose different canonical than user”:
Sign up for our Free SEO training course to find out more.

How To Check For Duplicate Content On A Website?

An easy way to find duplicate content is to use Google search. Just take a piece of text content from your site and search for it “in quotes” on Google. Google will tell you on how many pages in its index of the web it found that piece of content. The page that ranks for that content is often the original, too. The best-known online duplicate content checker tool is Copyscape, and I particularly like this little tool too, which checks the duplicate content ratio between two selections of text. If you find evidence of plagiarism, you can file a DMCA complaint or contact Google, but I haven’t ever bothered with that, and many folks have republished my articles over the years. I once even found my article (word for word) in someone else’s paid advert in a printed magazine! Comments: a few marketers and Google spokespeople have commented on this article on social circles (presented as images throughout this article). More reading:
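As a rough local alternative to the online checkers mentioned above, a minimal Python sketch (using only the standard library’s difflib; the example texts are hypothetical) can approximate a duplicate content ratio between two selections of text:

```python
from difflib import SequenceMatcher

def duplicate_ratio(text_a: str, text_b: str) -> float:
    """Return a rough 0.0-1.0 similarity ratio between two selections of text."""
    return SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()

# Hypothetical example texts, purely for illustration:
original = "Google does not have a duplicate content penalty."
rewritten = "Google does not have a duplicate content penalty!"
print(f"{duplicate_ratio(original, rewritten):.2f}")
```

This is character-level similarity, so it flags near-duplicates and lightly ‘spun’ text well, but it is only a heuristic – dedicated tools like Copyscape also check against the wider web, which a local script cannot.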