In one of the more interesting and debated SEO topics, it turns out, perhaps not surprisingly, that Google applies judgment and doesn't automatically penalize duplicate content, whether it's repeated within your own website or across the web. John Mueller, the lead of the Search Relations team at Google, recently confirmed that the search engine's algorithms don't negatively score content that is repeated across pages.
The exact quote:
“With that kind of duplicate content it’s not so much that there’s a negative score associated with it. It’s more that, if we find exactly the same information on multiple pages on the web, and someone searches specifically for that piece of information, then we’ll try to find the best matching page.
So if you have the same content on multiple pages then we won’t show all of these pages. We’ll try to pick one of them and show that. So it’s not that there’s any negative signal associated with that. In a lot of cases that’s kind of normal that you have some amount of shared content across some of the pages.”
When is duplicate content OK? (hint: it's not about originality, it's about quality and context)
The rule of thumb: keep in mind what Google is trying to figure out, namely what people are searching for and which page best answers that.
So, for things such as boilerplate content (short standardized paragraphs, like the "about us" statements you may end up adding to many blog posts, press releases, and pages), product descriptions, website footers, landing pages, and anything else that features "substantive blocks of content within or across domains that either completely match other content or are appreciably similar" (right from the horse's mouth), it's actually normal to have some amount of duplicate content. The good news is that Google accounts for this and handles it without deliberately hurting your ranking.
In other words, your "about us" boilerplate appearing a gazillion times across the web is not an issue, just as your legal text repeated across the website isn't. It causes no confusion because people aren't searching for it, and if they are, Google decides which page makes the most sense to show.
The bad news is that this is just one factor in the larger, broader search experience, so you don't get a free pass here: you always need to consider the search intent and how what you're publishing best serves it.
Let's go meta on duplicate content with an example of how judgment is applied: for the sake of quality, Google will sometimes show results that outrank its own content on the Google blog. The context (and the meta part), in Mueller's words:
“And sometimes the person who wrote it first is not the one for example that is the most relevant.
So we see this a lot of times for example with our own blog posts where we will write a blog post and we’ll put the information we want to share on our blog post and someone will copy that content and they will add a lot of extra information around it.
It’s like, here’s what Google really wants to tell you and it’s like reading between the lines and the secret to Google’s algorithms.
And when someone is searching it’s like maybe they want to find the original source. Maybe they want to find this more elaborate… exploration of the content itself.
So, just because something is original doesn’t mean that it’s the one that is the most relevant when someone is looking for that information.”
Spoiling the party a bit - things to consider
At the end of the day, you won't be able to decide which version is the most relevant to a specific search query, as Google does that for you. In turn, links, content metrics, and other ranking signals that search engines apply may not be credited to your desired URL.
Also, pages with multiple versions of the same content suffer from low visibility, which is diluted further when other sites struggle to distinguish the best piece of content.
Don't try to be clever and block crawler access to duplicate content; you'll only be doing yourself a disservice. If Google can't crawl all the pages with duplicate content, it can't consolidate their ranking signals, so it starts treating those pages as separate and unique. Instead, use the rel="canonical" link element to mark the duplicate URLs so they're treated as copies of the preferred page.
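As a quick sketch of what that looks like in practice (the URLs below are placeholders), the canonical link element goes in the head of each duplicate page and points at the version you want indexed:

```html
<!-- On a duplicate or parameterized URL, e.g. https://example.com/page?utm_source=newsletter -->
<head>
  <!-- Tells search engines that the clean URL is the preferred version to index -->
  <link rel="canonical" href="https://example.com/page/" />
</head>
```

With this in place, ranking signals pointing at the duplicate variants can be consolidated onto the canonical URL.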
Word of warning: don't abuse this. Google still penalizes content that is deliberately duplicated in an attempt to manipulate users and, by extension, rankings. The worst case here is the site being removed from the index entirely and no longer appearing in search results.
What about duplicating content to syndication platforms such as Medium and Business2Community?
The known dilemma is that syndication provides good reach, but since those platforms don't use a canonical tag, the worst-case fear is getting penalized, and the more likely scenario is getting cannibalized for the same keywords your original article is targeting. Google states:
"Syndicate carefully: If you syndicate your content on other sites, Google will always show the version we think is most appropriate for users in each given search, which may or may not be the version you'd prefer. However, it is helpful to ensure that each site on which your content is syndicated includes a link back to your original article."
We’ve reached out to the Business2Community team, here’s what they had to say about this:
"Regarding managing syndicated content, Google provides a variety of options for site owners to consider and each option has its own pros and cons. These options include:
- No-indexing the content entirely
- Providing a link within the body of the article back to the original source
- Using a canonical tag
Based on our research and years of industry experience, we have chosen to provide a link within the body of the post back to the original source. At the bottom of every piece of syndicated content on our site, you will see the following:
“This article originally appeared on (insert name of blog/hyperlink) and has been republished with permission.”"
This is fairly in line with the overall premise we're discussing here. My advice: keep your high-level goals in mind. If your brand doesn't yet generate massive reach, use those platforms to get on the radar of more people. Google won't penalize you for it, and showing up a rank or two below Medium or Business2Community for certain keywords, while getting credited for your content, may not be a bad thing in the overall scheme of things.
At Bold we go for a middle-ground tactic - we syndicate some of our clients' content to those platforms.
Technically navigating duplicate content
Apart from rel=canonical labeling, here are best practices to indicate your preferred page to Google:
- Use 301 redirects to refer a duplicate page to the original one and avoid them competing with each other.
- Set the preferred domain and/or parameter handling to indicate to Googlebot how to crawl and treat different URL parameters. This can be set via Google Search Console.
- Be consistent with your internal linking. The presence or absence of 'www' or a trailing '/' can make a difference, as http://www.example.com/page/, http://www.example.com/page, and http://example.com/page/ are three different URLs in this case.
- Get to know how and where your CMS displays content so you reduce repetitions in different formats as well (e.g. previews).
- Add a bit of voice and personality with your wording to create unique content.
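To make the redirect and consistency points above concrete, here's a minimal sketch of how this might look in an nginx server config (the domain, paths, and port are placeholder assumptions; adapt them to your own setup, and the same ideas apply to Apache via .htaccess):

```nginx
# Canonicalize the hostname: permanently redirect the bare domain
# to the preferred www host so only one version gets indexed.
server {
    listen 80;
    server_name example.com;
    return 301 http://www.example.com$request_uri;
}

server {
    listen 80;
    server_name www.example.com;

    # 301-redirect an old duplicate page to the original,
    # so the two never compete with each other in search.
    location = /old-duplicate-page/ {
        return 301 /original-page/;
    }
}
```

A 301 is a permanent redirect, which signals to Googlebot that ranking signals should be consolidated onto the destination URL.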
Summing things up
As time goes by and Google evolves, the focus on user experience and intent deepens. Don't put any decision on autopilot, and don't cling to technical rules of thumb; let the high-level big picture guide you. Things aren't binary: duplicate content isn't inherently wrong, and context and intent are what matter.