What Is Referrer, Ghost and Crawler Spam? And How Do I Filter Them In Google Analytics?
You can beat referral spam via filters in Google Analytics – but you will also have to read on to find out how to set them up.
If you’re reading this article due to a recent spike in GA spam relating to Reddit, Donald Trump or random language settings, this article will give you an insight into what this type of spam is, but for a solution on how to block it be sure to read this post.
So what is referrer spam?
Over the course of 2015, we have seen a rise in “Referrer Spam” being tracked within Google Analytics. This comes in the form of referral visits to a site, which artificially boost the number of sessions and in turn can skew your engagement and conversion metrics.
On larger, higher traffic sites, this isn’t so much an issue, but on smaller sites a few hundred visits a month can seriously skew your data.
If you analyse your visitor data over the last 12 months you’re likely to see visits from sites such as semalt.com, free-share-buttons.com, Get-Free-Traffic-Now.com and similar.
You’ll notice these tend to show 0:00 time spent on site, only 1 page “viewed” and a bounce rate of 100%. Others will just have really dodgy-looking names:
There are 2 types of referral spam – ghost and crawler.
What is Ghost Spam?
Ghost Spam will inflate your visitor data without ever actually visiting your site. This is done by executing your Google Analytics tracking code, which will have been scraped from your site at some point in the past, from any other 3rd party domain. The resulting hit is sent directly to the Google Analytics server and tracked.
These can come in the form of referrals like those in the image above, organic traffic with very random keyword phrases such as “beat with a shovel the weak google spots”, or event tracking that reads “to use this feature visit: EVENT-TRACKING.COM”.
The following image should hopefully give you a clear idea of what is going on here:
What is Crawler Spam?
Crawler spam actually does visit your site. A crawler is a “bot”, similar to those from the likes of Google and Bing, which crawl your site’s pages for the purpose of indexing them. You then end up with what appear to be a number of visitors from 3rd party domains who interact with your site, spending a decent amount of time on it and visiting a number of pages.
Why is my site being targeted?
Spam is spam, don’t take it personally. They are just attention seeking, so just ignore them (after you’ve filtered them out, more on this to follow).
Their hope is that people will see their domain in their analytics, presume they are a valid site and go visit them.
When you get there, you might be pushed an SEO service of some sort, be redirected to a genuine site as part of an affiliate program or end up with a virus of some sort.
How Do I Stop This?
The first step within Google Analytics is to visit the Basic Settings section, and tick “Exclude all hits from known bots and spiders”. Why this isn’t ticked by default, I don’t know, but it’s an easy enough step to take. Rest assured Google will be doing all they can behind the scenes, and it would probably be a lot worse than it is if they weren’t.
However, until they come up with a bulletproof, more effective solution, there are other steps you will need to take.
So How Do I Stop Ghost Spam?
There is a fairly straightforward solution that involves setting up 1 filter within Google Analytics.
Every real referral visit to your site will have two properties: the “source”, which is the site the linking URL is on, and the “hostname”, which is the site the landing page is on. The hostname is more often than not going to be your own domain.
Because Ghost Spam is random in its nature (it’s not specifically targeting you, it’s all random and dynamic), the spammers won’t know your hostname when they send Google their duff data, so they make one up at random instead:
As these never actually visit your site, you can’t block them via .htaccess or at server level. Instead, you need to set up a filter that excludes any hostname other than your own domain and any 3rd party sites you know your tracking code is set up on, such as 3rd party payment gateways or call tracking sites.
You should review your analytics to identify your valid hostnames then add these into a custom filter.
This is done by going to Google Analytics admin and navigating to “Filters”. Here:
- Create a new filter
- Give it a name
- Select filter type “Custom”
- Select “include”
- Set “Filter Field” to “Hostname”
- Set “Filter Pattern” to match your list.
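As a rough illustration of the last step, the pattern can be assembled from your list of valid hostnames rather than typed by hand. This is a minimal sketch that assumes the filter pattern is treated as a regular expression with partial matching; the hostnames and the function name `build_hostname_pattern` are made up for the example.

```python
import re


def build_hostname_pattern(valid_hostnames: list[str]) -> str:
    """Join hostnames into one regex alternation, escaping dots and other
    special characters so "example.com" does not also match "exampleXcom"."""
    escaped = [
        re.escape(host.strip().lower())
        for host in valid_hostnames
        if host.strip()
    ]
    return "|".join(escaped)


# Hypothetical hostnames: your own domain plus a 3rd party payment gateway.
pattern = build_hostname_pattern(["example.com", "checkout.payments.example"])
print(pattern)
```

The resulting string is what would go into the “Filter Pattern” field. Escaping here uses Python’s `re.escape`, so it is worth checking the output against a few real hostnames before relying on it.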
As filters are permanent and will block traffic from your data moving forward, we would always recommend testing the above first by setting up an advanced segment which does the same and checking the referral data output:
First set one up to include your preferred domains:
And check your referral data:
Then set one up to exclude your preferred domains:
And you should see a list of the type of sites that have been giving you analytics nightmares up till this point:
Once you are happy with your hostname inclusion filter, click to save.
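The include and exclude segments can also be mimicked offline against a list of hostnames copied out of your reports, which makes it easy to see what a pattern would keep and what it would drop. The sketch below is only an approximation of how such a filter behaves; the pattern and the observed hostnames are placeholders.

```python
import re

# Hypothetical include pattern standing in for your own filter pattern.
INCLUDE_PATTERN = re.compile(r"example\.com|checkout\.payments\.example")


def split_hostnames(hostnames):
    """Return (kept, excluded): hostnames the include pattern would keep,
    and those it would drop."""
    kept, excluded = [], []
    for host in hostnames:
        # re.search gives a partial ("contains") match, so subdomains
        # such as www.example.com are kept as well.
        if INCLUDE_PATTERN.search(host.lower()):
            kept.append(host)
        else:
            excluded.append(host)
    return kept, excluded


observed = [
    "www.example.com",
    "(not set)",
    "checkout.payments.example",
    "free-share-buttons.com",
]
kept, excluded = split_hostnames(observed)
print("kept:", kept)
print("excluded:", excluded)
```

If anything you recognise as genuine shows up in the excluded list, add it to the pattern before saving the filter.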
So How Do I Stop Crawler/Referrer Spam?
As these spammers actually visit your site, the hostname solution above is not enough. To block these, you need to be specific by including their domain name in a filter.
There are, as we know, a lot of domains, and this list keeps expanding.
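Because the list is long, it usually has to be spread over several filters. The sketch below packs spam domains into as few exclude patterns as will fit under a length limit. The 255-character limit is an assumption to check against your own account, and `build_exclude_patterns` is a name invented for this example.

```python
import re

MAX_PATTERN_LENGTH = 255  # assumed per-filter limit; adjust if yours differs


def build_exclude_patterns(spam_domains, max_length=MAX_PATTERN_LENGTH):
    """Pack escaped domains into as few "a|b|c" patterns as fit the limit.
    A single domain longer than the limit is still emitted on its own."""
    unique = sorted({d.strip().lower() for d in spam_domains if d.strip()})
    patterns, current = [], ""
    for domain in unique:
        escaped = re.escape(domain)
        candidate = f"{current}|{escaped}" if current else escaped
        if current and len(candidate) > max_length:
            # Adding this domain would overflow: close the current pattern.
            patterns.append(current)
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


spam = ["semalt.com", "free-share-buttons.com", "Get-Free-Traffic-Now.com"]
for number, pattern in enumerate(build_exclude_patterns(spam), start=1):
    print(f"filter {number}: {pattern}")
```

Each pattern would go into its own exclude filter on the referral source field.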
My initial intention was to compile all the domains we have blocked over 2015 and use this to dynamically generate spam filters for you to apply to your Google Analytics account.
Whilst this would be useful, it would also be time consuming and there’d be an overhead in keeping this up to date.
Then I found someone had beaten me to it, so I have opted to join them:
Visit the above link while logged into your analytics account and click on “initialise”. You will then be able to select the accounts you wish to add the filter to, then click “Create And Apply Filters”:
If the above link has reached its API Quota, the following also utilises the same code:
The latter link is also useful because filters only block spam traffic from your data moving forward; they won’t remove it retrospectively. To do that, you’d need to set up some custom advanced segments.
And Then What?
The above steps should see Ghost and Referrer Spam reduced if not eliminated from your reports moving forward.
It’s worth revisiting the data from time to time to ensure no new domains have popped up, and if so, add them to your spam filters.
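One way to make that periodic check less of a chore is to compare the referrers in a fresh export against the domains you have already classified. This is a minimal sketch with placeholder data; only the two spam domains come from this article, and the function name is invented.

```python
def find_unreviewed_referrers(current_referrers, known_spam, known_good):
    """Return referrers that are in neither list, sorted for manual review."""
    reviewed = {d.strip().lower() for d in known_spam}
    reviewed |= {d.strip().lower() for d in known_good}
    current = {r.strip().lower() for r in current_referrers if r.strip()}
    return sorted(current - reviewed)


# Placeholder data: swap in your own lists and the latest referral export.
known_spam = ["semalt.com", "free-share-buttons.com"]
known_good = ["google.com", "partner.example"]
this_month = ["Semalt.com", "google.com", "new-odd-referrer.example", "partner.example"]

unreviewed = find_unreviewed_referrers(this_month, known_spam, known_good)
print(unreviewed)
```

Anything it prints still needs a human look before it goes into a spam filter, since a new referrer may well be genuine.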
The war against spam never ends, but hopefully methods such as the above should make it a more pleasurable experience.
If you have any questions or need any assistance implementing any of the above, let me know in the comments below.
I’ve read through a lot of articles whilst finding out more about ghost and referral spam; here are some of the best: