The Bot Traffic Deception: How Bots Sabotage Your Web Analytics and Decision-Making

Bot Traffic Impact on Google Analytics

Modern marketers take every opportunity to make data-backed decisions on where to put their limited resources. Since those decisions are generally influenced by measures of quantity and quality, it is time to address the serious bot problem that distorts your data, and consequently, your decisions.

After an investigation into our clients’ (and our own) data, we discovered a truly disturbing fact: up to 68% of all traffic in GA4 was bots. That means marketers have been making decisions based on flawed data and, in many cases, drawing flawed conclusions.

Story Time

Bots are nothing new; they have been an issue in Google Analytics for years. In the 2010s, bots would cause ugly spikes in the data that were easy to spot: a single day with 10x the usual traffic volume.

Back then, the fix was as easy as filtering out the network domains of the bot traffic. Now that Google has removed this feature, you are left to fall back on one that many analysts have little faith in: Automatic Bot and Spider Filtering.

Even with the “fix,” we knew our data was off by a few percentage points, since we saw signs that some bots were getting through. The impact was biggest when we launched or modified media, but we assumed the distortion was relatively consistent and that the conclusions we drew from the data were still sound.

We stumbled upon the sudden jump in bot interference by accident: one of our team members made a simple mistake, which meant we had to estimate some Google Analytics traffic for reporting. We had made this kind of fix before, only this time the math just didn’t make sense.

Based on performance, the math told us to multiply the clicks in-platform by 243. That was just plain nonsensical, since historically, the multiplier had been between 0.5 and 1.5. We checked. And checked again. We pulled the data correctly. The math was right. The data itself was bad.

We started digging through Google Analytics for patterns that might explain the anomaly and noticed that a disproportionate volume of the traffic came from the same handful of cities, cities known for large data centers but not large populations of people: Columbus, OH; Ashburn, VA; Hampton, AK.

The bounce rates were high. Time on site was low. This was clearly bot traffic.

So our team set out to find ways to separate bots from humans within website traffic. We studied the networks the traffic came from, the characteristics of the browsers the bots used, the data sent with each network request, and more, and came up with four distinct methods of identifying bot traffic.
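To make the idea concrete, here is a minimal sketch of the kind of client-side signals a detection script can look at. The function name and the specific heuristics below are illustrative assumptions, not the four methods we actually used:

```typescript
// Illustrative only: a few common client-side bot signals. These are not the
// four detection methods described above.
function looksLikeBot(): boolean {
  const nav = window.navigator;

  // Automation flag: headless/automated browsers often expose navigator.webdriver.
  if (nav.webdriver) {
    return true;
  }

  // User agents that announce themselves as crawlers.
  if (/bot|crawler|spider|headless/i.test(nav.userAgent)) {
    return true;
  }

  // Browser inconsistencies: a modern browser reporting no configured
  // languages is suspicious.
  if (Array.isArray(nav.languages) && nav.languages.length === 0) {
    return true;
  }

  return false;
}
```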

We picked eight clients (including ourselves) and added scripts implementing these four detection techniques through Google Tag Manager. We then configured custom dimensions in Google Analytics 4 and pushed the detection results from Google Tag Manager into Google Analytics.
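For context, wiring a verdict like that into GA4 goes through the data layer: the script pushes a flag, a Data Layer Variable in GTM reads it, and a GA4 event parameter registered as a custom dimension carries it into reports. The event and parameter names below (bot_detection, suspected_bot) are illustrative placeholders, not the exact names we used:

```typescript
// Sketch: push a detection verdict into GTM's data layer. A Data Layer
// Variable in GTM can then map `suspected_bot` onto a GA4 event parameter
// that is registered as a custom dimension.
type DataLayerEvent = Record<string, unknown>;

const w = window as unknown as { dataLayer?: DataLayerEvent[] };
w.dataLayer = w.dataLayer || [];

w.dataLayer.push({
  event: 'bot_detection',        // illustrative event name
  suspected_bot: looksLikeBot(), // illustrative parameter name, using the sketch above
});
```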

Then we waited. 

After three weeks, we crunched the numbers. They were ugly.

Results

Only two of the clients analyzed came out with less than 25% bot traffic. One had a whopping 68% of all traffic in Google Analytics coming through as suspected bot traffic!

Percentage of Traffic Identified as a Bot by Client

We were stunned. The data we had been reporting to our clients was wrong. It was horribly distorted, and based on the variation among clients, not in a consistent way. 

But it only got worse as we dug deeper into the data…

We found that bot traffic is distributed very unevenly across traffic sources. For example, 72% of the traffic generated by programmatic paid media was bots, whereas only 2% of paid search traffic was.

Percentage of Traffic Identified as a Bot by Source

This means that when comparing the efficiency of programmatic display vs. paid search, programmatic display appears much more efficient at driving traffic, when in reality the two are close to the same.

The story gets much more complicated, however, when we look at traffic quality. For that analysis, we use engagement rate: the number of engaged sessions divided by the number of total sessions. (GA4 counts a session as engaged if it lasts longer than 10 seconds, has a key event, or has at least two pageviews.)

When we take bots out of the equation, our engagement rates are much more consistent. 

Still, at first glance, the changes can be somewhat enigmatic. Email and Programmatic see boosts in engagement rates when we remove bots.

This is likely because the bots visiting from email and programmatic media (antivirus scanners and ad quality assurance bots, respectively) don’t do anything after loading and scanning the initial page, so they register as unengaged sessions.
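To see the arithmetic with purely hypothetical numbers (not our clients’ figures): unengaged bot sessions dilute the reported engagement rate, and stripping them out pushes it back up.

```typescript
// Hypothetical numbers only: unengaged bot sessions dilute the engagement
// rate (engaged sessions / total sessions); removing them raises it.
const totalSessions = 1_000;
const engagedSessions = 400;
const botSessions = 300; // assume these never count as engaged

const reportedRate = engagedSessions / totalSessions;                 // 0.40
const botFreeRate = engagedSessions / (totalSessions - botSessions);  // ~0.57

console.log(`reported engagement rate: ${(reportedRate * 100).toFixed(0)}%`); // 40%
console.log(`bot-free engagement rate: ${(botFreeRate * 100).toFixed(0)}%`);  // 57%
```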

However, Facebook engagement rate actually drops when we remove bots:

A deeper analysis reveals that this is because the majority of Facebook bot traffic is Facebook’s own quality assurance bot. That bot will generally stay on the landing page for 14 seconds and often scroll, presumably in an attempt to trigger popups or overlays that might negatively affect the user experience. Because 14 seconds is past GA4’s 10-second threshold, those bot sessions count as engaged and inflate the engagement rate until they are removed.

Bottom line: these bots significantly distort the measurement of both the quantity and the quality of website engagement. Furthermore, these impacts are not evenly distributed across traffic sources, causing major distortions when comparing one source to another.

Impact

As data-driven marketers, we stake our reputations on the data we report and the insights we garner. When bots interfere with that data, we put those reputations at risk.

Additionally, if we know the data is dirty and continue to report it, we run the risk that one day clean data arrives and we have to eat crow with our stakeholders: the people to whom we reported months, maybe years, of dirty data.

Finally, bots eat at our confidence. When we’re asked, “Why did this number change?” and see no clear cause, we’re left either saying “we don’t know” or making something up. Either way, we erode others’ confidence in us and in our abilities as marketers.

Solution

So now that you know about the havoc bots wreak in your data, what can you do about them?

We’ve developed a tool called Bot Badger that you can install in front of your Google Tag Manager (GTM). Bot Badger will either pass a flag telling GTM that a visitor is a bot or, if you’d rather, not load GTM at all for bots.

This not only keeps your Google Analytics data clean, but also keeps bot data out of your A/B testing tool, marketing automation, screen recordings, ad and social media pixels, and any other tags you have installed.
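Conceptually, the “don’t load GTM at all” mode is a gate around the standard GTM loader: the container script is only injected for visitors that pass the bot check. The sketch below is illustrative, reusing the hypothetical looksLikeBot() check from earlier and a placeholder container ID; it is not Bot Badger’s actual API.

```typescript
// Illustrative gate around GTM: only inject the container script when the
// visitor is not flagged as a bot. `verdictIsBot` stands in for whatever
// signal a tool like Bot Badger provides; 'GTM-XXXXXXX' is a placeholder.
function loadGtm(containerId: string): void {
  const script = document.createElement('script');
  script.async = true;
  script.src = `https://www.googletagmanager.com/gtm.js?id=${containerId}`;
  document.head.appendChild(script);
}

const verdictIsBot: boolean = looksLikeBot(); // placeholder for the real verdict

if (!verdictIsBot) {
  loadGtm('GTM-XXXXXXX');
} else {
  // Alternative mode: load GTM anyway and push a flag (as in the data layer
  // sketch earlier) so individual tags can decide whether to fire.
}
```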

You can start your free trial today so you can see for yourself the impact bots have had on your data. Let us know what you learn. We’d love more data points as we work to let the world know the extent of damage bots do to the decision-making of marketers like you.
