Google Analytics is a popular service for tracking and analysing visitor traffic to a website. Using a short snippet of code placed in a website's source, a user can monitor the performance of all their online properties. These tracking codes can also clearly indicate when multiple websites are run by a single user or entity, which has made them a very useful breadcrumb for open source researchers.
But there's a catch: Google is phasing out these codes and replacing them with ones that contain less data and make it harder to trace who controls websites.
To address this problem, Bellingcat has developed a lightweight open source research tool, Wayback Google Analytics, which automates the collection of tracking codes and the discovery of relationships between websites using copies of sites maintained by the Internet Archive's Wayback Machine. It will help researchers sidestep recent changes to how Google manages its analytics data.
What’s a Google Analytics Code?
Google uses a series of unique tracking codes to gather analytics data on websites. For over a decade, the most popular Google tracking code was the Universal Analytics (UA) ID: a small tracker buried in a script tag in a webpage's source code.
A UA code looks like this:
UA-123456789-1
There's a lot of useful information here. The centre number is a unique tracking ID shared by the websites managed by the same user or entity. The trailing digit distinguishes the individual online properties owned by that entity (e.g., UA-123456789-1, UA-123456789-2, etc.).
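The structure described above can be sketched in a few lines of Python. This is a minimal illustration, not code from Wayback Google Analytics: the regular expression and the names `UACode`, `parse_ua_code` and `same_owner` are assumptions made for this example.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: "UA-", the centre number, a hyphen, then the property suffix.
UA_PATTERN = re.compile(r"^UA-(\d+)-(\d+)$")


class UACode(NamedTuple):
    account_id: str      # the centre number, shared across one owner's sites
    property_index: int  # the trailing digit, one per online property


def parse_ua_code(code: str) -> Optional[UACode]:
    """Split a UA code into its parts, or return None if it does not match."""
    match = UA_PATTERN.match(code.strip())
    if match is None:
        return None
    return UACode(account_id=match.group(1), property_index=int(match.group(2)))


def same_owner(code_a: str, code_b: str) -> bool:
    """True when both codes parse and share the same centre number."""
    a, b = parse_ua_code(code_a), parse_ua_code(code_b)
    return a is not None and b is not None and a.account_id == b.account_id
```

Under these assumptions, `same_owner("UA-123456789-1", "UA-123456789-2")` is true because both codes share the centre number, while a shared suffix alone says nothing about ownership.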
Tracking UA codes is a staple technique in the OSINT toolkit that is regularly used by investigators. In 2017, journalists in South Africa used Google Analytics data to uncover a coordinated disinformation campaign funded and controlled by a member of the notorious billionaire Gupta family. In 2015, Bellingcat contributor Lawrence Alexander used the same methodology to connect dozens of websites pushing pro-Kremlin narratives about Syria and Ukraine to a single individual based in St. Petersburg, Russia. In both cases, shared UA codes between multiple web pages were a key data point in the investigation.
However, conducting such an investigation in 2023 is much more difficult due to sweeping changes in how Google manages its tracking IDs.
Google Analytics 4
Earlier this year, Google rolled out Google Analytics 4, a new analytics framework that replaces UA codes with less uniform tracking IDs that are significantly harder to glean information from. It is no longer possible to obtain a new UA code, and most major websites have updated their tracking IDs to the new G and GTM codes.
This is bad news for investigators, as ProPublica reporter Craig Silverman explained earlier this year. While GTM and G codes are still valuable breadcrumbs, fewer online services maintain databases of these trackers. Moreover, gone is the useful suffix that helps indicate when multiple sites are using the same tracking code.
There's some good news, though: Google says it isn't planning to force websites to remove existing UA codes and, with time, more services will likely begin to catalogue G and GTM codes to help find relationships between websites.
In the meantime, we can still extract legacy UA codes from websites that continue to use them. Plus, we can also use the Wayback Machine to examine the source code of websites in the past and find any overlapping UA codes.
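Done by hand, this process amounts to opening an archived copy of a page and searching its source for UA codes. The sketch below shows both halves under stated assumptions: the regular expression and the function names are hypothetical, and the URL follows the Wayback Machine's public `web.archive.org/web/<timestamp>/<url>` address scheme. It does not fetch anything, and it is not how the tool itself is implemented.

```python
import re
from typing import List

# Assumed pattern for legacy codes embedded anywhere in a page's source.
UA_IN_SOURCE = re.compile(r"\bUA-\d+-\d+\b")


def extract_ua_codes(html: str) -> List[str]:
    """Return the distinct UA codes found in a page's source, sorted."""
    return sorted(set(UA_IN_SOURCE.findall(html)))


def wayback_snapshot_url(site: str, timestamp: str) -> str:
    """Build a Wayback Machine address for a capture of `site`.

    `timestamp` is in the archive's YYYYMMDDhhmmss form.
    """
    return f"https://web.archive.org/web/{timestamp}/{site}"
```

The page source returned from an address like `wayback_snapshot_url("https://yapatriot.ru", "20150101000000")` could then be passed to `extract_ua_codes`. Downloading the page is left out here so the example stays self-contained.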
Using Wayback Google Analytics
Bellingcat's Wayback Google Analytics automates the collection of analytics codes and checks their usage across multiple websites. We can give the tool a list of websites, a time range and a desired output format (.csv, .json, etc.) so we can quickly get a bird's eye view of any shared data between websites.
You can read a more in-depth guide on usage and installation on our Github page, but let's examine what a typical use case might look like. We'll use a few (now-defunct) Russia-linked disinformation websites covered in an earlier Bellingcat investigation to demonstrate the tool's usefulness.
- https://yapatriot.ru
- https://zanogu.com
- https://whoswho.com.ua
- https://adamants.ru
We'll assume that we only want data from 2015 until the present. Since we're looking for relationships between the websites, we'll output the data into an Excel spreadsheet. Our command looks like this:
wayback-google-analytics -u https://yapatriot.ru https://zanogu.com https://whoswho.com.ua https://adamants.ru -s 01/01/2015 -f yearly -o xlsx
(There are a few other parameters available for this command. Check out our README for a full list of options.)
Wayback Google Analytics then generates an Excel spreadsheet that reports the analytics codes that were used by each website, as well as when they were first and last encountered in our search. In this case, these websites are no longer active, so we only have archived data obtained from the Wayback Machine.
We can also view the data by code:
In the above data, “whoswho.com.ua”, “yapatriot.ru” and “zanogu.com” each share the same base code, indicating that they may be operated by the same entity.
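The "view by code" comparison can be approximated with a small grouping step. The following is a minimal sketch rather than the tool's actual implementation: the function names are hypothetical, and the input is assumed to be a mapping from each site to the UA codes observed on it.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def base_code(ua_code: str) -> str:
    """Drop the trailing property suffix: 'UA-123-2' -> 'UA-123'."""
    return ua_code.rsplit("-", 1)[0]


def shared_base_codes(codes_by_site: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Map each base code seen on two or more sites to those sites, sorted."""
    sites_by_base: Dict[str, Set[str]] = defaultdict(set)
    for site, codes in codes_by_site.items():
        for code in codes:
            sites_by_base[base_code(code)].add(site)
    return {
        base: sorted(sites)
        for base, sites in sites_by_base.items()
        if len(sites) > 1
    }


# Placeholder data for illustration only; these are not real sites or codes.
observed = {
    "site-a.example": ["UA-123456789-1"],
    "site-b.example": ["UA-123456789-2"],
    "site-c.example": ["UA-123456789-3", "UA-555555555-1"],
    "site-d.example": ["UA-999999999-1"],
}
print(shared_base_codes(observed))
```

With the placeholder data, only the base code used by the first three sites is reported, since the other codes each appear on a single site. A shared base code is a lead to investigate further, not proof of common ownership.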
Wayback Google Analytics is an open source project that anyone can contribute to. Visit Bellingcat's Github page to view the contribution guidelines, or look at the project's active issues to get started.