User metric discrepancies between July 12, 2019 and April 30, 2020

Hello! We’re excited to launch our new Impact dashboard on PubPub. However, it also surfaces a discrepancy in User data collected between July 12, 2019 and April 30, 2020. This post explains the discrepancy and offers some advice on how to make rough comparisons between the two time periods.

The User metric is defined as the number of unique visitors within the queried time period. In other words, if the same person visits 3 times in one day, that counts as 3 pageviews but only 1 User.

Across the analytics industry, the User metric is imprecise because it typically relies on a browser cookie to identify unique users. For example, if a user visits a page once on their mobile phone browser in the morning and a second time on their desktop browser in the afternoon, that counts as two Users, even though the same person visited both times.
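To make the distinction between pageviews and Users concrete, here is a minimal sketch of cookie-based counting. It assumes each pageview is logged with the cookie ID of the browser that made it; the `PageView` type and both counting functions are hypothetical illustrations, not PubPub's actual analytics code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    cookie_id: str  # hypothetical: identifier stored in the visitor's browser cookie
    path: str


def count_pageviews(events):
    # Every logged view counts, no matter who made it.
    return len(events)


def count_users(events):
    # A "User" is approximated by a distinct cookie ID, not by a real person.
    return len({e.cookie_id for e in events})


# Hypothetical log: one person visits three times on their phone
# (cookie "phone-abc") and once on their desktop (cookie "desktop-xyz").
events = [
    PageView("phone-abc", "/pub/intro"),
    PageView("phone-abc", "/pub/intro"),
    PageView("phone-abc", "/pub/methods"),
    PageView("desktop-xyz", "/pub/intro"),
]

print(count_pageviews(events))  # 4
print(count_users(events))      # 2, even though it is one person
```

Repeat visits from the same browser collapse into one User, while the same person on a second device shows up as a second User.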

GDPR and similar privacy laws further complicate the issue by barring some versions of this type of tracking by third parties unless a user consents to it. That’s what caused the data discrepancy for PubPub.

On July 19, 2019, we implemented a GDPR-compliant opt-in feature for our analytics tracking. Because of the way our analytics provider at the time handled opt-ins, every visitor who did not opt into tracking was counted together as a single User, which caused a large drop in our User counts. On April 30, 2020, we switched to a new provider that does not have this limitation: it can count the visits of users who do not opt into tracking while remaining GDPR compliant, which gives us a far more accurate picture of the number of users visiting your site.
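The effect of the old provider's behavior can be modeled in a few lines. This is a hypothetical sketch of the outcome described above (all non-consenting visitors collapsing into one User), not the provider's actual code; the `Visit` type and function names are invented for illustration. The second function simply counts distinct cookies so the size of the undercount is visible; it is not a description of how the new provider works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    cookie_id: str
    opted_in: bool


def old_provider_user_count(visits):
    # Hypothetical model: visitors who did not opt in are all mapped to one
    # shared key (None), so together they count as a single User.
    keys = {v.cookie_id if v.opted_in else None for v in visits}
    return len(keys)


def distinct_cookie_count(visits):
    # For comparison only: the count if every visitor were distinguishable.
    return len({v.cookie_id for v in visits})


# Hypothetical traffic: 10 visitors, only the first 2 opted into tracking.
visits = [Visit(f"c{i}", opted_in=(i < 2)) for i in range(10)]

print(old_provider_user_count(visits))  # 3
print(distinct_cookie_count(visits))    # 10
```

The fewer visitors who opt in, the larger the gap, which is why the drop in User counts was so pronounced.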

We still, of course, follow GDPR and similar regulations for users who choose not to opt into our analytics tracking. No data traceable to the user is stored for users who do not opt in, and because we own our analytics stack and don’t rely on third-party tools like Google Analytics that resell and share data with ad platforms, there is no third-party processing of any user data at all. We also take an extra step: when a user opts out of tracking, we generate a new cookie for that user. This breaks the chain of data that would let us tie their visits together, so we have no way of knowing that the user ever opted in.

For Communities looking to gauge the impact of their work, our absolute commitment to privacy does come with a tradeoff: in this case, this discrepancy.

To help Communities make sense of the difference in User counts, we’ve added a “Benchmarking” dashboard to Communities and Pubs that existed before April 30, 2020. It compares the User counts reported by the old and new providers during an overlap period, from April 30, 2020 to June 21, 2020, when both were running. Using this dashboard, you can see the average ratio between the two sets of User counts and, if needed for reporting, multiply past User counts by that ratio to get rough figures that can be compared with current ones. Because User is already an imprecise metric that should be treated as a directional indicator rather than an absolute number, we believe this comparison should suffice for most reporting purposes.
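The scaling described above is simple arithmetic. As a minimal sketch, assuming you have daily User counts from both providers for the same overlap days (the function names and numbers below are hypothetical, not real PubPub data or a PubPub API), one reasonable way to get the multiplier is to divide the new provider's total by the old provider's total, then apply it to past counts:

```python
def scaling_factor(old_counts, new_counts):
    """Ratio of total Users reported by the new provider to the old one over
    the same overlap days (a volume-weighted average ratio)."""
    if len(old_counts) != len(new_counts):
        raise ValueError("overlap series must cover the same days")
    old_total = sum(old_counts)
    if old_total == 0:
        raise ValueError("old provider reported no Users during the overlap")
    return sum(new_counts) / old_total


def adjust_past_counts(past_counts, factor):
    # Scale historical (old-provider) counts so they are roughly comparable
    # with new-provider counts; round because Users are whole numbers.
    return [round(c * factor) for c in past_counts]


# Hypothetical overlap data (daily Users from each provider).
old_overlap = [40, 50, 60]
new_overlap = [120, 150, 180]
factor = scaling_factor(old_overlap, new_overlap)  # 450 / 150 = 3.0

# Hypothetical past counts from the affected period.
past = [30, 45, 10]
print(adjust_past_counts(past, factor))  # [90, 135, 30]
```

Dividing totals rather than averaging per-day ratios keeps a few low-traffic days from skewing the multiplier; either way, the result should be read as a rough estimate.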

Going forward, we do not expect further inconsistencies, and we have a lot more control over the data we collect and how we display it than we did previously. If you have any questions, or ideas on how to expand the new dashboards, please feel free to reply to this thread or write to help@pubpub.org.