19 Feb 2017, 17:47

Now Entering an Analytics-Free Zone

Telling people you are starting a blog is one of those things that typically garners a reaction. Mostly, it involves questions: what will you discuss? How often will you post? Are you going to quit before you start (poignant!)?

Occasionally, however, you’ll also get advice on what to discuss, how often to post, and many other things. I’m always grateful to learn from people who have blogged more than me, so naturally I was excited to hear about others' blogging experiences. Some people even showed me their blogs.

At least two friends, whom I would consider non-technical, offered to show me their WordPress setup, complete with statistics about which countries their visitors came from, which pages had been the most popular, and so forth. I believe this functionality is offered by WordPress out of the box as an analytics package.

I’m not sure if they knew, but they also had Google Analytics (GA) set up on their sites. I wasn’t sure if GA was powering the WordPress statistics or was an independent thing. In any event, data was being collected via the ever-present web analytics on the sites.

A brief primer on web analytics

Web analytics essentially involves tracking the data generated when someone interacts with a web site. Much of this data is associated with a user’s behavior - how they got to the site, how much time they spent on a page, whether or not the next page they went to was on the same domain, how long pages take to load on average, what city and country visitors are coming from, and other related measurements. WordPress, Google Analytics and others aggregate this data and expose the results to the owner of the particular blog or web site.
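To make the idea of "aggregations" a bit more concrete, here is a minimal Python sketch of the kind of roll-up an analytics dashboard might compute. Everything in it is an assumption for illustration: the `PageView` record, its field names, and the two helper functions are hypothetical, and this is not how WordPress or Google Analytics actually store or process data.

```python
from collections import Counter
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class PageView:
    # Hypothetical record of a single visit; field names are illustrative only.
    path: str
    referrer: str
    seconds_on_page: float
    country: str


def top_referrers(views, n=3):
    """Count visits per referring domain, most common first."""
    domains = Counter(urlparse(v.referrer).netloc or "(direct)" for v in views)
    return domains.most_common(n)


def average_time_on_page(views):
    """Mean number of seconds spent on each page path."""
    totals = {}
    for v in views:
        total, count = totals.get(v.path, (0.0, 0))
        totals[v.path] = (total + v.seconds_on_page, count + 1)
    return {path: total / count for path, (total, count) in totals.items()}
```

Even a toy version like this shows the point: each individual record is mundane, but a pile of them quickly answers questions about who visits, from where, and for how long.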

Generally speaking, this data can be very useful for improving the website. There’s certainly utility in knowing, for example, the most popular referrers (perhaps there’s strategy to develop there when it comes to ad placement or identifying an audience), the most popular articles, the percentage of your audience who can afford to spend money on what you’re selling, and so forth. I believe there are even some social media components: for example, the number of users coming from the ‘book and Twitter.

Of course, if all the numbers read 0, you’ve got some work to do. But don’t fret about that too much.

I can see how this data is both useful and fun to look at, for non-technical people as well as those of us who work with data and make decisions based on it on a daily basis. There’s a certain positive reaction, according to my friend, when a new visitor from Bolivia stumbles upon one of his articles about India (I didn’t have the heart to explain that it may just be a bunch of bots).

I imagine the insights are even more relevant if your blog is at a point where significant traffic is driving referrals, products, or ad revenue - in other words, the money you get to take home. Pretty cool, huh?

It’s just a numbers game

But the data must come from somewhere, and in this case, it’s almost certainly from users unaware of, or unable to stop, the tracking. There’s no information, opt-out process, or any other indication that the users’ mouse clicks, location, and next site visited are being tracked.

Many users who are not tech savvy probably can’t even say what web analytics are capable of, much less identify the sites (hint: virtually all) that use them. It’s not possible to know if you’re on a Google Analytics site until it’s too late: by then GA has already loaded and presumably sent data back, all without your consent. And, as far as I can tell, Google Analytics does not respect Do Not Track, which is itself a somewhat technical feature.
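For the curious, Do Not Track is nothing more than a request header, `DNT`, which the browser sets to `1` when the user has opted out of tracking. A minimal Python sketch of how a site could choose to honor it on the server side might look like the following. The function names and the plain-dictionary representation of headers are my own assumptions for illustration; this says nothing about how Google Analytics or any other product behaves.

```python
def wants_no_tracking(headers):
    """Return True if the request carries the Do Not Track header set to 1."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


def maybe_record_visit(headers, log):
    """Append a visit to a hypothetical log only when tracking is not refused."""
    if wants_no_tracking(headers):
        return False
    log.append("visit")
    return True
```

The check is trivial, which is rather the point: honoring the header is a matter of policy, not engineering effort.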

More advanced users can protect themselves to some extent. Or, if you truly trust Google, you can install a browser plugin that allegedly tells Google Analytics not to send data back when visiting sites with GA. I won’t link to this as I don’t want to appear to encourage users to download this archive without seeing its contents, and I’m certainly not volunteering to scope it out.

Piwik, the savior?

Before you think I’m putting Google on blast unfairly, I won’t spare other analytics software either. While I appreciate some of Piwik’s selling points - owning the data, “user privacy protection”, F/OSS - at the end of the day, it still collects user data that, generally, the users aren’t aware of. It also involves standing up MySQL (in 2017?) and PHP (?!), thus requiring a running server, not to mention exposing a plethora of potential security issues. The average web site owner has absolutely no chance of fulfilling the advertised user privacy protection when faced with administering a PHP-MySQL stack on their own. Postgres is not even supported, and given that the issue has been open for 8 years now, it probably won’t be.

Back to user space

Needless to say, the user gets a raw deal here. At minimum, they’re blissfully unaware of the massive amounts of data they leave behind when visiting web sites, how they are tracked leaving and entering other sites, and so forth.

I will focus on these issues in more (and less, for those who don’t work in tech) detail in upcoming articles, but for now, it’s sufficient to say that it’s completely unacceptable to do this to users here. As someone interested primarily in helping users, as well as maintaining and protecting privacy, this sort of hidden data collection is not very attractive.

A matter of principles

I won’t be adding analytics to this blog, no matter how useful they may be for developing it. To be honest, it’s a principled (and therefore simple) decision: as someone whose intent is to help users, as well as someone who values privacy and has put effort into not being tracked online, I can’t possibly consider using privacy-compromising technology here.

At first, I thought this might be somewhat of a Stallman-esque stance: this blog would be the only site, or one of a few, that chose not to embed analytics or social media tracking buttons. Many would ask: what’s the point? Join in on the massive vacuuming of your users' data for your benefit (as well as the benefit of a large advertising company)!

To my delight, this had already been done by a couple of other blogs I had seen recently. Mike’s blog, from which I essentially lifted the entire setup of my own blog, omits GA as well as the Facebook and Twitter buttons. Mike also went to the trouble of removing GA from the otherwise excellent purehugo theme, a contribution I gratefully used myself. He justifies not using analytics with a privacy argument. Another blog takes a different but also principled approach, reducing cruft and inefficiency and discussing how these things reflect the state of our society in general.

I personally appreciate both positions, but the privacy one hits closer to home.

Core purposes as guidance

One theme I want to be a part of this blog is the idea of helping others navigate a world in which data they emit is frequently collected, sold, bought, stolen, and battled over in courts, both normal and secret (among other things).

While I have some goals for the blog, I can’t share the specifics for psychological reasons. None of the goals, however, involve monetizing users. As a result, web analytics won’t be a part of the experience around here.

With that said, enjoy a non-intrusive (and hopefully speedy) experience on this blog, now and into the future.