Rants

A Better Cookie Solution

Why the Proposed Do-Not-Track List Can't Have the Desired Result

November 4, 2007

Last Monday, Advertising Age reported on a proposed 'do-not-track list'. Though I very much understand the privacy argument, this is definitely attacking the wrong end of the problem. Not only will a do-not-track list force private and identifiable information into the hands of advertisers, it will remove the anonymity that the current system has in place.

The Current System

Currently, when you visit a web site, it may set one or more cookies on your computer. These cookies are small text files that are used to track a visitor's movements on a site and to recognize a returning web visitor. This allows servers to recognize returning site visitors and automatically log them in or configure the site to their personal settings. This controls everything from how your Google home page looks to Amazon remembering what was in your shopping cart yesterday. This also allows site owners to have a clearer understanding of which links on a page are popular, and which sections of the site visitors relate to each other. It is very clear that cookies have benefits for both site visitors and site owners.

However, currently, there is no real identifying information transferred. The cookie is a text file that is stored by the web browser on the visitor's computer, which is only returned to the server that set the cookie in the first place. Though the cookie could (and sometimes does) contain identifiable information, such as a username, name, or password, this is a bad practice (and generally in poor taste). What is transferred is an identifier that the server can use to identify the visitor. All the cookie does is differentiate between visitor 1 and visitor 100. The information about visitor 1 (or 100) has to be stored on the server to be of any use to either the visitor or the site owner.
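To make that concrete, here is a minimal Python sketch of a server handing out an opaque identifier and keeping everything else on its own side. The names `visitors`, `issue_visitor_cookie`, and `recognize` are made up for illustration, and the in-memory dictionary is an assumption standing in for whatever database a real site would use.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side store (assumed in-memory): visitor ID -> what the site knows.
visitors = {}

def issue_visitor_cookie():
    """Create a visitor record; return (visitor_id, Set-Cookie header value)."""
    visitor_id = secrets.token_urlsafe(16)  # random string, no personal data
    visitors[visitor_id] = {"pages": []}
    cookie = SimpleCookie()
    cookie["visitor"] = visitor_id
    cookie["visitor"]["path"] = "/"
    return visitor_id, cookie["visitor"].OutputString()

def recognize(cookie_header):
    """Return the stored record for the visitor in a Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor")
    if morsel is None:
        return None
    return visitors.get(morsel.value)
```

The cookie value by itself says nothing about the person; it is only useful as a key into `visitors`, which never leaves the server.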

The Problem

The problem arises when sites other than the visited site track information by setting cookies. This happens when ads are served by advertising networks, and a connection is made to a third party server which can then set a cookie that allows that third party to subsequently track visitors across different domains and sites. This information can then be aggregated. Still, there is no identifiable information about the visitor that is collected, just that a visitor that visited this site subsequently visited this other site.

This leads to third party advertising networks being able to know what sites a visitor has been to and how recently, and based on this, publish advertising for sites similar to those which the visitor has previously visited.
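A rough sketch of what the advertising network's side of this might look like, in Python. This is an assumption about the general mechanism, not a description of any particular network; `history` and `serve_ad` are hypothetical names, and the "pick an ad based on the previous site" rule is deliberately simplistic.

```python
from collections import defaultdict

# The ad network's own log (assumed in-memory): cookie ID -> sites seen.
history = defaultdict(list)

def serve_ad(third_party_cookie_id, embedding_site):
    """Log which site embedded the ad, then choose an ad from past visits."""
    seen = history[third_party_cookie_id]
    previous = seen[-1] if seen else None
    seen.append(embedding_site)
    if previous is None:
        return "generic ad"
    return f"ad related to {previous}"
```

Note that the network never learns a name, only that the same cookie ID showed up on several sites.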

The complaint is that targeting advertising this way is an invasion of privacy.

The counter argument is that targeting advertising this way produces more relevant ads, which in turn are viewed more positively by the consumer.

Of course, there is a huge flaw in this counter argument: a misunderstanding of relevance. Though relevance to the consumer based on past trends can be helpful in understanding consumer behavior, relevance to current content is much more important. I may have been looking at computer hardware last week, but if I'm visiting aquarium web sites today, fish-related ads are relevant and computer component ads are not. The advertisers and advertising networks are off-base.

The Do-Not-Track Problem

The way these tracking cookies currently work, unless a site is very poorly designed, there is no identifying information tracked with the cookie. When I visit a new site and that site sets a cookie, the site does not know who I am, just that I am visitor xa234bd34fggQ/R. They can relate this to which page I entered the site from and which I left from, what pages I visited in between, and what page referred me to their site in the first place. The site owner has no idea who I am, my age, where I'm from, etc. (unless I give him information while I am on the site).

However, for a do-not-track list as proposed to work, when I visit a site, I would have to provide that site with enough information to recognize me and match me to a contact on the proposed do-not-track list. I would have to provide the site with personally identifiable information in order for the site to determine whether or not they can recognize me in a non-identifiable way! This is a huge step backwards for any privacy concerns.

A Better Solution

There's a much better way to address this problem - through the browsers.

First, I think the browsers should be tougher on cookie origin restrictions. By definition, cookies are only supposed to be allowed from the visited site. In other words, when you visit weif.net, weif.net could set a cookie, but weif.net cannot set a cookie for amazon.com, nor could amazon.com set a cookie itself when people visit weif.net.

However, there's a loophole. If an image, object, or script is hosted on one server and presented on another, most browsers will allow the server hosting the object to set a cookie as well. This is how advertising networks track which sites people are visiting.

I'd propose that only the domain in the address of the requested page - the one that's in the location bar in the browser - be allowed to set cookies. This would eliminate most of the privacy issue that the do-not-track list is supposed to address.
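As a sketch of the rule I have in mind, here it is in Python. `may_set_cookie` is a hypothetical name, and comparing exact host names is my simplifying assumption; a real browser would have to decide how to treat subdomains such as www.weif.net versus weif.net.

```python
from urllib.parse import urlsplit

def may_set_cookie(location_bar_url, resource_url):
    """Allow a cookie only when the resource host matches the location bar."""
    page_host = urlsplit(location_bar_url).hostname or ""
    resource_host = urlsplit(resource_url).hostname or ""
    # Exact host match is an assumed simplification of "same domain".
    return page_host != "" and page_host == resource_host
```

Under this rule an image served from weif.net on a weif.net page could set a cookie, while a banner pulled in from an advertising network's server could not.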

However, I think the browsers should be encouraged to take a second action to help visitors understand when they are being tracked, how, and by whom.

I would suggest first that the default cookie handling in every browser be changed to prompt the visitor when a cookie is going to be set, and then that the option to always accept cookies be eliminated. When a visitor is prompted to accept a cookie, they should have these options:

  1. Accept this cookie
  2. Accept this cookie for this session
  3. Reject this cookie
  4. Reject all cookies from this server
    or
  5. Reject all cookies with this name from any server

This should give the visitor the discretion necessary without making the process overly complicated.
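The five options above could be modeled roughly like this in Python. This is only a sketch of how a browser might remember the visitor's answers; `Choice`, `CookieJar`, and their fields are names I made up, not part of any real browser.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Choice(Enum):
    ACCEPT = auto()
    ACCEPT_FOR_SESSION = auto()
    REJECT = auto()
    REJECT_ALL_FROM_SERVER = auto()
    REJECT_NAME_EVERYWHERE = auto()

@dataclass
class CookieJar:
    persistent: dict = field(default_factory=dict)  # (server, name) -> value
    session: dict = field(default_factory=dict)     # cleared when browser closes
    blocked_servers: set = field(default_factory=set)
    blocked_names: set = field(default_factory=set)

    def needs_prompt(self, server, name):
        """Standing rejections are applied silently, without a prompt."""
        return server not in self.blocked_servers and name not in self.blocked_names

    def apply(self, choice, server, name, value):
        if choice is Choice.ACCEPT:
            self.persistent[(server, name)] = value
        elif choice is Choice.ACCEPT_FOR_SESSION:
            self.session[(server, name)] = value
        elif choice is Choice.REJECT_ALL_FROM_SERVER:
            self.blocked_servers.add(server)
        elif choice is Choice.REJECT_NAME_EVERYWHERE:
            self.blocked_names.add(name)
        # Choice.REJECT: store nothing and remember nothing.
```

The point of the last two options is that the visitor only has to say no once; after that the browser stops asking about that server or that cookie name.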

I know that some web developers will complain that this will make cookie management too much of a burden, and will confuse visitors, discourage internet use, and drive visitors away from the best sites (as most of those sites make excessive use of cookies).

Although I believe those concerns are justified given how web sites are built today, I believe that they are invalid in principle. There is no reason that being prompted to accept cookies should be detrimental to an individual's use of the internet as a whole, or any site in particular. This presumption is based on current poor development of web sites and carelessness by web developers and web application developers.

There seems to be an unjustified tendency to 'just set another cookie' without any rhyme or reason every time the whim arises. This leads to sites where 2, 4, 8, 32, or even more cookies are set. This is not only unnecessary, but unjustifiable.

Let's take a look again at what a cookie does: a cookie allows a server to recognize a returning or continuing visitor with a unique string.

Once the visitor is identified, he is recognized by the server so the server can connect that visitor with his or her profile or history. In order to be of any use to the site owner, this information needs to be stored on the server. If it is stored in a dozen or a hundred cookies on the visitor's machine, the information is of no use to the site owner at all, and potentially problematic to the visitor as well.

One Cookie Example

So, let's see what happens when a visitor comes to a site.

The site sets a 'visitor' cookie so that it can recognize this visitor if he or she returns.

The visitor clicks around a bit, and then leaves.

The site has set one cookie on the visitor's machine. The visitor's history can be reported to the visitor the next time he or she returns, and it can be used by the site administrator to see how people navigate the site.

If the visitor returns a week later, the server sees that the visitor has a cookie, and the server can associate this visitor with his or her previous actions. Maybe the visitor added something to a shopping cart; the site can then report to the visitor that there is one item in the shopping cart. The site can also report to the visitor which three pages he or she last visited on the site, so the visitor can resume browsing the site from where he or she was. There hasn't been any reason for a second cookie.
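Here is roughly what that looks like on the server, as a Python sketch. Everything hangs off the value of the one 'visitor' cookie; `records`, `page_view`, `add_to_cart`, and `welcome_back` are hypothetical names, and the in-memory dictionary is an assumption in place of a real database.

```python
from collections import deque

# Server-side records keyed by the value of the single 'visitor' cookie.
records = {}

def record_for(visitor_id):
    """Fetch the visitor's record, creating an empty one on first sight."""
    if visitor_id not in records:
        # deque(maxlen=3) keeps only the three most recent pages.
        records[visitor_id] = {"cart": [], "recent": deque(maxlen=3)}
    return records[visitor_id]

def page_view(visitor_id, path):
    record_for(visitor_id)["recent"].append(path)

def add_to_cart(visitor_id, item):
    record_for(visitor_id)["cart"].append(item)

def welcome_back(visitor_id):
    """Return (items in cart, last three pages) for a returning visitor."""
    rec = record_for(visitor_id)
    return len(rec["cart"]), list(rec["recent"])
```

The cart and the browsing history both grow without the cookie on the visitor's machine ever changing.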

Another Example

Let's look at this another way.

Visitor 1 comes to the site, and we set a 'visitor' cookie. We now know that this visitor has been here. The visitor clicks a link, so we change the visitor cookie —

Wait, why change the cookie? Changing the cookie means that we risk losing lots of information on this visitor. Instead of changing this cookie, let's just update the record on the server for visitor 1. By updating the information on the server, we have reduced the risk of data corruption by making fewer changes on our server, and we have reduced possible visitor irritation by not having to (maybe) alert the visitor to the change. We have also avoided risk of data loss by keeping the information on our server, not on the end user's machine where it is likely to get deleted or altered without the site owner's knowledge.

OK, so we avoided several issues by not changing the cookie on each page load. Let's move on.

The visitor wants to access a privileged use area of the site — a members only area which requires login. The visitor fills out a login form, asking for username and password. On submit, if the credentials map out correctly, we'll set a 'login' cookie —

Wait. Again, we have issues of prompting the visitor (maybe), getting the cookie deleted, etc. In this case, we've also already been tracking the visitor. We know who they are. Their existing cookie can now identify them as being logged in. We don't need an additional cookie. If we need to handle expiration of 'logins' differently than 'returning visits', we can handle that in our local information on the server. There isn't a need to set a second cookie, other than to possibly frustrate the visitor and open the possibility of data loss, which could be bad for both the site owners and the site visitor.

OK, so we again avoided several issues by not setting a second cookie when one wasn't needed.
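A sketch of how the login could ride on the existing cookie, in Python. The credential check itself is left out; `log_in` is assumed to be called only after the username and password have been verified. `LOGIN_TTL`, `records`, `log_in`, and `logged_in_user` are names and values I picked for illustration.

```python
import time

LOGIN_TTL = 30 * 60  # seconds a login stays valid; an assumed value

records = {}  # value of the existing 'visitor' cookie -> server-side record

def log_in(visitor_id, username, now=None):
    """Mark an already-tracked visitor as logged in (credentials checked elsewhere)."""
    now = time.time() if now is None else now
    rec = records.setdefault(visitor_id, {})
    rec["user"] = username
    rec["login_expires"] = now + LOGIN_TTL

def logged_in_user(visitor_id, now=None):
    """Return the username if the login is still valid, else None."""
    now = time.time() if now is None else now
    rec = records.get(visitor_id, {})
    if rec.get("login_expires", 0) > now:
        return rec["user"]
    return None
```

When the login expires, the visitor record is still there, so the 'returning visit' outlives the 'login' without a second cookie being involved.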

In short, I have been unable to come up with a case for a second cookie. Every time, I come back to this: the visitor is already identified, you don't need to identify them again.

The Basis of the Opposition to This Solution Is Unfounded

Setting multiple cookies or repeatedly changing cookie values consistently generates unnecessary risk as well as potential bad customer experiences. The argument against prompting visitors to accept or deny cookies or changes to their cookies is based on fundamentally poor design. If the design problems are cleaned up, then prompting visitors to set cookies is fine. The bad experience that getting prompted for cookies all the time creates is not due to the prompting, but to bad cookie practices on many web sites.

Additional Benefits

In addition to forcing some cleanup of poorly designed web sites (which, as noted above, benefits both visitors and site owners), prompting visitors to accept cookies helps to raise internet users' awareness of how and when they are being tracked, and gives them the opportunity to allow or deny that tracking. It also puts the onus on site owners and designers to demonstrate that they are trustworthy before a visitor will allow themselves to be tracked. Anything that generates transparency and honesty will improve the internet.

By bringing any tracking to the attention of the visitor, another possibility for theft of personal information can be alleviated. If the browser has reminded a visitor that he or she is being tracked, the visitor is more likely to try to figure out how to get 'un-tracked' if he or she is using a public or shared computer. This, of course, reduces the risk that a visitor would not clear cookies on a public computer, allowing another user to get access to information by returning to a recently visited site.

And, for those who do find that getting prompted to accept cookies all the time is too bothersome or too tiring, there is always the option to reject cookies. Though this may mean that these users would have to log in each time they wanted to add something to a shopping cart on some sites, or log in again to make a second post on their favorite forum, the difficulty should be rather minimal. In fact, with a little work, the site developers should have little problem allowing users to track through their site as a logged in user without a cookie. The reason so many sites now don't bother is purely developer laziness, and this laziness is already resulting in significant bad customer experiences, which could quite simply be fixed with a little testing and concern.
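One way a site might do this (my assumption; there are others) is to carry a session token in the links it generates instead of in a cookie. A minimal Python sketch, with `with_session`, `session_from`, and the `sid` parameter name all hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_session(url, token, param="sid"):
    """Return url with the session token added (or replaced) in its query string."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, token))
    return urlunsplit(parts._replace(query=urlencode(query)))

def session_from(url, param="sid"):
    """Read the session token back out of a requested URL, or None if absent."""
    return dict(parse_qsl(urlsplit(url).query)).get(param)
```

For example, `with_session("/forum/post?id=7", "abc123")` gives `/forum/post?id=7&sid=abc123`. The trade-off is that the token is visible in the address, so it travels with any link the visitor copies or shares, and the site should expire it quickly on the server.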

So, Anyway...

By making a couple adjustments to web browsers — blocking all cookies set by a server other than the one the parent web page was requested from and either prompting users to accept cookies or rejecting all cookies — not only could the privacy and tracking issues related to cookies go away, but improvements to many web sites would be forced, and user awareness of the tracking and privacy issues would be elevated.