NoScript all the things
Tantek Celik wrote a controversial post about the JS problem. He suggests web apps should publish information openly instead of hiding it in a walled garden. Hoarding information behind a walled garden, inaccessible to crawlers and researchers, runs counter to what the web stands for. Open formats for web documents like HTML and JSON go underused, while proprietary formats are seldom 'curlable' by machines. I want documents scrapable in the usual sense: pointing cURL at a URL and getting a readable document back.
Rich browsing is poor browsing too
Stay safe out there
Fingerprinting is a complicated subject, but I think we are using the wrong word. The term "fingerprinting" suggests that we can in fact be uniquely identified. Tracking is a much more suitable term, because you can be tracked without being identified. "Fingerprinting" assumes the worst-case scenario: that of actually being identified. When you surf the web, you are going to be tracked. Even if you disable JS, there are still raw Apache logs to contend with, and they reveal a great deal about you: what IP address you are using, what user agent you are using, and when you accessed a site. (Always try to download webpages and read them at a later stage.) On top of that, there is the issue of plaintext sent down the wire to many different data islands you were not even aware of. To complicate things further, the integrity of a browsing session can be compromised by man-in-the-middle attacks.
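To make that concrete, here is what a single request leaves behind in a server's access log. The log line below is fabricated, in Apache's common 'combined' format; the IP, timestamp, and user agent are example values:

```shell
# A fabricated Apache 'combined' log entry; all values are made up.
line='203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /post.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"'

# Pull out the three things the post mentions: who, when, and with what.
ip=$(echo "$line" | awk '{print $1}')            # first field: client IP
when=$(echo "$line" | awk -F'[][]' '{print $2}') # text between [ and ]: timestamp
agent=$(echo "$line" | awk -F'"' '{print $6}')   # sixth quoted-split field: user agent

echo "IP:         $ip"
echo "When:       $when"
echo "User agent: $agent"
```

No JavaScript was involved in collecting any of this; it is recorded server-side the moment you connect.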
You can be, will be, and are tracked and dragnetted whichever way you decide to run from the issue or conceal your behaviour! I am not the first to educate web users on what fingerprinting is: it is a way to niche a user of the web and isolate specific individuals; individuals who have inadvertently gone out of their way to niche themselves. Think about that for a moment, and backtrack how a user would niche themselves accidentally:
- They bought a laptop on sale. That niches them down to roughly 1,000 laptop models in the area, each with a specific operating system installed.
- They subscribe to an expensive Internet Service Provider. Niches them to specific internet exchanges in an area.
- They use the same machine every single day and don't spread themselves thinly across different devices, or attempt to unbundle their computing. Huge niche issue.
- They visit the same websites out of pure habit, and just this alone is enough to fingerprint them. They don't surf the web, instead choosing to be loyal to a few large websites.
- They don't practice Internet hygiene and refuse to routinely clear their browsing history, because the address bar autocompletes favourite sites faster and the local cache serves assets quicker.
There are some obvious solutions to the above, like spreading oneself thinly across many devices, using several ISPs (3G, 4G, dialup, free wifi, and home broadband at random intervals), surfing the web in private sessions, and using a mix of TOR, VPNs, proxies, and wifi hotspots. But when you are 'online' you are really just a node on the network, discoverable by every other node on the network. By virtue of that, you can be tracked, attacked, niched, and ultimately fingerprinted. It comes down to how well versed you are in what fingerprinting actually means. If you knew what it meant, you would not want to be fingerprinted, and would simply opt out.
Increase the sample size
It seems that by increasing the sample size for a computer on a network, we can blend in and look like an ordinary user. In other words, the more segments a tracker can learn of, the better the 'hash' of your identity. Ideally that hash is the same for a huge sample of users. So if your browsing habits matched the browsing habits of, say, 1,000 people, a tracker would find it hard to zoom in on a specific person. But a sample size of 1,000 users is too small, and ideally we want a sample size that matches the number of users on the Internet itself, which is unrealistic. For now our best bet is to browse within the largest sample size we can find. Currently that number varies from country to country, region to region, and from user to user. Unless some drastic measures are taken to pool web users into a huge sample size, like creating a 'Manhattan Project' for the web, or making devices super cheap, we are still stuck in antiquity. Noteworthy:
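The 'hash of your identity' idea can be sketched in a few lines. The attribute names and values below are hypothetical, and a real tracker would combine far more segments, but the principle holds: identical segments hash identically, and one rare segment splits you off from the crowd:

```shell
# A sketch of a tracker's identity 'hash': concatenate observable
# segments and digest them. Attribute values here are hypothetical.
fingerprint() {
  printf '%s|%s|%s' "$1" "$2" "$3" | sha256sum | cut -d' ' -f1
}

# Two users whose observable segments match produce the same hash,
# so they blend into the same sample:
a=$(fingerprint "Linux x86_64" "Firefox/115" "1920x1080")
b=$(fingerprint "Linux x86_64" "Firefox/115" "1920x1080")

# One unusual segment (a rare screen size) yields a distinct hash:
c=$(fingerprint "Linux x86_64" "Firefox/115" "3440x1440")

[ "$a" = "$b" ] && echo "users a and b blend together"
[ "$a" != "$c" ] && echo "user c stands out"
```

This is why a bigger sample size helps: the more people who share your exact hash, the less that hash says about you.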