The Billion Dollar Session Cookie
Marissa Mayer is the VP of geographic and local services at Google. Speaking at the Velocity 2009 conference, she mentioned what has since been described as “The Billion Dollar HTML Tag”. The problem presented was that with Google’s vast user base, an inefficiently implemented landing page feature might cost them millions of dollars of lost revenue somewhere down the line.
A change in perception is sometimes needed to recognize that something seemingly trivial deserves your attention.
Session cookies are a good example - people don’t pay attention to the creation, storage and disposal of session cookies. Does your web page need a session for every visitor, or only for the users who log in (a public service vs. a member service)?
We have had a plethora of problems over the years, and some of them still plague our outdated projects. Do you know how much bandwidth you waste on session cookies? It quickly adds up to multiple gigabytes if you don’t cache many resources.
Session cookies take your bandwidth
A PHP session cookie typically takes about 40 bytes. If you take a page like igre123.com, for example, it serves about 20 million page loads per month. Let’s do some math:
40 bytes * 20 million pageloads = ~762 MB
We also note that the landing page and most sub-pages load anywhere from 30 to 70 static resources from the same domain (let’s go with 60 on average):
40 bytes * 60 requests * 20 million pageloads = 45 GB
So we have about 45 GB of overhead which serves no purpose. Browser caching reduces this by a big margin, but in a cold-start scenario these numbers hold.
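The arithmetic above can be checked with a short script (a sketch; the byte counts and traffic figures are the article’s estimates, not measurements):

```python
# Rough cold-start cookie overhead, using the article's estimates.

COOKIE_BYTES = 40          # typical PHP session cookie
PAGELOADS = 20_000_000     # ~20 million page loads per month
STATIC_RESOURCES = 60      # average static resources per page

def fmt(n_bytes):
    """Format a byte count as MB or GB (binary units)."""
    if n_bytes >= 1024 ** 3:
        return f"{n_bytes / 1024 ** 3:.1f} GB"
    return f"{n_bytes / 1024 ** 2:.1f} MB"

page_overhead = COOKIE_BYTES * PAGELOADS
static_overhead = COOKIE_BYTES * STATIC_RESOURCES * PAGELOADS

print(fmt(page_overhead))    # ~762.9 MB for the pages themselves
print(fmt(static_overhead))  # ~44.7 GB for the static resources
```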
A common way to reduce the waste caused by the extra cookies is to move static content to a sub-domain. This way you keep the session cookie on the main domain, but serve those 60 static resources from a sub-domain which doesn’t have a session cookie. This translates into the following overhead:
40 bytes * 20 million pageloads = ~762 MB
60 resources * 17 bytes overhead (shortest prefix for links: “//img.igre123.com”) * 20 million pageloads = ~19 GB.
So you saved about half on cookies and shifted part of the problem into the content itself (which also means slower rendering times in the user’s browser).
But looking at the cookies more carefully: if you use Google Analytics, you have about 200 bytes of cookies set on your entire top-level domain.
This means that those 200 bytes of cookies are sent to www.igre123.com as well as img.igre123.com, which adds another:
60 resources * 200 bytes of Google Analytics cookies * 20 million pageloads = ~223.5 GB!
In this case it’s a good thing that browsers cache most static resources (this also depends on your server settings!), because this would otherwise be a painful amount of data to waste.
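The same script approach reproduces the sub-domain figures (binary units; the prefix and Google Analytics cookie sizes are the article’s estimates):

```python
# Sub-domain scenario: cookies are gone from static requests,
# but every link carries a longer URL prefix, and Google Analytics
# cookies still ride along to every sub-domain.

PAGELOADS = 20_000_000
STATIC = 60

prefix_overhead = STATIC * 17 * PAGELOADS   # "//img.igre123.com" per link
ga_overhead = STATIC * 200 * PAGELOADS      # GA cookies sent to the sub-domain

print(f"{prefix_overhead / 1024 ** 3:.1f} GB")  # ~19.0 GB of longer URLs
print(f"{ga_overhead / 1024 ** 3:.1f} GB")      # ~223.5 GB of GA cookie traffic
```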
Cookie-free domains / CDNs
Another very good solution is to get a short domain. The shortest ones available are of the form “xx.yy” (5 characters). With such a domain you automatically get a shorter link prefix (8 bytes minimum) and a domain that doesn’t carry any of your session or Google Analytics cookies. This translates into the optimal solution:
40 bytes * 20 million pageloads = ~762 MB.
60 resources * 8 bytes overhead (shortest prefix for links: “//xx.yy/”) * 20 million pageloads = ~8.9 GB.
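Putting the three scenarios side by side (a sketch, using the article’s per-request byte estimates):

```python
# Monthly static-request overhead per scenario, in binary GB.
# The per-request byte counts are the article's estimates.

PAGELOADS = 20_000_000
STATIC = 60

scenarios = {
    "same domain (40-byte session cookie per request)": 40,
    "sub-domain (17-byte '//img.igre123.com' prefix)": 17,
    "short domain (8-byte '//xx.yy/' prefix)": 8,
}

totals = {}
for name, per_request in scenarios.items():
    totals[name] = per_request * STATIC * PAGELOADS
    print(f"{name}: {totals[name] / 1024 ** 3:.1f} GB/month")
```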
I can only recommend this: even on the same physical hardware, your network will be better utilized, since no bandwidth is wasted on unneeded cookie data.
What to watch for?
Session storage also has its pitfalls. Be sure to handle web crawlers in such a way that they don’t create new sessions in your system. A good heuristic is to check whether the client sent any cookies: if it didn’t, and you have Google Analytics on your web site, there’s a good chance the request came from a web crawler.
The easiest way to take web crawlers out of the equation is to create a session only after a successful log-in. This way you save on session storage and also reduce contention, because the load on your session storage system drops.
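A minimal sketch of both ideas, assuming a dictionary of request cookies and an in-memory session store (all names here are hypothetical, not from any framework):

```python
import secrets

SESSIONS = {}  # in-memory stand-in for a real session storage system

def is_probable_crawler(request_cookies):
    """Heuristic: a client that sends no cookies at all, on a site that
    sets Google Analytics cookies, is likely a web crawler."""
    return len(request_cookies) == 0

def start_session_if_logged_in(request_cookies, logged_in):
    """Create a session only after a successful log-in; return its id,
    or None when no session should be allocated."""
    if not logged_in or is_probable_crawler(request_cookies):
        return None
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {}      # empty session data for the new user
    return session_id
```

An anonymous visitor or a cookie-less crawler gets no session at all, so nothing is written to the session store until someone actually logs in.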
These changes might seem hard at first, considering existing content, but with proper planning they can save you valuable system resources - bandwidth, RAM, CPU and disk space - and even lower your bandwidth bill if you are saddled with a pay-as-you-go ISP. With a little forethought in this area, most problems can be solved before they occur.
What about plan B?
Well, you always have the option to offload static content to an off-site CDN provider. Among the most popular is Amazon S3 cloud storage, with its pay-as-you-go pricing scheme. It is used by many big players, such as Tumblr, Twitter, Foursquare and others.
This way you can offload your bandwidth bill and storage costs, and also keep your technical staff minimal and focused on your core systems. It is much harder to scale the application itself than to scale content delivery.
- Tit Petric
igre123.com has made some improvements since the time of writing, so the actual data in this article might not apply anymore; however, the principles and conclusions remain the same, and should be considered best practice for anyone with overstretched system resources and a flair for optimizing their operation.