Learning material for professionals eager to explore the science behind building a successful online product

#ASKTHEINDUSTRY 26: Why do people name their files with obscure characters like 592e5d1af9410ad102af931275903099?

This one is going to be interesting. In order to understand why you see so many static assets with absurd names, you need two pieces:

  • You need to understand the importance of caching assets
  • You need to understand the complexity of updating the cache

The first one is easy: the most dreadful performance bottleneck that the Web needs to overcome is the network. It is slow, unreliable and inconsistent. What’s worse, it is a necessary piece of the web stack: without the network, loading your page is just not possible. Or is it?

Caching is a technique that aims at making the network inconsequential: it consists of temporarily storing assets on the client’s device, so that the browser doesn’t need to hit the network next time it needs them.

If the browser has your entire website in its cache, it can bypass the network altogether and load your website in a fast, reliable and consistent way.

This introduces a problem: how can we make sure that the browser will update its cached resources every time we update some of our files?

In other words, if the browser is storing my styles.css so that it can skip a network request, what happens when I modify styles.css? Nothing! As long as it considers its cached copy fresh, the browser has no way of knowing that I updated a file on my server, and users may get an outdated version of the website.

The Computer Science literature describes many mechanisms for invalidating cached items smartly, but most of them are complicated and involve trade-offs. Fortunately for us, there is an easy, algorithm-free solution: since the browser looks up assets in its cache by their URL, which includes the file name, we can simply change the name of the file every time we make a change. The browser will then fail to find it in the cache and fetch it from the network.
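To make the idea concrete, here is a toy model of a browser cache that is keyed by file name. It is only a sketch: real browsers key their cache on the full URL and also honor HTTP caching headers, and the ToyBrowser class and the server dictionary are hypothetical names invented for this illustration.

```python
class ToyBrowser:
    """A deliberately simplified browser that caches assets by name forever."""

    def __init__(self, server):
        self.server = server        # dict: file name -> content (stands in for the network)
        self.cache = {}             # dict: file name -> content stored on the device
        self.network_requests = 0   # how many times we had to hit the "network"

    def load(self, name):
        # Cache hit: return the stored copy without touching the network.
        if name in self.cache:
            return self.cache[name]
        # Cache miss: fetch from the "server" and remember the result.
        self.network_requests += 1
        content = self.server[name]
        self.cache[name] = content
        return content


server = {"styles.css": "body { color: black; }"}
browser = ToyBrowser(server)

first = browser.load("styles.css")              # fetched over the network
server["styles.css"] = "body { color: red; }"   # we update the file in place
stale = browser.load("styles.css")              # still the old copy from the cache

# Publishing the change under a new name forces a cache miss.
server["styles.v2.css"] = "body { color: red; }"
fresh = browser.load("styles.v2.css")
```

After the in-place update, the second load still returns the old stylesheet, while the renamed file is fetched from the network and carries the change.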

A naming system is then required, in order to make sure that we always have the power to push changes to our websites. A straightforward one would be to add a version number to the end of the filename.

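For example, styles.css could become styles.v1.css. From then on, you just bump the version number in the name (styles.v2.css, styles.v3.css, and so on) every time the file changes, and update the references to it in your HTML so that browsers fetch the new version.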
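The version-number scheme can be sketched in a few lines. This is one possible implementation, not an established tool: the bump_version helper is a hypothetical name, and it assumes the version lives in a ".vN" segment right before the file extension.

```python
import re


def bump_version(filename):
    """Return filename with its .vN segment incremented, adding .v1 if absent."""
    # Look for an existing ".v<number>." segment just before the extension.
    match = re.fullmatch(r"(?P<stem>.+)\.v(?P<n>\d+)\.(?P<ext>[^.]+)", filename)
    if match:
        next_n = int(match.group("n")) + 1
        return f"{match.group('stem')}.v{next_n}.{match.group('ext')}"
    # No version yet: insert ".v1" before the extension (or append it if there is none).
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        return f"{filename}.v1"
    return f"{stem}.v1.{ext}"


name = bump_version("styles.css")   # first versioned name
name = bump_version(name)           # bumped again after the next change
```

The weak point of this scheme is that someone, or some script, has to remember to call it every time the file changes.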

Another approach, one that ties the name to the content of the file and makes name clashes extremely unlikely, is hashing. Basically, a hash function takes any string of data and turns it into a (fairly) short, fixed-length value, usually written as a sequence of hexadecimal digits. Identical content always produces the same hash, while even a tiny change produces a completely different one. For instance, this entire article can be reduced to 6da4585708e12eeafa1cb7bc85b10494 (using MD5, for example).
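Here is a minimal sketch of content-based naming, using MD5 from Python's standard hashlib module because MD5 is the example mentioned above. The hashed_name helper is a hypothetical name, and placing the hash between the file name and the extension is just one common convention.

```python
import hashlib


def hashed_name(filename, content):
    """Build a file name that embeds the MD5 hash of the file's content (bytes)."""
    digest = hashlib.md5(content).hexdigest()   # 32 hexadecimal characters
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension: simply append the hash.
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{ext}"


old_css = b"body { color: black; }"
new_css = b"body { color: red; }"

old_name = hashed_name("styles.css", old_css)
new_name = hashed_name("styles.css", new_css)   # different content, different name
same_name = hashed_name("styles.css", old_css)  # same content, same name as old_name
```

Because the name is derived from the content, an unchanged file keeps its name and stays cached, while any edit automatically produces a new name.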

Don’t worry though: this is a perfect task to delegate to our machine friends. If you are into gulp, for instance, you can set up gulp-hash and forget about all that you just read!
