A non-technical history of password storage

Not a very good password storage system
Not a very good password storage system

When you log in (“authenticate”) yourself to somewhere using a password, that password has to be stored, in some form, on the other side. Usually this is in a server: whether it’s Facebook, Snapchat, Yahoo, or shop-rare-trinkets.com, the server have to have a saved copy of that password. Otherwise, there would be no way to check whether the password was correct!

Checking your password without reading your password

But having your password plainly sitting on their servers like that would be a terrible idea. For one, any employee working there could then look up anyone’s password whenever they want, and a password is a terribly private thing for someone else to be able to read.

What’s more, any criminal who broke into the server would immediately know your password too! We can’t allow this to happen. So the server has to store your password, without really storing your password. How could this be done? In other words, we need to to accomplish a tough trick:

We need to be able to check whether your password is right or wrong, without being able to read your password.

Sounds hard? It really is! It’s a really difficult thing to do right, and a lot of people have screwed up in the past. Let’s look at a history of how that has been done, and what people have invented to make it possible.

We also want to look at why passwords are more and more not enough, and what we have invented (and still need to invent) to make it better.

Table of Contents

  1. Plain text
  2. Hashing
  3. Hashing + Salting
  4. Hashing + Salting + Key Stretching
  5. Hashing + Salting + Memory-hard key stretching
  6. The Future (Usable Security)

Plain text

MethodExamplesSecurity for passwordsCan be broken by
Plain text password.txt Completely insecure Someone listening
Rainbow tables
Cracking
Parallel cracking
Predictive cracking
Human error

When you were five, you probably made up this login method yourself! “Tell me the secret word to pass!” You know that this is worst method possible, because anyone would only need to overhear it once to know your password. It’s not any different on a server.

This is basically never done today, except when people make really huge mistakes, or are really bad at making secure websites and apps and need to learn a lot more.

Hashing

MethodExamplesSecurity for passwordsCan be broken by
Hashing md5, SHA-1, SHA-256 Very insecure Someone listening
Rainbow tables
Cracking
Parallel cracking
Predictive cracking
Human error

Hashing is the first way people used to solve the problem of how to “check a password without being able to read it”. Today, this method is very insecure when used on its own, but it is still used as the basic building block of more secure methods - much like how bricks are still used to build strong houses, but strong houses are never built with only a pile of bricks.

(We call these building blocks “primitives”. Primitives are used as parts to build more complex, more secure methods. Hashes are a very important primitive.)

A cryptographic hash (sometimes just “hash”) is a mathematical method that tries, as hard as it can, to do one job: if you give it an input (the password), it quickly gives you a jumble of scramble (the output) that is really hard to reverse back into the input.

Example of several hashed words.
Example of hashing in action (an algorithm called SHA-256). Notice a tiny change in the password (from "1" to "2") becomes a giant change in the hashed text.

This is sometimes compared to baking a cake. You can easily bake a cake using a recipe (6 eggs, 1.5 cups flour, …). But if someone gives you a finished cake, you would have a pretty hard time figuring out exactly what the ingredients were and how much.

Reversing a cake.
It would be pretty hard to figure out the exact ingredients of a cake just by looking at it. Figuring out a password from its hash is much, much harder.

Except a cryptographic hash is even more impressive than that. To achieve its job of being hard to reverse, its output must also change completely when the input only changes a little bit. (We call this an avalanche effect.) Why is this important?

Well, let’s think back to the cake. While you might not be able to figure out exactly how much eggs someone used in a cake recipe, you might know that more eggs roughly equals more fluffiness. So you might be able to get a hint and guess whether a cake roughly contained more eggs or less eggs. We absolutely can’t have that in a password hash!

This is really important because if someone bad gets ahold of the hashed version of your password, you want it to be as useless as possible. If someone can kinda-sorta guess what your password looks like, that’s still very bad, because of just how fast computers can guess. Any hints to narrow down what your password looks will lead to your password being guessed sooner or later.

So that means a shorter password can’t mean a shorter hash; and passwords that look similar can’t end up in hashes that look similar. The hash must be almost completely and mathematically random when you change the password. To do this, we must scramble and together the inputs a lot. This is a little bit like running your password through a shredder, and using a complicated process to recombine letters so they’re a jumble.

So we’re done, right? I mean, if cryptographic hashes already turn the password into a scramble, and this scrambling is one-way and almost impossible to reverse, then we’re pretty done, right? Isn’t the problem of being able to check passwords without being able to read it solved, by definition?

Sadly, no.

Hashing + Salting

MethodExamplesSecurity for passwordsCan be broken by
Hashing + Salting SHA-1 with salt
SHA-256 with salt
Very insecure Someone listening
Rainbow tables
Cracking
Parallel cracking
Predictive cracking
Human error

There are two major problems with using hashing by itself. First, hashes are designed to be fast. If it is fast, and passwords aren’t very long and complex, then it can be cracked just by brute-force cracking: trying all the common password combinations. This means that hashing doesn’t protect the people who happen to be using weak, short passwords.

Example of rainbow tables.
Emma and Jason have the same password hash... that must mean they have the same password! Also, they probably both used a really easy to crack password!
An actual md5 rainbow table.
This is an actual md5 rainbow table of the top 15 commonly used passwords in 2016. If hashing is all you do, which is very unsafe, then someone can just lookup the hashes of these common passwords. They also know exactly when two people used the same password.

But even more importantly, the critical problem with hashing by itself is that it doesn’t protect even a little bit the people who happen to be using common passwords, or the same password as another person. If someone got a list of all the hashes by breaking in, and found that any two hashes were the same, then those two passwords must also be the same, like password111.

That’s really terrible because then in the event of a digital break-in, thieves can make out with a list of hashed passwords, and just look up all of the the weak and easy to guess passwords in a table. Also, they can combine their efforts across different break-ins. If one thief breaks into shopping-site-1 and another into shopping-site-2, and both of them only used hashing, they can use the same look up tables on both and find both of your passwords.

So the practice of salting was invented. Salting made each password entry unique.

An actual md5 rainbow table.
Emma and Jason have the same password hash... that must mean they have the same password! Also, they probably both used a really easy to crack password!

Notice that the salt isn’t kept secret: it doesn’t need to be. It just adds a little touch of complexity if two people happened to share the same password, it is difficult to tell without trying to crack it first.

Salted hashes were the way to store passwords for a long time. So now are we done? Each password entry is now unique, and attackers can’t reverse it without a lot of effort.

Sadly, no.

Hashing + Salting + Key Stretching

MethodExamplesSecurity for passwordsCan be broken by
Hashing + Salting + Key stretching Repeated MD5 Insecure Someone listening
Rainbow tables
Cracking
Parallel cracking
Predictive cracking
Human error
Repeated HMAC-SHA256 Nonstandard practice
PBKDF2-HMAC-SHA256 Still good

Hashing + salting solved the big problem of someone figuring out, without much cracking, if people have the same password (or if you’re using the same password across different sites!) But, cracking became a bigger and bigger problem over the years: computers are just coming too fast. People became more and more easily able to crack shorter passwords that people can easily memorize, like a single word with some modifications - basically anything fewer than 10 or 12 characters.

At the time they were invented, computers weren’t particularly fast. That means that not being able to reverse the salted hashes without trying a lot of passwords was good enough security by itself. As time went on, however, and computing hardware got faster and cheaper, people were finding that more and more passwords were being cracked. The recommendation to use 6-character passwords slowly became 8 characters, and now it’s at least 10 or 12 at the minimum if you don’t want someone to crack your password in mere seconds.

A video card.
A video card is extremely good at doing lot of simple, parallel tasks, including hashing passwords.

Now, don’t get me wrong - this wouldn’t grow forever because of the rules of math, so by about 20 or so characters of random characters, you’re definitely done as far as having a good password. But that’s hard for us to remember too. How would we protect the majority of people who didn’t spend all that time and effort remembering fantastic passwords? Well, we had to find ways to slow down the hashing process. That’s basically what key stretching is: it’s basically like saying hashing + salting + slowing down.

One obvious solution - and one that’s still used - was to run the hash lots and lots of times. If you take in the input, hash it, take the output hash it again, and do that hundreds of thousands of times, and make that repeated process the only way to save the hashed password - then the attacker will have to do this hundreds of thousands of times too, to crack it.

This is a little inconvenient to use. For one, it’s slow for you to log-in, too. And it’s resource-intensive for the server to run all those repeated rounds of hashing. But because you only need to log in once in a while, and an attacker wants to try billions or trillions of passwords all at once, it becomes a lot harder and expensive for the attacker compared to you.

So now are we done?

Sadly, no. (But hang on, we’re almost there!)

Hashing + Salting + Memory-hard key stretching

MethodExamples Security for passwords Can be broken by
Hashing + Salting + Memory-hard key stretching bcrypt Good practice Someone listening
Rainbow tables
Cracking
Parallel cracking
Predictive cracking
Human error
scrypt Good practice
Argon2 Best practice

As graphics cards and dedicated chips become cheaper and more accessible because of more popularity in machine learning and virtual reality, the problem of people just paying a lot of money to rent computers in the cloud to crack passwords for them become easier and easier. The next wave of improvements are really just to fend off the next wave of cheap password cracking power because supercomputer power that would be unimaginable a decade ago can be had for only as much money as a big grocery trip.

When we say memory-hard, we are trying to use a weakness in using graphics cards and dedicated chips: they’re good at doing a lot of simple repeated work all at once (like password hashing), but they can’t easily do complicated tasks. One way to complicate thet task of hashing is to make hashing require a huge amount of data, more than would normally fit in each individual “worker unit” of a graphics card or dedicated chip.

In a way, it’s just another way of making the task of password hashing slower and more complicated so each action of testing a password in the cracking process takes more time.

The Future (Usable Security)

MethodExamples Security for passwords Can be broken by
Usable security innovations Password managers Best practice Human error
Security keys Best practice

You notice that there are still two ways you can break passwords. “Predictive cracking” and “Human error”.

We’re not very good at memorizing passwords - not many of them, not very complex ones, and not very quickly. We’re also not all that creative - we all share some amount of experiences that make up our communities and our world. We remember our best memories - names, memes, TV shows, books, and all these shared experiences can be used to predict the types of easy-to-remember passwords we’re likely to come up with. Especially with the rise of more advanced AI, this is more likely to put us more and more at risk.

The only currently defense against this is having lots of long, completely random passwords, and never use the same password in 2 different accounts. But this is not feasible to remember; no one should be expected to remember hundreds of unique totally random 24-character passwords just to keep themselves safe. Our technology need to work better for us.

More innovation is coming in this space, and it’s sorely needed. Password managers such as 1Password or KeePass, for example, is one way of reducing the work for us, by reducing the number of good long passwords we have to remember down to 1 or 2, and the use of a quality password manager is highly recommended by most security experts.

Another innovation is 2-factor authentication that protects against stealing your password with convincing fake websites. The best example of this is the U2F key, which is a USB-drive-looking device that checks to make sure you’re logging into the right website, and then talks to the website in encrypted code to prove your identity in addition to your password. Because each key is made to be unique and extremely difficult for anyone else to copy, this is an amazing second check to make sure you’re truly you. Google recently released a Protection Program, for example, for at-risk journalists and political activists based on security keys.

We will need a lot more usable security inventions like these to be widely used by the technologies we depend on in our lives to stay secure. Passwords will probably always be somewhere in our lives, but we can’t rely on them alone to protect us. After all, their security is based on the very thing that makes it hard for us to remember - random complexity.

Summary

MethodExamples Security for passwords Can be broken by
Plain text password.txt Completely insecure Anyone
Hashing md5, SHA-1, SHA-256 Very insecure Rainbow tables
Cracking
Parallel cracking
Predictive cracking
Human error
Hashing + Salting SHA-1 with salt
SHA-256 with salt
Very insecure Cracking
Parallel cracking
Predictive cracking
Human error
Hashing + Salting + Key stretching Repeated MD5 Insecure Parallel cracking
Predictive cracking
Human error
Repeated HMAC-SHA256 Nonstandard practice
PBKDF2-HMAC-SHA256 Still good
Hashing + Salting + Memory-hard key stretching bcrypt Good practice Predictive cracking
Human error
scrypt Good practice
Argon2 Best practice
Usable security innovations Password managers Best practice Human error
Security keys Best practice