Passwords and Entropy

This week a YouTube segment has been passed around. It depicts Ed Snowden talking to John Oliver about passwords. A lot of what is said there is quite good - however, it is important to also point out that things are a little bit more intricate than they seem in this segment.

The biggest problem I have with it is that Snowden claims to be able to tell whether a password is good or not just by hearing it. Yes, it is easy to recognize extremely bad passwords (like most examples in that segment). However, it is not possible to recognize a very good password just by looking at it. This is quite counterintuitive, and it has to do with the fact that the entropy in a password is NOT visible in the output - rather, the entropy in a password is a function of the password creation mechanism. Thus, you can have two passwords that look exactly the same, where one of them has high entropy and the other has low entropy.
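To make that last point concrete, here is a minimal sketch in Python. It assumes each method picks uniformly among its possible outputs, and both methods (a person with four favorite phrases, and a generator drawing four words from a 7,776-word list) are hypothetical examples of mine, not something from the segment.

```python
import math

def method_entropy_bits(possible_outputs: int) -> float:
    """Entropy in bits of a method that picks uniformly among `possible_outputs` results."""
    return math.log2(possible_outputs)

# Hypothetical method A: the user always picks one of 4 favorite phrases.
favorites_bits = method_entropy_bits(4)

# Hypothetical method B: 4 words drawn independently from a 7776-word list.
random_words_bits = method_entropy_bits(7776 ** 4)

print(f"method A: {favorites_bits:.1f} bits")
print(f"method B: {random_words_bits:.1f} bits")
```

Both methods could, by chance, output the very same four-word string. An attacker who knows the method needs at most 4 guesses against the first and on the order of 2^51 against the second, and nothing in the string itself tells you which case you are in.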

The concept of entropy in password generation is still not very well understood, and most people giving password advice or setting password guidelines have terrible intuitions about these things. So how do you know if your password is a good one? Well, if you made any kind of choice during the process - ANY kind of choice - my bet is that it is a low-entropy password.

I use the word entropy a lot in this post - specifically in the information-theoretic sense of how much information a given string contains. Entropy applied to passwords can also be seen as a measure of how many guesses an adversary would need in order to brute force the password, assuming they know your method. So this is another aspect of Kerckhoffs's principle - you should choose a method that gives you a large amount of safety even if it is public. It's also worth mentioning that there is an upper limit on the entropy - if your generation method produces more entropy than can be expressed in the language of the password, then the entropy maxes out at the representation entropy, not the generation entropy. So generating an 8 character password with a process that generates 1000 bits of entropy is quite useless, because 8 characters can represent at most 64 bits of entropy (assuming you use all 256 values of each byte).
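The cap described above is simple arithmetic, and a small Python sketch shows it. The function names are my own, and the guess count assumes every output is equally likely and the attacker knows the method.

```python
import math

def representation_bits(length: int, alphabet_size: int) -> float:
    """Maximum entropy, in bits, that `length` symbols from the alphabet can carry."""
    return length * math.log2(alphabet_size)

def effective_bits(generation_bits: float, length: int, alphabet_size: int) -> float:
    """Entropy is capped by the smaller of the generator and the representation."""
    return min(generation_bits, representation_bits(length, alphabet_size))

def average_guesses(bits: float) -> float:
    """Rough expected number of guesses: about half of the 2**bits search space."""
    return 2 ** bits / 2

# The example from the text: a 1000-bit process squeezed into 8 bytes.
print(effective_bits(1000, 8, 256))          # capped at 64.0 bits
# The same 8 characters restricted to lowercase letters carry even less.
print(f"{representation_bits(8, 26):.1f}")   # roughly 37.6 bits
```

The second number shows why length matters so much once you restrict the alphabet: each lowercase letter carries under 5 bits, so the security has to come from having more symbols (or more words), not from a cleverer generator.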

We can take a look at the passwords given as examples in the show. The first one is "passwerd", and of course this is a terrible password. It's short and it's a slight variation of a dictionary word. The second one, "onetwothreefour", is longer, but it is composed of words that belong together, so it is no problem for an attacker to guess. The third one is "limpbiscuit4eva", and since it is a phrase that carries meaning, the misspelling won't help you.

The two final passphrases are "admiralalonzoghostpenis420YOLO" and "margaretthatcheris110%SEXY". Both of them are longer, but they still have some sense in them. It is clear that their pieces weren't chosen randomly. Thus, these passwords are better than the earlier ones, but not nearly as good as they could have been.

When people create passwords they also have a tendency to add special characters and numbers - and maybe change the order of characters or add uppercase characters in places. None of these things actually add a lot of entropy, especially since they are usually added in places that are easy to predict. Just go with enough random words to give you the security you want. Use lower case letters and don't add weird characters. Just make it a sentence of really random words.
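As a rough illustration of what "really random words" means in practice, here is a minimal Python sketch. It uses the standard library's `secrets` module so that the computer makes every choice, not you. The `WORDS` list is a tiny hypothetical placeholder of mine and far too small for real use; a real list should have thousands of entries (the Diceware list, for example, has 7,776 words).

```python
import math
import secrets

# Hypothetical placeholder list for illustration only; use a list with
# thousands of words for real passphrases.
WORDS = ["apple", "bridge", "candle", "drift", "ember", "fossil", "garden", "hollow"]

def generate_passphrase(words, count):
    """Pick `count` words uniformly and independently using the OS random source."""
    if len(set(words)) != len(words):
        raise ValueError("word list must not contain duplicates")
    return " ".join(secrets.choice(words) for _ in range(count))

def passphrase_bits(wordlist_size, count):
    """Entropy in bits of `count` independent uniform picks from the list."""
    return count * math.log2(wordlist_size)

if __name__ == "__main__":
    print(generate_passphrase(WORDS, 6))
    print(f"toy list:       {passphrase_bits(len(WORDS), 6):.1f} bits")
    print(f"7776-word list: {passphrase_bits(7776, 6):.1f} bits")
```

The entropy here depends only on the size of the list and the number of words, which is exactly the property you want: the method can be completely public and the passphrase is still as strong as the arithmetic says. If you reject a result and generate again until you get one you like, you are making a choice, and the arithmetic no longer holds.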