That is one of my least favorite xkcd comics. It is very misleading.
If the attacker knows that your password is constructed in this fashion, then it is trivial to crack the password, as we've restricted the search space to a multiple of the number of common English words. The entropy argument only makes sense if human-readable strings are just as likely to be chosen as passwords as other random strings, which is not at all the case.
You have completely missed the point of the comic, which is that if you choose 4 common English words at random, the entropy is surprisingly high. It isn't based on "human readable strings" at all.
For example, my /usr/share/dict/american-english contains just shy of 100,000 words. A random word chosen from that set has 16.6 bits of entropy, and four randomly chosen words has over 66 bits of entropy. If anything, XKCD's comic is understating the entropy involved.
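The arithmetic is just log2 of the number of equally likely choices, multiplied by the number of words. Here is a minimal Python sketch. It assumes each word is picked uniformly and independently from a list of exactly 100,000 words. The comment says "just shy of", so the real figure is slightly lower.

```python
import math

def passphrase_bits(wordlist_size: int, num_words: int) -> float:
    """Entropy in bits of num_words words drawn uniformly and independently."""
    return num_words * math.log2(wordlist_size)

print(f"{passphrase_bits(100_000, 1):.1f}")  # 16.6
print(f"{passphrase_bits(100_000, 4):.1f}")  # 66.4
```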
Except that when people create phrases like that, they aren't choosing random words from a dictionary; they're most likely choosing words from their own vocabulary, which will be significantly smaller than 100k words. Additionally, the distribution is not uniform, which reduces entropy even further.
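The effect of a non-uniform distribution can be illustrated with Shannon entropy. This is only a sketch, and two of its inputs are assumptions rather than measurements. The vocabulary size of 20,000 words is hypothetical. The Zipf-like weighting (word k chosen with weight 1/k) is a rough stand-in for how often people actually reach for each word.

```python
import math

def shannon_bits(weights):
    """Shannon entropy (bits) of a distribution given by unnormalised weights."""
    total = sum(weights)
    return -sum((w / total) * math.log2(w / total) for w in weights if w > 0)

# Hypothetical personal vocabulary size (an assumption, not a measured figure).
n = 20_000
uniform = shannon_bits([1] * n)
# Zipf-like weighting (s = 1): the k-th most common word has weight 1/k.
zipf = shannon_bits([1 / rank for rank in range(1, n + 1)])

print(f"uniform: {uniform:.1f} bits/word, Zipf-like: {zipf:.1f} bits/word")
```

Under these assumptions, the skewed distribution yields noticeably fewer bits per word than the uniform one, even though the vocabulary is the same size.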
Every password requires that the user choose randomly. Words, letters, numbers, pixels on a screen... all require randomness in the choosing.
This is why some websites assign passwords to users and do not allow users to pick their own custom passwords. The only safe passwords are those generated by machines.
This does not mean that picking words to form a pass-phrase is less secure than picking letters to form a password.
Less entropy means less secure. However, the method in the comic is not to pick a passphrase. It is to pick 4 random words (hopefully with the help of a computer with a good source of randomness, so they really are random). The distinction matters because phrases have semantic meaning and therefore reduced entropy. Also, people tend to pick phrases common enough to be found somewhere on the internet, such as movie or book quotes, so these phrases are likely to be in an attacker's dictionary. So four random words, not a phrase, have more entropy than a passphrase and are less likely to appear in a dictionary attack.
Right - this is the crucial point. The method suggested in the XKCD comic isn't to pick four words yourself out of your head - it's to randomly select four words from a dictionary.
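Letting the computer do the picking takes only a few lines. This is a sketch, not a vetted tool. The word-list path is an assumption and varies by system. The tiny fallback list exists only so the example runs, and it is far too small for real use. Python's `secrets` module draws from the operating system's cryptographically secure random source.

```python
import secrets
from pathlib import Path

def load_words(path="/usr/share/dict/words"):
    """Read a word list (one word per line), keeping purely alphabetic entries."""
    p = Path(path)
    if not p.exists():
        # Stand-in list so the sketch still runs; far too small for real use.
        return ["correct", "horse", "battery", "staple", "orange", "window"]
    lines = p.read_text(errors="ignore").splitlines()
    # Lowercase and deduplicate so "Apple" and "apple" count as one choice.
    return sorted({w.strip().lower() for w in lines if w.strip().isalpha()})

def random_passphrase(words, num_words=4, sep=" "):
    """Pick num_words words independently and uniformly with a CSPRNG."""
    return sep.join(secrets.choice(words) for _ in range(num_words))

words = load_words()
print(random_passphrase(words))
```

Deduplicating matters for the entropy estimate. It is log2 of the number of distinct words, so duplicates in the list would make it overstate the strength.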
Yes, this is true. This is why password crackers scrape Twitter/Facebook/whatever for modern slang, common misspellings, neologisms, etc. for their word lists.
I don't think it's particularly misleading; if you look carefully, he's assigning 11 bits of entropy for each word in the passphrase, in other words, choosing from a list of 2048 common words only.
This is probably quite close to what brute-force passphrase-cracking software would do as well, and he's not even adding bits for common alterations, such as capitalisation of the first letter(s), spaces between words, common substitutions, etc. So the 44-bit estimate is for software matching exactly this pattern, using exactly this common English dictionary.
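The comic's figure and the effect of one such alteration can be checked directly. The capitalisation rule below is an illustrative assumption. A variation only adds entropy if it is itself chosen at random. A user who always capitalises the first word adds nothing against an attacker who knows that habit.

```python
import math

WORDLIST_SIZE = 2048  # the comic's implied list: 2**11 words
NUM_WORDS = 4

base_bits = NUM_WORDS * math.log2(WORDLIST_SIZE)

# Illustrative assumption: each word's first letter is independently
# capitalised or not with probability 1/2, which adds exactly 1 bit per word.
with_caps = base_bits + NUM_WORDS * 1

print(base_bits, with_caps)  # 44.0 48.0
```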
Also, I suspect throwing in a single word from another language would greatly increase overall strength, especially if it's an uncommon word.
That's not true. Given a dictionary of 2048 words that the attacker has complete knowledge about, picking any 4 random words will always give you 44 bits of entropy.
As a rule of thumb, English text has about one bit of entropy per character. [0, 1] Since we're going with averages, let's say 5 letters plus a space for each word. So you need a 7- or 8-word sentence, with normal capitalization and punctuation, to get 42 bits of entropy. And of course it shouldn't be a well-known phrase like "I've got a bad feeling about this!"
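The same back-of-the-envelope estimate, written out. The two constants are just the rule-of-thumb figures from the comment, not measured values.

```python
import math

BITS_PER_CHAR = 1.0   # rule-of-thumb entropy of ordinary English prose
CHARS_PER_WORD = 6    # 5 letters + 1 space, as assumed above

def words_needed(target_bits):
    """Words of ordinary English text needed to reach target_bits."""
    return math.ceil(target_bits / (BITS_PER_CHAR * CHARS_PER_WORD))

print(words_needed(42))  # 7
print(words_needed(44))  # 8, i.e. matching the comic's 44 bits
```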
> as we've restricted the search space to a multiple of the number of common English words
Diceware uses a set of 7776 words. You select words from the list using 5 dice. Five words, picked using 5 rolls of the set of 5 dice, give you just over 64 bits of entropy.
> A five-word Diceware passphrase has an entropy of at least 64.6 bits; six words have 77.5 bits, seven words 90.4 bits, eight words 103 bits
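Diceware lists are keyed by the five digits you roll, from 11111 to 66666. Reading the rolls as a base-6 number gives each of the 6^5 = 7776 entries an index. In the sketch below, `secrets.randbelow` stands in for physical dice. The word list itself is not included, and the function names are hypothetical helpers for illustration.

```python
import secrets

DICE_PER_WORD = 5
LIST_SIZE = 6 ** DICE_PER_WORD  # 7776 entries

def roll_word_index():
    """Roll five fair dice and turn the faces into a 0-based list index."""
    index = 0
    for _ in range(DICE_PER_WORD):
        face = secrets.randbelow(6) + 1   # 1..6, like a real die
        index = index * 6 + (face - 1)    # append one base-6 digit
    return index

def index_to_key(index):
    """Convert a 0-based index back to the five-digit key printed in the list."""
    digits = []
    for _ in range(DICE_PER_WORD):
        index, d = divmod(index, 6)
        digits.append(str(d + 1))
    return "".join(reversed(digits))

i = roll_word_index()
print(i, index_to_key(i))
```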
Assuming our attacker knows that we've used Diceware, which Diceware word list we used, and that we've used a 5-word passphrase, there are still 7776^5 phrases to try. That's 28,430,288,029,929,701,376.
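Both the count and the quoted entropy figures follow from log2(7776) ≈ 12.925 bits per word. When rounded to one decimal, the loop below gives 90.5 for seven words and 103.4 for eight. The quoted 90.4 and 103 appear to be truncated instead, which fits the "at least" wording.

```python
import math

DICEWARE_WORDS = 7776  # 6**5

combos = DICEWARE_WORDS ** 5
print(f"{combos:,}")  # 28,430,288,029,929,701,376

# Entropy for 5 to 8 words, each chosen uniformly from the list.
for n in range(5, 9):
    print(n, round(n * math.log2(DICEWARE_WORDS), 1))
```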
> If the attacker knows that your password is constructed in this fashion
Then all bets are off, but they don't, so we're sorted.
Mind you, my /usr/share/dict has ~100,000 words in it. 100,000^5 = 10^25, which actually exceeds 62^12 ≈ 3.2 × 10^21, the number of 12-character passwords made of upper- and lowercase letters plus digits.
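A quick check of the two search-space sizes, assuming a dictionary of exactly 100,000 words and uniformly random choices. This works out to roughly 83 bits for five dictionary words versus roughly 71.5 bits for 12 alphanumeric characters.

```python
import math

words_5 = 100_000 ** 5   # five words from a ~100,000-word dictionary
chars_12 = 62 ** 12      # 12 characters drawn from [A-Za-z0-9]

print(f"{words_5:.2e} vs {chars_12:.2e}")
print(f"{math.log2(words_5):.1f} bits vs {math.log2(chars_12):.1f} bits")
```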