Hacker News

That is one of my least favorite xkcd comics. It is very misleading.

If the attacker knows that your password is constructed in this fashion, then it is trivial to crack the password, as we've restricted the search space to a multiple of the number of common English words. The entropy argument only makes sense if the human-readable strings are just as likely to be chosen as passwords as other random strings, which is not at all the case.



You have completely missed the point of the comic, which is that if you choose 4 common English words at random, the entropy is surprisingly high. It isn't based on "human readable strings" at all.

For example, my /usr/share/dict/american-english contains just shy of 100,000 words. A random word chosen from that set has 16.6 bits of entropy, and four randomly chosen words have over 66 bits of entropy. If anything, XKCD's comic is understating the entropy involved.
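The arithmetic is easy to check. A minimal sketch, assuming every word is drawn uniformly and independently; `bits_for` is a hypothetical helper name, not part of any library:

```python
import math

def bits_for(wordlist_size: int, num_words: int) -> float:
    """Entropy in bits of num_words drawn uniformly and independently."""
    return num_words * math.log2(wordlist_size)

# One word, then four words, from a list of roughly 100,000 entries.
print(round(bits_for(100_000, 1), 1))
print(round(bits_for(100_000, 4), 1))

# The comic's implied list of 2048 words, four words.
print(bits_for(2048, 4))
```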


Except that when people create phrases like that, they aren't choosing random words from a dictionary; they're most likely choosing words from their own vocabulary, which will be significantly smaller than 100k words. Additionally, the distribution is not uniform, reducing entropy even further.
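The effect of a non-uniform distribution can be illustrated with Shannon's formula. This is a toy sketch: the eight-word vocabulary and its probabilities are made up for illustration, not measured from real users.

```python
import math

def shannon_bits(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Eight words chosen with equal probability.
uniform = [1 / 8] * 8

# The same eight words, but with a strong preference for a few favourites
# (hypothetical numbers that sum to 1).
skewed = [0.5, 0.2, 0.1, 0.05, 0.05, 0.04, 0.03, 0.03]

print(shannon_bits(uniform))  # entropy per word when choices are uniform
print(shannon_bits(skewed))   # lower entropy per word when choices are skewed
```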


Every password scheme requires that the user choose randomly. Words, letters, numbers, pixels on a screen... All require randomness in choosing.

This is why some websites assign passwords to users and do not allow users to pick their own custom passwords. The only safe passwords are those generated by machines.

This does not mean that picking words to form a pass-phrase is less secure than picking letters to form a password.


I don't see what's wrong with my argument that choosing a pass phrase will have less entropy...

Does less entropy not mean less secure? Or am I just reasoning about the entropy all wrong?


Less entropy means less secure. However, the method in the comic is not to pick a passphrase. It is to pick 4 random words (hopefully with the help of a computer with a good source of randomness, so they are really random). This matters because phrases have semantic meaning and therefore reduced entropy. Also, people tend to pick phrases that are common enough to be found on the internet somewhere, like movie quotes and book quotes, and thus are likely to be in an attacker's dictionary. So four random words, as opposed to a phrase, have better entropy than a passphrase and are less likely to fall to a dictionary attack.


Right - this is the crucial point. The method suggested in the XKCD comic isn't to pick four words yourself out of your head - it's to randomly select four words from a dictionary.
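A minimal sketch of what "randomly select four words" could look like, using Python's `secrets` module as the source of randomness. The function name and the tiny `demo_words` list are assumptions for illustration only; a real list would have thousands of entries (for example the lines of /usr/share/dict/words).

```python
import secrets

def random_passphrase(words, count=4):
    """Pick `count` words uniformly at random using the OS CSPRNG."""
    return " ".join(secrets.choice(words) for _ in range(count))

# Stand-in word list, far too small for real use.
demo_words = ["correct", "horse", "battery", "staple",
              "orbit", "lantern", "maple", "quartz"]

print(random_passphrase(demo_words))
```

The point of `secrets.choice` over picking words yourself is that every word is equally likely, which is the assumption the entropy figures depend on.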


Their own vocabulary may contain words that are not in the dictionary, such as slang, intentional or accidental misspellings, etc.


Yes, this is true. This is why password crackers will scrape Twitter/Facebook/whatever for modern slang, common misspellings, neologisms, etc. for their word lists.


I don't think it's particularly misleading; if you look carefully, he's assigning 11 bits of entropy for each word in the passphrase, in other words, choosing from a list of 2048 common words only.

This is probably quite close to what brute-force passphrase-cracking software would do as well, and he's not even adding bits for common alterations, such as capitalisation of first letter(s), spaces between words, common substitutions, etc. So the 44-bit estimate is for software matching exactly this pattern, using exactly this common English dictionary.

Also, I suspect throwing in a single word from another language would greatly increase overall strength, especially if it's an uncommon word.


That's not true. Given a dictionary of 2048 words that the attacker has complete knowledge about, picking any 4 random words will always give you 44 bits of entropy.

    2048^4 = 17592186044416
    2^44 = 17592186044416


Yes, but grammatically correct sequences of words have much less entropy than that. I'd guess much, much less.


As a rule of thumb, English text has about one bit per character of entropy. [0, 1] Since we're going with averages, let's say 5 letters + a space for each word. So you need a 7- or 8-word sentence, with normal capitalization and punctuation, to get 42 bits of entropy. And of course it shouldn't be a well-known phrase like "I've got a bad feeling about this!"

[0] The original http://www.princeton.edu/~wbialek/rome/refs/shannon_51.pdf

[1] and some evidence that it's still correct http://en.wikipedia.org/wiki/Hutter_Prize
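The rule of thumb above turns into a one-line estimate. A rough sketch, taking the 1 bit per character and 6 characters per word as given; `words_needed` is a hypothetical helper:

```python
import math

BITS_PER_CHAR = 1.0   # rule-of-thumb entropy of English text, per character
CHARS_PER_WORD = 6    # 5 letters plus a space, as assumed above

def words_needed(target_bits):
    """Sentence length (in words) needed to reach target_bits of entropy."""
    return math.ceil(target_bits / (BITS_PER_CHAR * CHARS_PER_WORD))

print(words_needed(42))  # sentence length for about 42 bits
print(words_needed(44))  # sentence length to match the comic's 44 bits
```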


Thanks, great info.


> as we've restricted the search space to a multiple of the number of common English words

Diceware uses a set of 7776 words. You select words from the list using 5 dice. 5 words, picked using 5 rolls of the set of 5 dice, give you about 64 bits of entropy.

> A five-word Diceware passphrase has an entropy of at least 64.6 bits; six words have 77.5 bits, seven words 90.4 bits, eight words 103 bits

Even if our attacker knows that we've used Diceware, knows which Diceware word list we used, and knows that we've used a 5-word passphrase, there are still 7776^5 phrases to try. That's 28,430,288,029,929,701,376.
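These Diceware figures can be reproduced directly. A short sketch, assuming each word is an independent, uniform pick from the 7776-word list; `diceware_bits` is a hypothetical helper name:

```python
import math

WORDLIST_SIZE = 6 ** 5  # five six-sided dice index one of 7776 words

def diceware_bits(num_words):
    """Entropy in bits of num_words independent uniform picks."""
    return num_words * math.log2(WORDLIST_SIZE)

for n in (5, 6, 7, 8):
    print(n, "words:", round(diceware_bits(n), 1), "bits")

# Number of distinct five-word phrases an attacker has to search.
print(WORDLIST_SIZE ** 5)
```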

http://world.std.com/~reinhold/dicewarefaq.html

I'd be interested if you think Diceware is broken.


> If the attacker knows that your password is constructed in this fashion

Then all bets are off, but they don't, so we're sorted.

Mind you, my /usr/share/dict has ~100,000 words in it. 100,000^5 is 10^25, which is well above 62^12 (about 3.2 × 10^21), the number of 12-character passwords made of upper- and lower-case letters plus digits.
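The comparison above can be checked with exact integer arithmetic. A rough sketch, reading the two figures as 100,000^5 and 62^12:

```python
import math

# Five words drawn from a ~100,000-word dictionary.
word_space = 100_000 ** 5

# Twelve characters drawn from a-z, A-Z and 0-9 (62 symbols).
char_space = 62 ** 12

print("words:", word_space, "about", round(math.log2(word_space), 1), "bits")
print("chars:", char_space, "about", round(math.log2(char_space), 1), "bits")
print("ratio:", word_space // char_space)
```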



