Last year I subscribed to the DataGenetics blog. I have no recollection of where I found it, but it’s in my feed. It’s written by a “rocket scientist” so it must be good! <3 <3 The posts give you an interesting take on a security, maths, or language problem, but the maths is disguised in an everyday example that actually manages to capture my attention. His stories often make me wonder: why have I never asked that question? I like it when authors achieve that. Here's an example: PIN number analysis. The post starts out with some jokes about a recent news story about leaked credit card PINs.
Email headline: “All credit card PIN numbers in the World leaked!”
The body of the message simply said “0000 0001 0002 0003 0004…”
Aha, we get the point: Four-digit numbers themselves are easily generated and well-known. Knowing which number belongs to whom is the catch!
The second joke is an XKCD comic (yay!) about a vandal who tries to fool the police with a vanity license plate that reads “1I1 III1”. In the end, however, the unique license plate is exactly what the eyewitnesses remember him by, and the police identify him easily. (The post misses the hidden tooltip punchline, which reveals that the vandal’s friend simply got a similar plate and committed several crimes, which are now blamed on him!)
This story, too, is about linking a number to a person. Green camouflage works, unless you are the only green thing in sight. Similarly, hiding behind the license plate backfired because it stood out as the most difficult one to read, which made it unique (and thus findable) among all the other plates that had “more entropy”, that is, more randomness.
Now the maths starts. He wonders out loud whether some PINs are more common than others, and if so, how common, and why. In this particular case, I know why I never asked that question: My bank doesn’t let me change my PIN! I have to learn the random one that they generated for me by heart. Apparently, in other countries, you pick your own PIN. Unsurprisingly, an analysis of leaked PINs shows that about one quarter of all user-selected PINs follow one of the same 20 easy-to-guess patterns (1234, 4321, 1111, 2222, etc). Over 10% of people choose 1234!
On the other hand, scrolling to the end of the list now and purposefully picking the least often used number is not recommended either. Crackers read security blogs too, and they adjust their strategies immediately.
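The kind of tally behind figures like “the top 20 PINs cover a quarter of all users” can be sketched in a few lines of Python. This is only my own illustration, not the author’s code: the function name `top_n_coverage` and the tiny sample list are made up, and the sample is not the real leaked data.

```python
from collections import Counter

def top_n_coverage(pins, n):
    """Return the fraction of all PINs covered by the n most common values."""
    counts = Counter(pins)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(n))
    return covered / total

# Made-up sample for illustration only, NOT the real leaked data.
sample = ["1234"] * 5 + ["1111"] * 2 + ["0000", "2580", "7391"]

print(Counter(sample).most_common(2))  # the two most frequent PINs with their counts
print(top_n_coverage(sample, 2))       # share of users an attacker covers with two guesses
```

With a real list, the interesting question is how quickly that share grows as the number of guesses goes up.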
What about the other quite common ones? The author finds evidence that years (19xx, 20xx), dates, and sequences that form nice patterns on a numeric keypad are significantly more common. Check out the article to see the awesome heat map visualization, and his interpretation of the striking patterns that occur. Approaches like these are what make maths and statistics really interesting.
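To get a feeling for these categories, here is a rough sketch of how one might label a four-digit PIN. The heuristics, the function name `classify_pin` and the set `KEYPAD_LINES` are my own assumptions, not taken from the article, and the categories overlap (1912 could be a year or a date), so the order of the checks is an arbitrary choice.

```python
# Hypothetical example patterns: the middle column of a keypad, down and up.
KEYPAD_LINES = {"2580", "0852"}

def classify_pin(pin):
    """Return a rough label for a four-digit PIN string (illustrative heuristics only)."""
    if len(pin) != 4 or not (pin.isascii() and pin.isdigit()):
        raise ValueError("expected a string of four digits")
    digits = [int(c) for c in pin]
    # Differences between neighbouring digits, e.g. 1234 -> {1}, 1111 -> {0}
    steps = {b - a for a, b in zip(digits, digits[1:])}
    if steps == {0}:
        return "repeated digit"
    if steps == {1} or steps == {-1}:
        return "sequence"
    if pin in KEYPAD_LINES:
        return "keypad line"
    if pin[:2] in ("19", "20"):
        return "year"
    first, second = int(pin[:2]), int(pin[2:])
    # Accept both MMDD and DDMM, without checking month lengths
    if (1 <= first <= 12 and 1 <= second <= 31) or (1 <= first <= 31 and 1 <= second <= 12):
        return "date"
    return "other"

for pin in ["1111", "4321", "2580", "1984", "0704", "7391"]:
    print(pin, classify_pin(pin))
```

Running something like this over a real list and counting the labels would give a crude, text-only version of what the heat map shows at a glance.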
He also analyses common longer PINs, and finds that the chosen numeric patterns become less imaginative the longer the PIN is (123456, 3141592654 (Pi), 1357924680 (odd-even), etc). Clearly, people have a hard time making up, or remembering, complex passcodes. Forcing us to use “more secure” passcodes (longer numbers, or more complex passwords with special characters) is actually counterproductive, because humans then intuitively take the same shortcuts to reduce complexity, which makes the passcodes easy to guess again.
You can tell that we need to find a compromise between hard to crack by brute-force attack on one side, and easy to remember and type on the other.
The post ends with another fitting XKCD quote:
We have trained everyone to use passwords that are hard for humans to remember, but easy for computers to guess.
XKCD proposes a fun solution: using very long passphrases that are easy to type (dictionary words with no special characters). They must be unique (i.e. not a quote) and nonsensically absurd, which as a side effect also makes them easier to remember. Unfortunately, most accounts (other than UNIX and old mainframe security systems) have quite a low maximum password length and require special characters, which prevents you from actually following this advice in real life…
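Why does length beat special characters? A back-of-the-envelope calculation helps: if every symbol is picked uniformly at random, a secret of a given length has length × log2(alphabet size) bits of entropy. Here is a small sketch of that formula. The 2048-word list is my assumption for the example, and the numbers only hold for truly random choices, which (as the PIN analysis shows) is not how humans pick.

```python
import math

def entropy_bits(alphabet_size, length):
    """Entropy in bits of `length` symbols drawn uniformly at random from `alphabet_size` options."""
    return length * math.log2(alphabet_size)

# Four-digit PIN: 10 possible digits per position
print(round(entropy_bits(10, 4), 1))    # about 13.3 bits

# Passphrase of four random words from an assumed 2048-word list
print(round(entropy_bits(2048, 4), 1))  # 44.0 bits
```

Every extra bit doubles the number of guesses a brute-force attack needs, so four random words are far harder to brute-force than a random four-digit PIN, while still being easy to type.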
In any case, this is just one example to get you interested in DataGenetics. Check it out!