How can one mathematically compute the security level of a human computable password schema

computational complexitycomputational mathematicscryptographyinformation theory

Introduction

As technology advances, cryptographers are developing improved techniques for encoding information. While these techniques are becoming incredibly efficient for computers to perform, I am in search of techniques that humans, not computers, can mentally use to encrypt information with high security (that means no $30$-digit number crunching). In this endeavour, I have come across a human computable password schema provided under the "The Schema" section below. I would like to be able to mathematically compute the security level of the schema described below. I am open to hearing about any mathematical approach to compute any reasonable security metric for the schema.

From the research I did, a lower bound for schema quality seemed like a good security metric. Furthermore, information theory seemed like a good mathematical approach to perform the computation as it seems to connect the notions of probability and randomness. I describe below, in further detail, the metric of schema quality and some additional motivation for using information theory.

While this seems like a promising approach to me, I would like to reiterate that I am open to any reasonable mathematical approach and security metric. The only condition is that it should be mathematical in nature i.e. involve writing minimal to no code.

To summarise, my question is as follows

How can one mathematically compute the security level of the human computable password schema described in the "The Schema" section below? (Any reasonable security metric and mathematical approach is acceptable.)

Definitions

(Taken from Human-Usable Password Schemas: Beyond Information-Theoretic Security)

More details about specific keywords are mentioned in the forthcoming sections.

Password Schemas

Definition:

Password schemas
are deterministic functions which map challenges (typically the website name) to responses
(passwords).

Human Computable Password Schemas

Definition:

A password schema that is implementable in a user’s head without the use of additional instruments such as a calculator or pen and paper. There is an additional bound of one hour on the total time over a user’s life required to memorize and maintain the schema.

Security Level

The following definition may be refined to better fit the intuitive notion of security.

Definition:

The level of difficulty for an adversary (hacker) to crack a random user's private key based on a number of challenge-response pairs.

Schema Quality (A potential security metric?)

Definition:

A schema is said to have quality $Q$ if an information-theoretic adversary (i.e., an adversarial Turing machine with unbounded computational power) is able to
correctly guess the response to the next challenge within ten attempts after seeing an average of $Q$$1$ random challenge-response pairs.

The Schema

The following is the schema that I wish to be able to compute a mathematical security metric for. I describe below the purpose and functioning of the schema.

The purpose of the following schema is to give a user an algorithm they can use to mentally compute a unique password for any website they visit in under $20$ seconds. The advantage of this schema is that a user does not need to remember individual passwords for each website and can instead simply remember an algorithm. To make the passwords unique for a user, each user creates and memorises a random mapping of letters to numbers, which are used, along with the website name, in the algorithm to generate the password.

enter image description here

I will use the above image to explain how the schema works.

Broadly speaking, the schema can be thought of as the following function:

$$f(\text{website name}, \text{private key}) = \text{password}$$

The website name can be any combination of the letters in the English alphabet. It can also include an integer from $0$ to $9$ at the end of it. In our example above (image), $\text{MAGIC}1$ is the website name. (Of course, all website names in the real world do not fit these constraints, but we can ignore the exceptions).

The private key is a random mapping that the user chooses. The random mapping maps every letter in the English alphabet (i.e. $\text{A}$ to $\text{Z}$) to an integer from $0$ to $9$ and each integer from $0$ to $9$ to another random integer from $0$ to $9$. In our example above (the mapping is not fully visible in the image), the private key i.e. the random mapping is as follows:

$$\underset{\lower.5em\mbox{1}}{\overset{\raise.3em\mbox{A}}{\big\downarrow}} \; \underset{\lower.5em\mbox{0}}{\overset{\raise.3em\mbox{B}}{\big\downarrow}} \; \underset{\lower.5em\mbox{0}}{\overset{\raise.3em\mbox{C}}{\big\downarrow}} \; \underset{\lower.5em\mbox{9}}{\overset{\raise.3em\mbox{D}}{\big\downarrow}} \; \underset{\lower.5em\mbox{7}}{\overset{\raise.3em\mbox{E}}{\big\downarrow}} \; \underset{\lower.5em\mbox{3}}{\overset{\raise.3em\mbox{F}}{\big\downarrow}} \; \underset{\lower.5em\mbox{2}}{\overset{\raise.3em\mbox{G}}{\big\downarrow}} \; \underset{\lower.5em\mbox{5}}{\overset{\raise.3em\mbox{H}}{\big\downarrow}} \; \underset{\lower.5em\mbox{3}}{\overset{\raise.3em\mbox{I}}{\big\downarrow}} \; \underset{\lower.5em\mbox{3}}{\overset{\raise.3em\mbox{J}}{\big\downarrow}} \; \underset{\lower.5em\mbox{7}}{\overset{\raise.3em\mbox{K}}{\big\downarrow}} \; \underset{\lower.5em\mbox{6}}{\overset{\raise.3em\mbox{L}}{\big\downarrow}} \; \underset{\lower.5em\mbox{5}}{\overset{\raise.3em\mbox{M}}{\big\downarrow}} \; \underset{\lower.5em\mbox{…}}{\overset{\raise.3em\mbox{…}}{\big\downarrow}} \quad \quad \underset{\lower.5em\mbox{1}}{\overset{\raise.3em\mbox{0}}{\big\downarrow}} \; \underset{\lower.5em\mbox{3}}{\overset{\raise.3em\mbox{1}}{\big\downarrow}} \; \underset{\lower.5em\mbox{5}}{\overset{\raise.3em\mbox{2}}{\big\downarrow}} \; \underset{\lower.5em\mbox{7}}{\overset{\raise.3em\mbox{3}}{\big\downarrow}} \; \underset{\lower.5em\mbox{9}}{\overset{\raise.3em\mbox{4}}{\big\downarrow}} \; \underset{\lower.5em\mbox{0}}{\overset{\raise.3em\mbox{5}}{\big\downarrow}} \; \underset{\lower.5em\mbox{2}}{\overset{\raise.3em\mbox{6}}{\big\downarrow}} \; \underset{\lower.5em\mbox{4}}{\overset{\raise.3em\mbox{7}}{\big\downarrow}} \; \underset{\lower.5em\mbox{6}}{\overset{\raise.3em\mbox{8}}{\big\downarrow}} \; \underset{\lower.5em\mbox{8}}{\overset{\raise.3em\mbox{9}}{\big\downarrow}}$$

To compute the password, we now follow the following steps:

$1)$ Map each letter in the website name to its corresponding number in the mapping ($\text{MAGIC}1 \longrightarrow 512301$). Let the resulting number be $X$.

$2)$ Find the sum$\bmod{10}\,$ of the first $2$ digits of $X$ and map the resulting sum to its corresponding number in the mapping. This will be the first digit of the password. Then find the sum$\bmod{10}\,$ of the first digit of the password and the second digit of $X$ and map the resulting sum to its corresponding number in the mapping. This will be the second digit of the password. Now continue the process of finding the sum$\bmod{10}\,$ of the $(n-1)$th digit of the password and the $n$th digit of $X$, mapping the resulting sum to its corresponding number in the mapping and storing it as the $n$th digit of the password, until the last digit of $X$ is reached. This process is illustrated for the above example below.

$$X = 512301$$
$$\text{first 2 digits of} \; X = 5, 1$$
$$(5 + 1) \bmod{10} = 6 \; \xrightarrow{\text{mapping}} \; 2$$
$$\therefore \text{2 is the first digit of the password}$$

The process now continues as follows:

\begin{array}{|c|c|c|c|c|}
\hline
n & (n-1)\text{th digit of the password} & n\text{th digit of} \; X & \text{sum} \bmod{10} & n\text{th digit of the password}\\
\hline
2 & 2 & 1 & (2 + 1) \bmod{10} = 3 & 3 \; \xrightarrow{\text{mapping}} \; \color{red}7\\
\hline
3 & \color{red}7 & 2 & (\color{red}7 + 2) \bmod{10} = 9 & 9 \; \xrightarrow{\text{mapping}} \; \color{green}8\\
\hline
4 & \color{green}8 & 3 & (\color{green}8 + 3) \bmod{10} = 1 & 1 \; \xrightarrow{\text{mapping}} \; \color{blue}3\\
\hline
5 & \color{blue}3 & 0 & (\color{blue}3 + 0) \bmod{10} = 3 & 3 \; \xrightarrow{\text{mapping}} \; \color{brown}7\\
\hline
6 & \color{brown}7 & 1 & (\color{brown}7 + 1) \bmod{10} = 8 & 8 \; \xrightarrow{\text{mapping}} \; \color{purple}6\\
\end{array}

Hence, we have

$$f(\text{MAGIC}1, \text{chosen mapping}) = 278376$$

(Above, $\text{MAGIC}1$ is the challenge and $278376$ is the corresponding password. Thus, $\text{MAGIC}1$$278376$ is a challenge-response pair.)

The schema can also be represented by the following recursive formula:

$$d_n = \sigma((d_{n-1} + X_{n}) \bmod{10}), \; 2 \le n \le \text{number of digits of} \; X$$
$$\text{where}$$

\begin{align}
d_n &= n\text{th digit of the password}\\
X_{n} &= n\text{th digit of} \; X\\
\sigma(x) &= \text{the result of applying the mapping on} \; x\\
d_1 &= \sigma((X_1 + X_2) \bmod{10})
\end{align}

The schema described is pretty easy for a user to use mentally once the mapping is memorised, which makes it human computable. Assessing its security level, however, seems less straightforward.

Additional References: Videos: USENIX Enigma 2017 — Human Computation with an Application to Passwords, "Human Computable Passwords" (CRCS Lunch Seminar, Jeremiah Block); Research Papers: Publishable Humanly Usable Secure Password Creation Schemas$^{*}$, Towards Human Computable Passwords

Motivation for using information theory

(As mentioned in the introduction, the answer does not have to use information theory. It is simply a potential approach. However, the answer must be mathematical in nature.)

Professor Manuel Blum mentions on slide $23$ of his lecture Human Computation that

A human calculator is so much slower than a computer that any proof that a computer cannot break a password schema must generally be information theoretic.

enter image description here

Additional Comment: In order to assess the security level, one may need to consider a specific algorithm that a hacker may use to try to crack the private key. If this is the case, feel free to consider any potent hacking algorithm.

Best Answer

I will teach you a heuristic for how to compute this, that is likely to get you a very accurate result.

I will assume that the mapping from digits to digits is required to be bijective. Then there are $10^{26}\times 10!$ possible private keys. This means that a private key has $\lg(10^{26}\times 10!) \approx 108$ bits of entropy.

Suppose we see a challenge-response pair, i.e., a website name and the corresponding password. Suppose that the website name is $\ell$ characters long. Then the response is also $\ell$ characters long. There are $10^\ell$ possible responses, so the response conveys $\lg(10^\ell) \approx 3.3 \ell$ bits of entropy.

Suppose we see multiple challenge-response pairs, and let $\ell$ be the sum of the lengths of the website names. Then the responses convey $3.3 \ell$ bits of entropy.

This means that (here comes the heuristic) after seeing these challenge-response pairs, there remains about $108-3.3\ell$ bits of entropy in the private key. In other words, given the information provided by the responses, only $108-3.3\ell$ bits of entropy remain in the private key. (Technically speaking, this is the conditional entropy, conditioned on the responses.) So if $3.3\ell \ge 108$, then we can expect that the private key is uniquely determined, or approximately so.

More broadly, if $108-3.3\ell \le \lg(10)$, then there are fewer than $\lg(10)$ bits of entropy remaining in the private key. This crudely corresponds (here is another heuristic) to having a good chance of guessing the private key within 10 guesses.

Solving the inequality $108-3.3\ell \le \lg(10)$ for $\ell$ yields $\ell \ge 32$.

So, we can now answer your question. If we observe several challenge-response pairs where the sum of the lengths of the website names is at least 32 characters, then we will likely be able to correctly guess the response to the next challenge (given ten guesses). The more it exceeds 32, the more likely we are to be correct. The chances of error diminish exponentially as $\ell$ gets above 32, so it doesn't need to be much above 32 to have a nearly-certain chance of guessing the next response correctly.

As I've noted, I've used some heuristics here that are not technically speaking accurate. However, they're typically pretty good, and I think they're a fine estimation in your case.

Also, please note that the analysis above is all on the average case. In the worst case, your scheme might be horrible. For instance, suppose I observe the response for website name AMAZOQ, and I next want to predict the response for website name AMAZON. Then with 10 guesses I can be certain to get one of them right, as the response for AMAZON will be the same as the response for AMAZOQ in all but the last digit.

Related Question