On Generators of Diffie-Hellman-Groups

Abstract: Many cryptographic applications are using bad groups for Diffie-Hellman-protocols: Either the prime is not a safe-prime or the generator doesn’t generate a prime-order-subgroup. The article explains the issues in detail and shows that they can cause terrible vulnerabilities. After that it provides a short guideline how to avoid the issues (use safe-primes and squares). Finally it demonstrates that even some famous libraries like OpenSSL are affected.

Disclaimer: Cryptography is a minefield, so beware of possible mistakes in the following (and any other) article. That said, the attack that this article is about is very real.

Note: This page uses MathML (which has been standardized for almost a quarter of a century by now and has furthermore been part of HTML5 from the very start almost 1½ decades ago (as of 2022)). Some legacy-browsers (most notably Google Chrome) sadly don’t care about standards and still prefer to only show the fallback LaTeX-formulas in place of actual formulas. If this is the case for you, simply switch to a somewhat modern browser like Firefox.

Since I’ve recently found quite a bit of misinformation on a certain topic, even in places where you really, really do not want to find it, I’ve figured that I should write a bit about it. The topic in question is “How to choose a group and a generator for DLog/CDH/DDH-based Cryptography?”. Given that the most widely used cryptographic primitives (with exception of RSA-based-stuff) fall within that category this is certainly important. I should however note that the state-of-the-art for them is to use elliptic curves, which kind of avoid these issues since you would normally use standardized groups instead of generating them yourself.

Before we start into looking into the issue it might be a good idea to reiterate how these schemes work. We will start with a very naive and insecure version of the Diffie-Hellman-Key-Exchange (DH-KX) and improve it from there, After that we will take a short look at ElGamal-Encryption.

Diffie-Hellman-Key-Exchange

In the beginning we need to choose a group

𝔾

. For this we will take some prime-number

p

and look at the multiplicative group of its associated residue-class. (Sorry for the math-terms.)

To give a specific example let’s look at

p = 19

. This means the group we would work with would be

ℤ_{19}^×

. It looks like this:

The naive and insecure DH-KX now takes a generator

g

from that group (a generator is an element that generates the entire group, meaning that its powers are all the group-elements). In our specific example we might use

g = 2

. After that both Alice and Bob pick a random number between 2 and

p−1 = 17

, for example

a = 6

and

b = 11

Then they compute their public information as

A = g^a = 2^6 = 7

and

B = g^b = 2^{11} = 15

and send it to each other.

These will always be the same because:

A^b = \left(g^a\right)^b = g^{ab} = g^{ba} = \left(g^b\right)^a = B^a

The idea is that

g^{ab}

can now serve as a shared secret between Alice and Bob that they will use to encrypt their communication with a symmetric cipher. But is it safe?

Small-Subgroup-Attacks

If we look at

A

and check how many elements it generates, the number we find is 3: 1 (trivial), itself (also trivial) and 11. Let’s take the role of Eve (the attacker) at this point: We don’t know what the secrets of Alice and Bob are, but it is relatively easy to figure out if either

A

and

B

only generate very few elements. Since this is the case for

A

and since the shared secret must be generated from

A

(and

B

) we know that we only need to search in the subgroup generated by each, allowing us to vastly reduce the number of possible shared secrets. In the end we can than just run a brute-force-attack against the remaining ones.

This is a simple case of a so called Small-Subgroup-Attack and the danger of one always exists if

𝔾

has small subgroups. To protect against it, we need to figure out when such subgroups exist. For this we will start with the following theorem:

In English this means that for every size, there really is only one group with a generator. All different representations only rename the elements. (In cryptography we hope that figuring out how to undo the renaming is a hard problem.) What follows is that we can simply look at additive groups if we want to figure out which elements are generators.

It turns out that all elements that are neither the neutral element (“0” in this case since we are now looking at additive groups) nor share a nontrivial divisor with the group-size (GCD = 1) will generate the entire group. In case of Groups with prime-order this is true for all elements except 0.

The problem here is that we are not using the additive, but the multiplicative group that happens to not contain

0

, since there is no element

x

in the group such that

0 ⋅ x = 1

. This leaves us with

p - 1

elements. Unless

p = 2

(From now on I will ignore these corner-cases which only appear for very small primes), this means that the group order does have two divisors that it can share with some of its elements: 2 and

\frac{p−1}{2} ≕ q

. This means that we cannot simply take any prime-number and any generator, since there definitely are small subgroups.

To figure out a solution we need another property of those groups: If

r

is a prime-divisor of the group-order, then there will be a subgroup that contains exactly

r−1

elements that generate that subgroup. Together with the neutral-element, this gives us a group of order

r

1 is somewhat special in that it will always divide the group-size, but there is still one element that generates its entire subgroup, namely 1 itself.

In our setting 2 will always be a prime-factor of

p−1

, therefore there will be a subgroup with two elements, that don’t generate anything besides themselves. Obviously the neutral element (“1”) is in that subgroup, the other element is “−1” (in our case:

19 - 1 = 18

If the size of

|G|

is sufficiently large, the probability of running into either of those cases is so small, that we usually don’t care about it. (It is about as low as the chance of guessing the secret-key derived from it in the first try.)

This means that it is crucial to get the factorization of

q

, if we want to get the full picture. If

q

has many small prime-factors, we are clearly in the realm of small-subgroup-attacks. In the case of

p = 19

this is true:

p - 1 = 18 = 2 ⋅ 3 ⋅ 3

q

is however a prime-number, this means that there will be a subgroup with

q

elements of order that generate exactly that subgroup. While these only make up halve the group, its no problem if we end up in it, since it is still huge if

p

is sufficiently large. In contrast to the original group it does however have the advantage that all its elements (except “1”) generate the entire group. This means aside from the trivial subgroup (the one that only contains “1”) there are no subgroups that can cause problems. For this reason we call a prime

p

with that property (

p = 2q + 1

q

prime) “safe-primes”; their smaller counterparts

q

are called Sophie-Germain-primes. The completely overused textbook-example for a safe-prime is 23 (

= 2⋅ 11 + 1

), since it has just the right size to get an idea for how its residue-classes behave. On an unrelated side note, it might be interesting that it is also a Sophie-Germain-prime, since 47 is prime as well.

The Diffie-Hellman-Assumptions

At this point we should define clearly the first security-assumption that we want to use, the so called Computational-Diffie-Hellman-Assumption (CDH):

As of now, there are no feasible pre-quantum attacks on that assumption for groups with the just described structure (

p = 2q + 1

p

and

q

are prime-numbers) if

g ≠ 1

and

g ≠ −1

and

p

is large enough. This leads a shocking amount of people to the following conclusion:

If you throw the result through a Hash-Function that you model as Random-Oracle (and if you ignore all the cryptographers that hate the Random-Oracle-Model), this might be the case. If you don’t use a hash-function, the situation gets more complicated: In practice with real-world ciphers, this will realistically speaking also work, but if you have issues in other places as well, things might start breaking a lot sooner then you might expect. (I can for example construct an (admittedly artificial) combination of an IND-CPA-secure cipher and an OW-CPA-secure key-exchange that have a plaintext-extraction-vulnerability, where the attack is always successful and runs in constant time.)

To get a better idea for the real issue (and to get rid of the symmetric-black-box), let’s look at the ElGamal-encryption-scheme. ElGamal is a scheme that has many very nice properties:

In essence this is a DH-KX where we just multiply the plaintext to the shared key. Decryption is then done by dividing (strictly speaking multiplying with the “inverse” is not division, but it is close enough for this article) that ciphertext by the shared key. (Imagine it as a regular DH-KX where we use a one-time-pad for the symmetric part.)

It is easy to proof that this is a secure encryption-scheme, if we add another assumption into the mix, the so called Decisional-Diffie-Hellman-Assumption (DDH):

This is a stronger assumption than CDH, which means that it is possible for it to be wrong when CDH is true, but not the other way around.

I stated previously that we assume the CDH-assumption for our group, which of course now asks for the question whether the DDH-assumption also holds for it. The short answer is no, the long answer is that there is an efficient attack.

Breaking Decisional-Diffie-Hellman

It is possible to figure out the order of all group-elements¹, which will be either

q

2q

(let’s ignore 1 and −1 for now). The elements with order

q

do in fact have a name: quadratic residues or just squares. The later term already points into the right direction for how they are defined: A group element

h

is a square if there is another group-element

h′

such that

h = h′^2

. This is obviously true for all numbers that are squares when viewed as natural numbers, but since we are looking at residues, there are quite a few more. To give a specific example: When looking at

ℤ_{23}^{*}

, then 2 is a square because

5^2 = 25 = 2

(mod 23).

Let

g

for now be a generator of the entire group. In this case an element

h ≕ g^x

is a square if and only if

x

is an even number:

g^{2y} = \left(g^y\right)^2

. Since half the possible exponents are even, this shows that exactly half the elements of the group are squares.

These are four of eight cases, making up exactly 50% of the challenges. We can now construct the following attacker against DDH:

The chance of being right for this attacker is

0.5⋅1 + 0.5⋅0.5 = 0.75

which is substantially above

0.5

. Therefor the DDH-assumption is wrong for this group.

The impact of this on ElGamal is subtle but potentially critical: While it is still not possible to figure out the whole plaintext, an attacker has a good chance to learn whether the plaintext is a square. From a theoretical perspective this is completely unacceptable since it means that El-Gamal’s formal security guarantee drops from IND-CPA (INDistinguishability under Chosen-Plaintext-Attacks) to OW-CPA (One-Way under Chosen-Plaintext-Attacks). Where the first is perfectly fine for a building-block in many applications², the later is pretty much useless.

Sometimes you get the feeling that this is what the industry is actually thinking, when you look past crypto-disasters. In order to convince you that the problem is real and potentially devastating, I will present to you a protocol that is provably secure if you use a group in which DDH is hard, but breaks completely if this is not the case.

Breaking a provably secure voting-scheme

The following protocol is a simplified version of a voting-scheme that Cramer, Gennaro and Schoenmakers introduced. While I stripped almost everything that distracts from the attack and note that while many of these things are required for security, they wouldn’t solve the issue that this article is about.

To be fair: The reasons why I like this algorithm so much has a lot to do with how easy it is to fix all the problems that this version doesn’t cover, but doing so would distract from the important point: Assume that the generator is chosen badly, aka it generates a group in which it is possible to distinguish squares and non-squares. Assume further that the public-key is a non-square. We now have four cases for the users randomness and vote:

This is pretty much the absolute worst-case: Every single vote is public and can be identified immediately.

I want to point out one further thing about this protocol: Unlike many counterexamples that you encounter in cryptography, this one was not designed to be broken and doesn’t do anything out of the ordinary. Nonetheless it breaks down in one of the worst ways possible if you make the apparently innocent change of picking a bad generator.

Counter-Measures

If you do what I strongly recommend you to do and use a safe-prime, you can use 4 as a generator. Since it is a square in the natural numbers, it is a square in all residue-classes that contain it as well. There are other ways as well, but this is the easiest and most fool-proof one.

If for some reason you did not pick a safe-prime, figure out the largest prime-divisor

q

p−1

. It must be sufficiently large to be secure against brute-force and generic attacks (256 Bit is about the minimum you should use). Then pick a random element

h

from the entire group and compute

g ≔ h^{\frac{p−1}{q}}

. Verify that

g ≠ 1

and if necessary try new elements until it is.

g

will now be a generator of the subgroup with

q

elements. This is less trivial then the case for the safe-primes and encoding your messages as group-elements will definitely be more complicated as well (I’m however not going to cover the reason here), but it is at least better then the alternatives.

OpenSSL and LibreSSL

A post like this would be only halve as interesting, if there were no widely used pieces of software that are affected by the problem. Unluckily I can inform you that the most widely used software is in fact affected. (I guess, that those who just want to watch the world burn, might consider it lucky.)

The problem is not just, that these libraries do it wrong and may occasionally pick bad generators, they actually go out of their way to make sure that the generator is bad. The following is pasted directly from the code:

/*-
 * We generate DH parameters as follows
 * find a prime q which is prime_len/2 bits long.
 * p=(2*q)+1 or (p-1)/2 = q
 * For this case, g is a generator if
 * g^((p-1)/q) mod p != 1 for values of q which are the factors of p-1.
 * Since the factors of p-1 are q and 2, we just need to check
 * g^2 mod p != 1 and g^q mod p != 1.
 *
 * Having said all that,
 * there is another special case method for the generators 2, 3 and 5.
 * for 2, p mod 24 == 11
 * for 3, p mod 12 == 5  <<<<< does not work for safe primes.
 * for 5, p mod 10 == 3 or 7
 *
 * Thanks to Phil Karn for the pointers about the
 * special generators and for answering some of my questions.
 *
 * I've implemented the second simple method :-).
 * Since DH should be using a safe prime (both p and q are prime),
 * this generator function can take a very very long time to run.
 */
/*
 * Actually there is no reason to insist that 'generator' be a generator.
 * It's just as OK (and in some sense better) to use a generator of the
 * order-q subgroup.
 */

The second comment was added 15 years ago, but nobody bothered to fix it since then.

In summary: If you use more than just plain DH-KX, check that your library does the right thing! [/Update]

Acknowledgements

I would like to thank Arshad Noor for pointing out an error in the calculations of the first example.

Let $p$ and $q$ again be a safe-prime and its associated Sophie-Germain-Prime and $g$ the element in question. Then $g^q$ will have the value 1 if it generates a group of order $q$ , since all the elements will repeat after this point and since $g^0 = 1$ . If $g$ generates a group of order $p−1$ , there cannot be a $1$ since that would imply that the group already repeats and therefore has only order $q$ ; in reality the value will always be $p−1$ or simply put $−1$ (we are looking at residues!), since its square must be one.↩︎
It is usually not acceptable as is for general-purpose-encryption, since it doesn’t prevent tempering with the plaintext. In general you will want IND-CCA2 (INDistinguishability under adaptive chosen-ciphertext-attacks) which is a strictly stronger guarantee. IND-CPA is still useful however since it allows you to do many things that are by definition impossible with IND-CCA2-secure ciphers. These things are however clear-cut cases of experts-only. Finally you can easily make every IND-CPA-secure cipher IND-CCA2-secure if you add an appropriate signature to it.↩︎
This is one of the simplifications. In the real protocol there is a relatively straightforward way to replace the authority with a group of people such that every single one of those would have to work together to attack the privacy of the vote and that an absolute majority would be necessary to prevent the publication of the result once the group has a chance to know the result. Finally there are ways to ensure that even if all members of the authority work together they cannot fake the result.↩︎
In reality they also have to attach a proof that they voted either yes or no, since it would otherwise be possible to vote multiple times with a single ciphertext.↩︎