# Some topics in number theory

This post comes from the journey when I tried to find a deterministic hash function that maps different natural numbers (for instance, an ID in a database) to different strings with the shortest length possible and an absolute zero possibility of collision. Eventually, I found a perfect answer for this, which involves several topics in number theory (prime number, totient function, Fermat little theory, elliptic curve, discrete logarithm, etc.). I will summarize the related topics in this post.

# Terminology and notation

Prime number ( $$p \in \mathbb{P}$$ ): numbers that can only be divided by itself. Non-prime integers are called composite.

Semiprime number: $$pq$$ for $$p, q\in \mathbb{P}$$

Co-prime integers: these statements are equivalent

• $$gcd(a, b) = 1$$ ($$gcd()$$: greatest common divisor function)
• $$a$$ and $$b$$ are coprime, relatively prime, or, mutually prime
• $$a$$ is prime to $$b$$, or, $$a$$ is coprime with $$b$$, and versa vice

Modular arithmetic (congruences): $$a$$ and $$b$$ are congruent modulo $$n$$ iff $$n$$ is a divisor of their difference. We write:

$a \equiv b \pmod{n}$

Euler's totient function $$\varphi(n)$$: number of positive integers up to $$n$$ that are coprime to $$n$$. For $$p, q, p_i\in \mathbb{P}, k,k_i \in \mathbb{N}$$, we have:

• $$\varphi(p) = p - 1$$
• $$\varphi(pq) = (p - 1)(q - 1)$$
• $$\varphi(p^k) = p^{k - 1}(p - 1)$$
• $$\varphi(p_1^{k_1}p_2^{k_2}\cdots) = p_1^{k_1 - 1}(p_1 - 1)p_2^{k_2 - 1}(p_2 - 1)\cdots$$

Ring of integer modulo $$n$$:

$\mathbb{Z}/n\mathbb{Z} = \{\bar{a}_n | a \in \mathbb{Z} \} = \{ \overline{0}_n, \overline{1}_n, \overline{2}_n, \ldots, \overline{n-1}_n \}$

where $$\overline{a}$$ is the congruence class (or, residue class) of $$a$$ modulo $$n$$: $$\overline{a} = \{a + kn | k \in \mathbb{Z}\}$$

Multiplicative group of integers modulo $$n$$, or, primitive residue classes modulo $$n$$: $$(\mathbb{Z}/n\mathbb{Z})^\times = \{m | m \in \{0,\ldots, n - 1\}, gcd(n, m) = 1\}$$. There is $$\left|(\mathbb{Z}/n\mathbb{Z})^\times\right| = \varphi(n)$$

# Fermat's little theorem

$a^p \equiv a \pmod{n} \forall p \in \mathbb{P}, \forall a \in \mathbb{N}$

Proof: Wikipage lists various proofs. Below is one of them.

We simplifies the theorem by supposing $$1 \leq a \leq p - 1$$

We have:

$$$ia \not\equiv ja \pmod{p} \forall i, j \in \{1, \ldots, p - 1\}, i \neq j \label{eq:fermat-diff}$$$

Proof: if $$ia \equiv ja \pmod{p}$$, then $$(i - j) a \equiv 0 \pmod{p}$$, then $$i - j \equiv 0 \pmod{p}$$ (because $$gcd(a, p) = 1$$), then $$i = j$$.

And we have:

$$$ka\not\equiv 0 \pmod{p}\forall k \in \{1, \ldots, p - 1\} \label{eq:fermat-nonzero}$$$

Because there are no more than $$p - 1$$ different non-zero values regarding modulo $$p$$, these $$p - 1$$ values of $$ka$$ must have the same value with $$1, 2, \ldots, p - 1$$ regarding modulo $$p$$. Hence:

$\prod_{k=1}^{p-1}{ka} \equiv \prod_{i=1}^{p-1}i \pmod{p}$

So that,

$(p-1)! a^{p-1} \equiv (p-1)! \pmod{p}$

Thus,

$a^{p-1} \equiv 1 \pmod{p}$ (because $$gcd(i, p) = 1\forall i \in \{1, \ldots, p - 1\}$$, all $$i$$ in $$(p - 1)!$$ are cancelled in both side when taking modulo $$p$$)

# Euler's theorem

A generalization form of Fermat's little theorem.

$a^{\varphi(n)} \equiv 1 \pmod{n} \forall a: gcd(a, n) = 1$

Consequence: if we can factorize a number $$n$$, we can easily calculate $$\varphi(n)$$, and infer $$a^{-1}\equiv a^{\varphi(n) - 1} \pmod{n}$$

Corollary: if $$gcd(a, n) = 1$$ and $$x \equiv y \pmod{n}$$, then $$a^x \equiv a^y \pmod{n}$$

# Fermat number

$F_n=2^{2^n} + 1$

• $$F_0 = 3, F_1 = 5, F_2 = 17, F_3 = 257, F_4 = 65,537$$ are prime numbers
• $$F_4 = 65,537$$ is the largest known Fermat prime
• Is $$F_n$$ composite $$\forall n > 4$$ is an open problem
• $$F_n \equiv 7 \pmod{10} \forall n > 1$$. So, the last digit of Fermat numbers is always 7, except $$F_0, F_1$$
• $$gcd(F_i, F_j) = 1 \forall i \neq j$$

# Mersenne Number

$M_n = 2^n - 1$

Mersenne numbers are represented by $$n$$ digits of 1, and no digit 0, in binary format.

It is well-known that if $$M_n\in \mathbb{P}$$, then $$n \in \mathbb{P}$$.

Some known prime Mersenne numbers are: $$M_2 = 3, M_3 = 7, M_5 = 31, M_7 = 127, M_{13} = 8191, M_{17} = 131,071,$$ $$M_{19} = 524,287, M_{31} = 2,147,483,647$$

More $$n$$ values for Mersenne numbers to be prime: 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, etc. You can check the exhausted list here

$$M_{43,112,609}$$ is prime and has 12,978,189 digits

# Wilson's theorem

$$(n-1)!\equiv -1 \pmod{n}$$ if and only if $$n \in \mathbb{P}$$

Moreover, $$(n-1)! \equiv 0 \pmod{n}, \forall n \not\in \mathbb{P}, n \neq 4$$

# Chinese remainder theorem

For pairwise coprime integers $$n_i (i\in\{1, \ldots, k\})$$, and any integers $$a_i (i \in \{1, \ldots, k\})$$, considering the equation of $$x$$: $$x\equiv a_i \forall i\in \{1, \ldots, k\}$$.

1. There is at least one solution
2. All solutions are congruent modulo $$N = \prod_{i=1}^{k}n_i$$. In other words, the solution is unique in modulo $$N$$

Proof:

Suppose that there are $$x_1, x_2$$ satifying the requirement: $$x_1 - x_2 \equiv 0 \pmod{n_i} \forall i \in \{1, \ldots, k\}$$. Hence, $$x_1 - x_2 \equiv 0 \pmod{N}$$, because $$n_i$$ are pairwise coprime. The second statement is proven.

Each number $$x \in \{0, \ldots, N - 1\}$$ is the solution of the equation with $$a_i = x \mod n_i$$. So, there must be at least $$N$$ different equations, associated with different $$a_i$$ values, having different solutions modulo $$N$$. However, because the solution does not change if we replace any $$a_i$$ with their congruent modulo $$N$$, there are at most $$N$$ possible different combinations of $$a_i$$ to make the equation different. This means each equation must occupy at least one solution, and that solution is one we need to show its existence.

Corollary, ring of integers modulo $$N$$ and the direct product of the rings of integers modulo $$n_i$$ are isomorphic

$\mathbb{Z}/N\mathbb{Z}\cong \mathbb{Z}/n_1\mathbb{Z} \times \mathbb{Z}/n_2\mathbb{Z} \times \cdots \times \mathbb{Z}/n_k\mathbb{Z}$

Consequently, for prime power factor of integer $$n = p_1^{a_1}p_2^{a_2}p_3^{a_3}\cdots$$:

$\mathbb{Z}/n\mathbb{Z}\cong \mathbb{Z}/p_1^{a_1}\mathbb{Z} \times \mathbb{Z}/p_2^{a_2}\mathbb{Z} \times \cdots$

# Discrete logarithm and Primitive root modulo

Primitive root modulo $$n$$, or, generator of the multiplicative group of integers modulo $$n$$, is a number $$g$$ such that $$\{g^i \mod n|i\in\{0, 1, \ldots, \varphi(n) - 1\}\} = \{i | i \in \{0, 1, \ldots, n\}, gcd(i, n) = 1\}$$.

Discrete logarithm of $$a$$ to the base $$r$$ modulo $$n$$ is $$k := ind_r a \pmod{n}$$, such that $$r^k \equiv a \pmod{n}$$

If the generator $$g$$ exists, for every number $$a$$ coprime to $$n$$, there exists $$ind_g a \pmod{n}$$

For integer $$b$$ coprime to $$n$$, multiplicative order of $$b$$ modulo $$n$$ is $$ord_n(b) := \min(\{ind_b 1 \pmod{n}\} \cap \mathbb{Z}^+)$$. If $$b$$ is a primitive root, $$ord_n(b) = \varphi(n)$$

There are exactly $$0$$ or $$\varphi(\varphi(n))$$ primitive roots modulo $$n$$

The primitive root theorem: primitive root exists if and only if $$n$$ is $$1, 2, 4, p^k, 2p^k$$ where $$p$$ is odd prime, and, $$k > 0$$

From Chinese remainder theorem, for prime power factor of integer $$n = p_1^{k_1}p_2^{k_2}p_3^{k_3}\cdots$$, we imply:

$(\mathbb{Z}/n\mathbb{Z})^\times\cong (\mathbb{Z}/p_1^{a_1}\mathbb{Z})^\times \times (\mathbb{Z}/p_2^{a_2}\mathbb{Z})^\times \times \cdots$

# Diffie-Hellman key exchange

1. Alice and Bob publicly choose a prime number $$p$$ and $$g$$ which is a primitive root modulo $$p$$. Usually, $$p$$ and $$g$$ are pre-calculated and defined by protocols to be chosen. For example: see 6 Oakley Groups
2. Alice choose a secret integer $$a$$, sends Bob $$A = g^a \mod p$$
3. Bob chooses a secret integer $$b$$, sends Alice $$B = g^b \mod p$$
4. Alice computes $$s = B^a \mod p$$
5. Alice computes $$s = A^b \mod p$$
6. Alice and Bob now share a secret $$s$$

In general, multiplicative group of integers modulo $$p$$, $$(\mathbb{Z}/p\mathbb{Z})^\times = \{g^i|i\in\{0, 1, \ldots, \varphi(p)-1\}\}$$ can be replaced by any cyclic group $$G$$ of order $$n$$ with generator $$g$$. For instance, Elliptic Curve Diffie-Hellman (ECDH) is a variant that represents $$A, B$$ as points on an elliptic curve instead of as integers modulo $$p$$

Key generation

1. Choose 2 large primes $$p, q$$. Get $$n = pq$$
2. Compute $$\varphi(n) = (p - 1)(q - 1)$$. $$n$$ is public, $$\varphi(n)$$ cannnot be easily computed without knowing $$p, q$$
3. Choose $$e$$ coprime to $$\varphi(n)$$. Typical value is $$e = F_4 = 2^{2^4} + 1 = 65,537$$
4. Calculate $$d = e^{-1} \mod \varphi(n)$$ (using Extended Euclidean Algorithm)
5. Publish $$(n, e)$$ as public key, and store $$(n, d)$$ as private key

Encryption

• Convert the message to integer $$m: 0 \leq m \lt n$$
• Encode $$m$$ using public key to $$c = m^e \mod n$$

Decryption

• Decode $$c$$ by $$m = c^d \mod n$$

# Lehmer random number generator

Other names: Park–Miller random number generator, multiplicative linear congruential generator (MLCG), multiplicative congruential generator (MCG)

$X_{k+1}=aX_k \mod m$

where $$m$$ is a prime number or power of a prime number, $$a$$ is a primitive modulo of $$m$$, and, the seed $$X_0$$ is coprime to $$m$$

Some well-known choices of $$n$$ and $$a$$ are: $$(n = M_{31} = 2^{31} - 1 = 2,147,483,647, a = 7^5 = 16,807)$$, $$(n = M_{31}, p = 48,271)$$ (C++11's minstd_rand function uses this set), $$(n = F_4 = 65,537, a = 75)$$, $$(n = 2^{48}, a = 44,485,709,377,909)$$ (this case, $$n$$ is not prime number, the period is shorter, and lower bit has much shorter period because higher-order bits do not affect lower-order bits).

Back to the original problem at the beginning of this post, we can encode the integer id $$k$$ as $$X_k$$, the $$\varphi(m)$$ first values will be different. For example, if we choose $$(m, a) = (M_{31}, 7^5)$$, encode the output using alphabet characters including both upper and lower case, dropping visually confusing characters (l, o, I, O), leading to 48 characters in total. We need $$\left\lceil\frac{\log(\varphi(m))}{\log(48)}\right\rceil = 6$$ characters to encode 2,147,483,646 different IDs.

We can choose $$(m; a) = (65,537; 75)$$, divide the id by $$m$$, encode the quotient directly, and encode the remainder using multiplicative congruential generator described above. As a result, we need 3 characters for the first 65,536 ids, $$3+k$$ characters for next $$44^k * 65,536$$ ids.