Some topics in number theory
This post comes from the journey when I tried to find a deterministic hash function that maps different natural numbers (for instance, an ID in a database) to different strings with the shortest length possible and an absolute zero possibility of collision. Eventually, I found a perfect answer for this, which involves several topics in number theory (prime number, totient function, Fermat little theory, elliptic curve, discrete logarithm, etc.). I will summarize the related topics in this post.
Terminology and notation
Prime number ( \(p \in \mathbb{P}\) ): numbers that can only be divided by itself. Non-prime integers are called composite.
Semiprime number: \(pq\) for \(p, q\in \mathbb{P}\)
Co-prime integers: these statements are equivalent
- \(gcd(a, b) = 1\) (\(gcd()\): greatest common divisor function)
- \(a\) and \(b\) are coprime, relatively prime, or, mutually prime
- \(a\) is prime to \(b\), or, \(a\) is coprime with \(b\), and versa vice
Modular arithmetic (congruences): \(a\) and \(b\) are congruent modulo \(n\) iff \(n\) is a divisor of their difference. We write:
\[ a \equiv b \pmod{n} \]
Euler's totient function \(\varphi(n)\): number of positive integers up to \(n\) that are coprime to \(n\). For \(p, q, p_i\in \mathbb{P}, k,k_i \in \mathbb{N}\), we have:
- \(\varphi(p) = p - 1\)
- \(\varphi(pq) = (p - 1)(q - 1)\)
- \(\varphi(p^k) = p^{k - 1}(p - 1)\)
- \(\varphi(p_1^{k_1}p_2^{k_2}\cdots) = p_1^{k_1 - 1}(p_1 - 1)p_2^{k_2 - 1}(p_2 - 1)\cdots\)
Ring of integer modulo \(n\):
\[\mathbb{Z}/n\mathbb{Z} = \{\bar{a}_n | a \in \mathbb{Z} \} = \{ \overline{0}_n, \overline{1}_n, \overline{2}_n, \ldots, \overline{n-1}_n \} \]
where \(\overline{a}\) is the congruence class (or, residue class) of \(a\) modulo \(n\): \(\overline{a} = \{a + kn | k \in \mathbb{Z}\}\)
Multiplicative group of integers modulo \(n\), or, primitive residue classes modulo \(n\): \((\mathbb{Z}/n\mathbb{Z})^\times = \{m | m \in \{0,\ldots, n - 1\}, gcd(n, m) = 1\}\). There is \(\left|(\mathbb{Z}/n\mathbb{Z})^\times\right| = \varphi(n)\)
Fermat's little theorem
\[a^p \equiv a \pmod{n} \forall p \in \mathbb{P}, \forall a \in \mathbb{N}\]
Proof: Wikipage lists various proofs. Below is one of them.
We simplifies the theorem by supposing \(1 \leq a \leq p - 1\)
We have:
\[ \begin{equation} ia \not\equiv ja \pmod{p} \forall i, j \in \{1, \ldots, p - 1\}, i \neq j \label{eq:fermat-diff} \end{equation} \]
Proof: if \(ia \equiv ja \pmod{p}\), then \((i - j) a \equiv 0 \pmod{p}\), then \(i - j \equiv 0 \pmod{p}\) (because \(gcd(a, p) = 1\)), then \(i = j\).
And we have:
\[ \begin{equation} ka\not\equiv 0 \pmod{p}\forall k \in \{1, \ldots, p - 1\} \label{eq:fermat-nonzero} \end{equation} \]
Because there are no more than \(p - 1\) different non-zero values regarding modulo \(p\), these \(p - 1\) values of \(ka\) must have the same value with \(1, 2, \ldots, p - 1\) regarding modulo \(p\). Hence:
\[\prod_{k=1}^{p-1}{ka} \equiv \prod_{i=1}^{p-1}i \pmod{p} \]
So that,
\[(p-1)! a^{p-1} \equiv (p-1)! \pmod{p}\]
Thus,
\[a^{p-1} \equiv 1 \pmod{p}\] (because \(gcd(i, p) = 1\forall i \in \{1, \ldots, p - 1\}\), all \(i\) in \((p - 1)!\) are cancelled in both side when taking modulo \(p\))
Euler's theorem
A generalization form of Fermat's little theorem.
\[ a^{\varphi(n)} \equiv 1 \pmod{n} \forall a: gcd(a, n) = 1 \]
Consequence: if we can factorize a number \(n\), we can easily calculate \(\varphi(n)\), and infer \(a^{-1}\equiv a^{\varphi(n) - 1} \pmod{n}\)
Corollary: if \(gcd(a, n) = 1\) and \(x \equiv y \pmod{n}\), then \(a^x \equiv a^y \pmod{n}\)
Fermat number
\[ F_n=2^{2^n} + 1 \]
Some facts about Fermat numbers:
- \(F_0 = 3, F_1 = 5, F_2 = 17, F_3 = 257, F_4 = 65,537\) are prime numbers
- \(F_4 = 65,537\) is the largest known Fermat prime
- Is \(F_n\) composite \(\forall n > 4\) is an open problem
- \(F_n \equiv 7 \pmod{10} \forall n > 1\). So, the last digit of Fermat numbers is always 7, except \(F_0, F_1\)
- \(gcd(F_i, F_j) = 1 \forall i \neq j \)
Mersenne Number
\[M_n = 2^n - 1\]
Mersenne numbers are represented by \(n\) digits of 1, and no digit 0, in binary format.
It is well-known that if \(M_n\in \mathbb{P}\), then \(n \in \mathbb{P}\).
Some known prime Mersenne numbers are: \(M_2 = 3, M_3 = 7, M_5 = 31, M_7 = 127, M_{13} = 8191, M_{17} = 131,071,\) \(M_{19} = 524,287, M_{31} = 2,147,483,647\)
More \(n\) values for Mersenne numbers to be prime: 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, etc. You can check the exhausted list here
\(M_{43,112,609}\) is prime and has 12,978,189 digits
\(M_{136,279,841}\) is prime and has 41,024,320 digits. It was discovered on October 12, 2024, and is the latest, the largest known prime number at the time of this writing (October 22, 2024).
Wilson's theorem
\((n-1)!\equiv -1 \pmod{n}\) if and only if \(n \in \mathbb{P}\)
Moreover, \((n-1)! \equiv 0 \pmod{n}, \forall n \not\in \mathbb{P}, n \neq 4\)
Chinese remainder theorem
For pairwise coprime integers \(n_i (i\in\{1, \ldots, k\})\), and any integers \(a_i (i \in \{1, \ldots, k\})\), considering the equation of \(x\): \(x\equiv a_i \forall i\in \{1, \ldots, k\}\).
- There is at least one solution
- All solutions are congruent modulo \(N = \prod_{i=1}^{k}n_i\). In other words, the solution is unique in modulo \(N\)
Proof:
Suppose that there are \(x_1, x_2\) satifying the requirement: \(x_1 - x_2 \equiv 0 \pmod{n_i} \forall i \in \{1, \ldots, k\} \). Hence, \(x_1 - x_2 \equiv 0 \pmod{N}\), because \(n_i\) are pairwise coprime. The second statement is proven.
Each number \(x \in \{0, \ldots, N - 1\}\) is the solution of the equation with \(a_i = x \mod n_i\). So, there must be at least \(N\) different equations, associated with different \(a_i\) values, having different solutions modulo \(N\). However, because the solution does not change if we replace any \(a_i\) with their congruent modulo \(N\), there are at most \(N\) possible different combinations of \(a_i\) to make the equation different. This means each equation must occupy at least one solution, and that solution is one we need to show its existence.
Corollary, ring of integers modulo \(N\) and the direct product of the rings of integers modulo \(n_i\) are isomorphic
\[\mathbb{Z}/N\mathbb{Z}\cong \mathbb{Z}/n_1\mathbb{Z} \times \mathbb{Z}/n_2\mathbb{Z} \times \cdots \times \mathbb{Z}/n_k\mathbb{Z}\]
Consequently, for prime power factor of integer \(n = p_1^{a_1}p_2^{a_2}p_3^{a_3}\cdots\):
\[\mathbb{Z}/n\mathbb{Z}\cong \mathbb{Z}/p_1^{a_1}\mathbb{Z} \times \mathbb{Z}/p_2^{a_2}\mathbb{Z} \times \cdots\]
Discrete logarithm and Primitive root modulo
Primitive root modulo \(n\), or, generator of the multiplicative group of integers modulo \(n\), is a number \(g\) such that \(\{g^i \mod n|i\in\{0, 1, \ldots, \varphi(n) - 1\}\} = \{i | i \in \{0, 1, \ldots, n\}, gcd(i, n) = 1\}\).
Discrete logarithm of \(a\) to the base \(r\) modulo \(n\) is \(k := ind_r a \pmod{n}\), such that \(r^k \equiv a \pmod{n}\)
If the generator \(g\) exists, for every number \(a\) coprime to \(n\), there exists \(ind_g a \pmod{n}\)
For integer \(b\) coprime to \(n\), multiplicative order of \(b\) modulo \(n\) is \(ord_n(b) := \min(\{ind_b 1 \pmod{n}\} \cap \mathbb{Z}^+) \). If \(b\) is a primitive root, \(ord_n(b) = \varphi(n)\)
There are exactly \(0\) or \(\varphi(\varphi(n))\) primitive roots modulo \(n\)
The primitive root theorem: primitive root exists if and only if \(n\) is \(1, 2, 4, p^k, 2p^k\) where \(p\) is odd prime, and, \(k > 0\)
From Chinese remainder theorem, for prime power factor of integer \(n = p_1^{k_1}p_2^{k_2}p_3^{k_3}\cdots\), we imply:
\[(\mathbb{Z}/n\mathbb{Z})^\times\cong (\mathbb{Z}/p_1^{a_1}\mathbb{Z})^\times \times (\mathbb{Z}/p_2^{a_2}\mathbb{Z})^\times \times \cdots\]
Diffie-Hellman key exchange
- Alice and Bob publicly choose a prime number \(p\) and \(g\) which is a primitive root modulo \(p\). Usually, \(p\) and \(g\) are pre-calculated and defined by protocols to be chosen. For example: see 6 Oakley Groups
- Alice choose a secret integer \(a\), sends Bob \(A = g^a \mod p\)
- Bob chooses a secret integer \(b\), sends Alice \(B = g^b \mod p\)
- Alice computes \(s = B^a \mod p\)
- Alice computes \(s = A^b \mod p\)
- Alice and Bob now share a secret \(s\)
In general, multiplicative group of integers modulo \(p\), \((\mathbb{Z}/p\mathbb{Z})^\times = \{g^i|i\in\{0, 1, \ldots, \varphi(p)-1\}\}\) can be replaced by any cyclic group \(G\) of order \(n\) with generator \(g\). For instance, Elliptic Curve Diffie-Hellman (ECDH) is a variant that represents \(A, B\) as points on an elliptic curve instead of as integers modulo \(p\)
Rivest–Shamir–Adleman public-key cryptosystem
Key generation
- Choose 2 large primes \(p, q\). Get \(n = pq\)
- Compute \(\varphi(n) = (p - 1)(q - 1)\). \(n\) is public, \(\varphi(n)\) cannnot be easily computed without knowing \(p, q\)
- Choose \(e\) coprime to \(\varphi(n)\). Typical value is \(e = F_4 = 2^{2^4} + 1 = 65,537\)
- Calculate \(d = e^{-1} \mod \varphi(n)\) (using Extended Euclidean Algorithm)
- Publish \((n, e)\) as public key, and store \((n, d)\) as private key
Encryption
- Convert the message to integer \(m: 0 \leq m \lt n \)
- Encode \(m\) using public key to \(c = m^e \mod n\)
Decryption
- Decode \(c\) by \(m = c^d \mod n\)
Lehmer random number generator
Other names: Park–Miller random number generator, multiplicative linear congruential generator (MLCG), multiplicative congruential generator (MCG)
\[X_{k+1}=aX_k \mod m\]
where \(m\) is a prime number or power of a prime number, \(a\) is a primitive modulo of \(m\), and, the seed \(X_0\) is coprime to \(m\)
Some well-known choices of \(m\) and \(a\) are: \((m = M_{31} = 2^{31} - 1 = 2,147,483,647, a = 7^5 = 16,807)\), \((m = M_{31}, a = 48,271)\) (C++11's minstd_rand
function uses this set), \((m = F_4 = 65,537, a = 75)\), \((m = 2^{48}, a = 44,485,709,377,909)\) (this case, \(m\) is not prime number, the period is shorter, and lower bit has much shorter period because higher-order bits do not affect lower-order bits).
Back to the original problem at the beginning of this post, we can encode the integer id \(k\) as \(X_k\), the \(\varphi(m)\) first values will be different. For example, if we choose \((m, a) = (M_{31}, 7^5)\), encode the output using alphabet characters including both upper and lower case, dropping visually confusing characters (l, o, I, O, q, g. Additionally if numbers are considered: 0, 1, 9), leading to 46 characters in total. We need \(\left\lceil\frac{\log(\varphi(m))}{\log(46)}\right\rceil = 6\) characters to encode 2,147,483,646 different IDs.
We can choose \((m; a) = (65,537; 75)\), divide the id by \(m\), encode the quotient directly, and encode the remainder using multiplicative congruential generator described above. As a result, we need 3 characters for the first 65,536 ids, \(3+k\) characters for next \(44^k * 65,536\) ids.