2 unstable releases

0.2.0	Oct 3, 2024
0.1.4	Sep 11, 2024

#443 in Math

358 downloads per month

MIT/Apache and LGPL-3.0+

24KB
360 lines

The Collatz Conjecture

The Collatz Conjecture is a famous, and deceptively simple, unsolved problem in mathematics. The algorithm it concerns is simplicity itself; if a number is odd, multiply by 3 and add 1, otherwise divide by 2. The conjecture states that for any starting value the algorithm will eventually reach 1. The problem has remained unsolved since 1937.

There is a heuristic argument that suggests why the conjecture should be true. The Collatz algorithm can be reformulated as follows. Start with an odd number. Multiply by 3 and add 1. Then divide by two until the number is odd again. Repeat. From this perspective it is obvious that it is sufficient to only consider odd starting numbers since even starting values will always be divided until they are odd. After applying the 3n+1 step the output will always be even. Thus starting with a value n, half the time the next odd number will be (3n + 1)/2, half of the remaining time (25%) it will be (3n + 1)/4, and half of the times left (12.5%) it will be (3n + 1)/8. Taking the expected value we see that, on average, the next odd number should be 3/4 of it's predecessor, and thus the sequence should at least converge to a cycle. In light of this, and the numerical evidence, we can expect 1 -> 4 -> 2 -> 1 (or 1 -> 1 excluding even numbers) to be the only cycle.

The conjecture has been confirmed up to 10²².

Extending the Collatz Conjecture

In order to get convergence to a cycle, we need to preserve the heuristic argument above. A naive generalization would simply replace 3n + 1 with 5n + 1. This shouldn't work. The heuristic argument suggests that, on average, each odd number will be 5/4 times it's predecessor. Starting with n = 7 produces a sequence that at least appears to go to infiniity. (Even this has not been proven; nothing related to the Collatz Conjecture is easy.) If we could replace the heuristic factor of 5/4 with something less than one, such as 5/8, we should expect convergence. This can be achieved by ensuring that each odd number is followed by a multiple of 4. In cases where 5n + 1 is only divisible by 2 we replace it with 5n + 3.

The new algorithm is then as follows. Start with an odd number n. Multiply by 5. Add 1 or 3 to ensure that the result is a multiple of 4. Then divide by 2 until the result is odd. Repeat.

The natural generalization of the Collatz Conjecture would be that this sequence is eventually one. It only takes n = 23 to show that this is false. Starting with n = 23 (and excluding even numbers,) we get the sequence 23 -> 29 -> 37 -> 47 -> 59 -> 37. This shows that some sequences reach a non-trivial cycle, namely 37 -> 47 -> 59 -> 37. Curiously, there appear to be only two other such cycles for this algorithm: 31 -> 39 -> 49 and 61 -> 77 -> 97. Even more curiously, these four cycles (including 1 -> 1) occur with relatively constant frequencies.

Cycle	Frequency
1 -> 1	62.7%
31 -> 39 -> 49	13.6%
37 -> 47 -> 59	19.7%
61 -> 77 -> 97	4.0 %

These percentages, or something very close to then, hold true for all sequences starting with odd numbers less than 10000, all odd numbers less than 1 million, or simply all odd numbers in any (sufficiently large) interval.

I have verified this for all sequences starting with odd numbers less than 1 billion. Most importantly, every sequence eventually reaches some finite cycle. As predicted by the heuristic, no sequence has gone to infinity. As noted before the frequency of the four cycles remains consistent at any scale. It is also worth noting that the non-trivial cycles all have the same length. The fact that the non-trivial cycles all start with prime numbers is coincidence.

In what follows, a cycle will often be denoted by it's smallest number

7n + (1, 3)

It's trivial to replace 5 with 7. Starting with an odd number, multiply by 7 and add 1 or 3, whichever yields a multiple of 4. Divide by two until the number is odd again. Once again, the heuristic argument says the sequence should, on average, decrease, this time by a factor of 7/8. The same numerical analysis shows that every sequence reaches one of two cycles, both trivial if even numbers are ignored. They are 1 -> 1 and 3 -> 3. Once again these cycles occur at a fixed ratio, and all this holds true for any odd number less than 1 billion.

Cycle	Frequency
1 -> 1	86.2%
3 -> 3	13.8%

A Generalized Collatz Algorithm

It should now be obvious that if 7 is replaced with the 9 but the +1 or +3 rule is retained then the heuristic argument suggests that the resulting sequence would increase, on average, by a rate of 9/8 and thus go off to infinity. The key fact is that by going from 7 to 9 we jumped over a power of 2. When we do that we need to change the addition rule to ensure that the result is divisible by a higher power of 2. Replacing the +1 or +3 rule with a +1, +3, +5, or +7 (whichever yields a multiple of 8) rule then any sequence should, on average, decrease by a factor of 9/16 and thus reach a finite cycle. The same rule should work if we replace the multiplicative factor of 9 by 11, 13, or 15 but once we get to 17 the additive rule will have to get more complicated.

We now have a way to generalize the Collatz Algorithm. Any odd number a lies between two powers of 2, which we denote p and 2p. Starting with any odd number n, we multiply by a and then add the smallest positive number such that the result is divisible by p and then divide by 2 until the number is odd again. If we repeat these steps the heuristic argument says that the next odd number should be, on average, a/(2p) times it's predecessor. Since a/(2p) < 1, we should expect each sequence, regardless of starting value, reach some finite cycle. This last statement is the Generalized Collatz Conjecture.

This is one case where a single step of the generalized Collatz algorithm is best understood by reading the code

def collatz_step(n, a, p):
    # Multiply by a
    n *= a
    
    # Add the smallest value so that n is divisible by p
    n += p - n & (p - 1)
    
    # Divide by 2 until n is odd
    while not (n & 1):
        n >>= 1
    
    return n

Of course, the code in src/collatz.rs is a bit more complicated. The main reason for that is it uses three integer types. The conjecture has been verified for values of $a$ up to 8191 (one less than 2¹³) and odd number numbers up to 10 million (in some cases 100 million) but some sequences, and even some cycles, get very large. In the interest of keeping use of the slow, but unbounded, rug::Integer class to a minimum I constructed a system that switches between u64, u128, and Integer as needed.

Running the Code

If you've made it this far, you probably want to know how the code works. You might even be wondering why I wrote it in Rust. The short answer to the second question is speed. Rust almost as fast as C, sometimes faster than C++, and doesn't have null pointer errors. Python just can't compete.

I'll assume you have Rust installed.

Download the source code from Github. To compile, just run

cargo build --release

The program checks the Generalized Collatz Conjecture for all odd numbers up to a specified value, the -n flag and for values of $a$ between a --start and --end value inclusive. Output can take one of two forms, the --write-cycle flag outputs all cycles found for each value of $a$, with one file for each $a$. This file includes cycle length and the frequency of each cycle. The --write-table flag outputs the lowest number of the cycle for each starting value and value of $a$. Be careful with the --table flag as it is quite easy to produce a lot of large files if -n is large and there is a big gap between --start and --end. Cycle files are stored in a folder called cycles and tables of cycle mins are stored in a folder called tables. In cases, such as the classic Collatz $a=3$ where there is only one known cycle, --table does not produce any output.

Computations are done for multiple values of $a$ at one time, depending on the number of available cores.

As an example, to find all cycles with starting values less than 1 million for values of $a$ between 3 and 15 run the following

cargo run --release -- -n 1000000 --start 3 --end 15 --write-cycle

Command	Definition
-n	Run algorithm for all starting values less than this number
-s --start	Lowest value of $a$.
-e --end	Highest value of $a$.
--write-cycle	Outputs a csv of cycles for each value of $a$.
--write-table	Outputs a csv of minimum cycle values for each starting value and each value of $a$.

Sample output of a cycle file, for $a=5$ and -n 1000000000.

min	count	length	cycle
1	314754837	1	1
31	66215034	3	31 -> 39 -> 49
37	99485169	3	37 -> 47 -> 59
61	19544960	3	61 -> 77 -> 97

Sample output of a file in the table directory for $a=7$ and -n 20.

n	cycle
1	1
3	3
5	1
7	1
9	1
11	1
13	1
15	3
17	3
19	3

Dependencies

~13MB
~216K SLoC