Arthur's Blog

Friday, 23 December 2022

Factorisation

After looking at some dismal online factorisation algorithms, I wondered whether I could do a better job, so I wrote a Javascript version, which is mainly a port of my Python factorisation algorithm, with a port and modification of a Python ECM implementation (acknowledgement in the website).

The algorithm implements Miller-Rabin primality testing, and factorisation using trial division by small primes, Pollard Rho algorithm and Elliptic Curve Method (ECM) using Montgomery curves.

The factorisation website is here.

Wednesday, 13 May 2020

Pi Calculator

The Covid19 lockdown has given me the opportunity to do a few projects, one of which is an interactive $ \pi $ calculator, which I have placed in a separate tab on this blog.

Here is a link to the calculator.

Friday, 15 June 2018

Sunday Times Teaser 2907, Combinatorial Cards

Sunday Times Teaser 2907 asks how many cards there would be in a set with eight images per card, each image appearing on exactly eight cards and the total number of images being the same as the number of cards.

Astute readers will recognise this as an example of a finite projective plane, and may also recognise the set of cards as being similar to the game Doppel or Spot It!

Jim Randell produced a Python module to generate a set of cards, and a similar Python program by Joel Grus can be found here

I thought I would try to produce a Python version that allows sets of cards to be produced based on a projective plane associated with a finite field of the form GF(p^n) for arbitrary prime p an positive integer n.

The tricky bit if being able to do multiplication in the finite field, which requires an irreducible polynomial of degree n over the field. My code achieves this for values of n up to 3, and for a few other combinations of p and n that have pre-calculated irreducibles.

The code can be found here. It consists of

a general purpose finite field module (adapted from the one I created for Sunday Times Teaser 777 )
a module to generate sets of cards based on the projective plane of GF(p^n)

Tuesday, 9 May 2017

Three Pandigitals

A problem in Contest Center asks:

Find the smallest whole number N=ABC such that A, B and C each contain all of the digits 0 to 9 at least once, and N contains all of the digits 0 to 9 at least three times.

Here is a description of my code to solve the problem:

Any solution, N, is at least a 30 digit number, and therefore too large to be represented by 64 bits, but by using a C++ 128 bit unsigned integer class, all numbers up to 38 decimal digits can be represented, which is hopefully large enough to search for solutions.

I used the 128 bit C++ class https://github.com/calccrypto/uint128_t developed by Jason Lee, adding some performance and functional improvements to help solve the three pandigitals puzzle, notably:

Add a function decimal_string to construct the decimal representation of a 128 bit number more quickly.
Using the intrinsic _umul128 to improve performance of 128 bit number multiplication

The structure of the solution is as follows.

Pre-calculate P_{10 ,} a sorted list of the 10-digit pandigitals from 1023456789 to 9876543210

Define M to be the smallest triple pandigital, $ M = 100011222333444555666777888999 $

Define $\hat{N}$ to be the best solution found so far. Initially set $\hat{N} = 999888777666555444333222111000 $ (so assuming there is a 30 digit solution)

for A in P₁₀
    for B in P₁₀in the range A to $ \left \lfloor \sqrt{\frac{\hat{N}}{A}} \right \rfloor $
        for C in the range $ \left \lceil \frac{M}{AB} \right \rceil $ to $ \left \lfloor \frac{\hat{N}}{AB}\right \rfloor $
            if C is pandigital and ABC is triple-pandigital
                set $ \hat{N} = ABC $

This has been implemented as a Visual Studio C++ solution Three Pandigitals.zip

The code contains a function to quickly determine whether a value of C is pandigital, and a function added to the uint128_t class to more quickly convert a 128 bit number to a decimal string, used in the test to determine whether ABC is a triple pandigital.

As the search takes several hours, the main program can be run with two arguments: the starting value of A and the best value $\hat{N}$ found so far, allowing the program to be stopped and then restarted where it left off.

Running the program without any input arguments, this value of $\hat{N}$

    $\hat{N} = 1023456789 * 7694082153 * 12700546398 = 100011222449737853847658995366 $

is found almost immediately, vastly reducing the range of values of C that need to be subsequently searched. As smaller values of $\hat{N}$ are found, the range of C values is reduced further.

The program avoids performing 128 bit multiplication as much as possible, and avoids 128 bit division completely by pre-calculating the inverse of all 10 digit pandigitals, stored as doubles, which are then used to calculate a double that approximates $    \frac{M}{AB} $. A uint128_t cast of this approximation is then refined by addition or subtraction.

Friday, 17 February 2017

IBM Ponder This, January 2017

The IBM Ponder This puzzle for January 2017 asked for a rational solution for x^y=y^x, where the last eight digits of x are different from each other, and for a bonus asked for a solution where the last eight digits of x are a permutation of 1,2,3,4,5,6,7,8.

It is possible to do better than this, and construct a solution where the last nine digits of x are are a permutation of 1,2,3,4,5,6,7,8,9 , and a solution where the last ten digits of x are are a permutation of 0,1,2,3,4,5,6,7,8,9.

Original Problem and Bonus

Goldbach and Euler proved that a set of 2 rationals {x,y} satisfies x^y=y^xiff $ \{x,y\} = \{ (\frac{n+1}{n})^n , (\frac{n+1}{n})^{n+1} \} $ for integer n. [1], [2]

If these numbers are to have last digits, n must be of the form $ n = 2^t 5^f $

$ \frac{n+1}{n}$ has $ d = max(t,f) $ decimal places.

$ \frac{(n+1)10^d}{n} $ is an integer whose last digits are the last decimal digits of $ \frac{n+1}{n}$

The last $D$ digits of $x,y$ are

$ (\frac{(n+1)10^d}{n})^{n+k} $ mod $ 10^D $, for $k = 0,1 $

Which can be calculated as

$ ((n+1)2^{d-t}5^{d-f})^{n+k} $ mod $ 10^D$, for $k = 0,1 $

$ (10^d +2^{d-t}5^{d-f})^{n+k} $ mod $ 10^D$, for $k = 0,1 $ ...... (1)

For modest values of n, it is easy to calculate these residues using modular exponentiation, for example in C or Python. This Python program searches for solutions to both problems.

The first problem has a solution for $n=2^4 \cdot 5^1 =80$, $x =( \frac{81}{80})^{80} $

The bonus problem has a solution for $n= 5^5 = 3125$, $x =( \frac{3126}{3125})^{3126} $

Extending the Problems

Now the more interesting problems: find a solution where the last nine digits of x are are a permutation of 1,2,3,4,5,6,7,8,9 , and a solution where the last ten digits of x are are a permutation of 0,1,2,3,4,5,6,7,8,9.

Here we need to examine larger values of $n$, so expression (1) becomes impractical to use and we have to find a better way of determining the last digits of $ (\frac{(n+1)10^d}{n})^{n+k} $

If $d \ge D$, then (1) can be reduced to

$ (2^{d-t}5^{d-f})^{n+k} $ mod $ 10^D$, for $k = 0,1 $

There are two cases to consider:

$f>t$

$ (2^{d-t}5^{d-f})^{n+k} $ mod $ 10^D = ( 2^{f-t})^{n+k} $ mod $ 10^D = 2^{(f-t)(n+k)} $ mod $ 10^D $

Now [3] shows that powers of 2 mod $10^D$ cycle with a period of $P = 4 \cdot 5^{D-1}$, starting at $2^D$

so if $ (f-t)(n+k) = (f-t)(2^t 5^f+k) \ge D $,

$2^{(f-t)(n+k)} $ mod $ 10^D $ = $ 2^{ D + ((f-t)(2^t 5^f+k) - D) \: mod \: P } $ mod $ 10^D $ ..... (2)

$t >f$

$ (2^{d-t}5^{d-f})^{n+k} $ mod $ 10^D = ( 5^{t-f})^{n+k} $ mod $ 10^D = 5^{(t-f)(n+k)} $ mod $ 10^D $

[4] shows that powers of 5 mod $10^D$ cycle with a period of $P=2^{D-2}$, starting at $5^D$

so if $ (t-f)(n+k) = (t-f)(2^t 5^f+k) \ge D $,

$5^{(t-f)(n+k)} $ mod $ 10^D $ = $ 5^{ D + ((t-f)(2^t 5^f+k) - D) \: mod \: P } $ mod $ 10^D $ ..... (3)

The residues in (2) and (3) can be calculated using modular exponentiation without the need to handle excessively large numbers. This Python program searches for solutions to both problems, finding the following solutions:

For $n = 5^{267}$, the last ten digits of $( \frac{n+1}{n})^{n+1} $ are 3628975104

For $n = 2^2 \cdot 5^{957}$, the last nine digits of $( \frac{n+1}{n})^{n+1} $ are 247315968

[1] Cut The Knot, Rational Solutions to x^y = y^x

[2] Sved, Marta. “On the Rational Solutions of x^y = y^x.” Mathematics Magazine, vol. 63, no. 1, 1990, pp. 30–33., http://www.jstor.org/stable/2691508.

[3] Exploring Binary, Cycle Length of Powers of Two Mod Powers of Ten, Rick Regan, November 6th, 2009

[4] Exploring Binary, Cycle Length of Powers of Five Mod Powers of Ten, Rick Regan, December 22nd, 2009

Sunday, 10 May 2015

Cheryl's Birthday

The puzzle, originally set in the 2015 Singapore and Asian Schools Math Olympiad, know as Cheryl's Birthday, has generated a lot of interest over the past few weeks.

A discussion of the puzzle here queried the meaning of the term "day" in the statement "Cheryl then tells Albert and Bernard separtely the month and day of her birthday respectively".

The generlly accepted interpretation is that it is the day of the month, but I wondered whether the puzzle could be solved if "day" were interpreted as the day of the week, as suggested in the discussion.

The solution, assuming the "day of the month" interpretation, looks like this:

The days of the week associated with each date in 2015 are:

Now suppose Albert and Bernard are told all these dates, Albert is told the month of Cheryl's birthday and Bernard is told the day of the week. A solution can still be derived, as follows:

A solution with this structure would work for any other year, as the days of the week would all be offset by the same amount.

Tuesday, 27 January 2015

Factoring numbers in Python and C++

While trying to solve the "Squares on a Cube" puzzle on Contest Center, I became sidetracked into writing an efficient program to produce the prime factors of a number, an exercise that most programmers will have attempted at some time.

The hybrid Python / C++ code presented here combines well known techniques, Miller Rabin and Brent's variation of Pollard Rho, together with a modern fast technique for modular reduction developed by Rutten and Eekelen.

Python implementation

After a bit of experimenting, I settled on this mixture for a Python implementation:

Lookup tables for numbers below 10⁶
Trial division of primes below 300
Miller-Rabin primality test (deterministic up to the published limit)
Brent's variant of the Pollard Rho algorithm, by default running for up to 1 second
Elliptic Curve Method (ECM) if the Brent algorithm fails to find a factor within its time limit.

C++ implementation

While the Python version is reasonably fast, a C++ version is much faster. My C++ code factors numbers below 2⁶⁴, where Brent Pollard Rho can be used without having to use ECM or Multiple Polynomial Quadratic Sieve.

The performance of Brent Pollard Rho and Miller Rabin is ultimately dominated by modular multiplication, so any improvement of modular multiplication time is reflected in the overall performance of the factorisation algorithm. Here are my attempts to speed up modular multiplication:

I. Shift and add

To calculate $a \times b$ mod $m$ I initially used an intricate, and relatively slow, shift and add method. Several implementations of this can be found on the internet. The version I used was stolen from Craig McQueen's post on stackexchange. This version is valid for any $ m < 2^{64} $

Function modmulSA implements this technique.

II, Repeated reduction of most significant 64 bits

My first attempt to speed up modular multiplication, valid for a modulus $m < 2^{63}$, uses the following techniques:

a) Division by multiplication
If a modulus of the same denominator needs to be obtained many times, as in the Brent algorithm and the Miller Rabin test, an efficient technique is to construct an inverse $M=2^k/m$ of the denominator $m$ once (M is often termed a "magic number"), then perform division by $m$ by multiplying the numerator by M and bit shifting to divide by $2^k$. Of course it's a bit more complicated than that, and is described comprehensively by Henry S. Warren Jr in Hackers Delight, (Chapter 10, Integer Division by Constants), and explained in a readable form by Ridiculous Fish in Labor of Division (Episode I).

C++ functions magicu2 and magicdiv (stolen from Hackers Delight) and magicmod implement these techniques.

b) Modular multiplication

Reduce a and b modulo m (using the division by multiplication technique)

Construct the product $c = a \times b$. If $a$ and $b$ are 64 bit numbers, $c$ is a 128 bit number. I used the MS VC++ intrinsic function for this calculation, but if this is not available, Hackers Delight function mulul64 can be used.

Repeatedly find the most significant 64 bits of $c$ and reduce them modulo $m$ (again using division by multiplication), until c is reduced to a number less than $m$

Function modmulRR implements this technique.

III, Assembler

On 64 bit Intel processors, the multiply instruction multiplies two 64 bit numbers and stores the 128 bit result in two registers, rdx:rax. The divide instruction divides a 128 bit number stored in rdx:rax by a 64 bit number and stores the 64 bit quotient in rax and the 64 bit remainder in rdx.

So these instruction are ideal for modular multiplication. However, the intrinsic function to perform division of a 128 bit number is not available in MS Visual Studio, so a few lines of assembler (mod64.asm) have to be written.

The assembler version has the advantage of being valid for a modulus up to $2^{64}-1$, and is noticeably faster than repeated reduction, and of course much faster than the shift and add method.

Function modmulAS is the C interface to this function.

IV. Rutten-Eekelen

An even faster method of modular multiplication than assembler, for a modulus $m<2^{63}$, uses the algorithm described in Efficient and formally proven reduction of large integers by small moduli by Luc M. W. J. Rutten and Marko C. J. D. van Eekelen. This method requires the pre-calculation of a similar, but slightly different, magic number to the one described above, and then makes use of this number in an intricate algorithm whose correctness has been verified using a computer generated proof.

Function modmulRE implements this function.

Performance

This graph shows the average time to factor numbers in the range $10^n$ to $10^n+10000$ (giving a representative mix of easy and hard factorisations) for all my C++ versions, and for reference, a Python implementation of Brent Pollard and Miller Rabin.

The graph shows that the shift and add version has performance no better than the Python version, and that the Rutten-Eekelen version beats the Assembler version, for numbers below $2^{63}$

C++ / Python integration

The ctypes module integrates Python with a DLL of the C++ implementation. Calling a DLL from Python incurs an overhead of about 1.5µs, so it is faster to use the Python implementation for small numbers that are factored using a lookup table. For larger numbers, the overhead soon becomes negligible compared to the time to factorise the number.

Rutten-Eekelen has the best performance for a modulus $m<2^{63}$, even beating the assembler version. For a modulus $2^{63} \le m \lt 2^{64}$ the assembler version is fastest, so the final C++ version called from Python uses a hybrid of these two methods.

To verify that the Brent Pollard Rho algorithm copes with difficult cases, this graph shows the average and worst case times to factorise numbers within $ \pm 5000$ of various baselines.

Note the jump in worst case factorisation times for numbers $ \ge 2^{63}$, where the algorithm switches from Rutten-Eekelen to Assembler.

The worst cases are listed here.

Source Code

All the source code is located here, except for the optional ECM algorithm, for which I used the Sourceforge implementation pyecm.py.

A stand-alone Python module, factors.py can be run without the C++ DLL or the ECM module being present.
This module has the following principal functions:

Description	Example call	Return value
all_factors(n) returns a list of all the factors of n The function chooses whether to use the Python or C++ version depending on the size of n.	all_factors(12)	[1,2,3,4,6,12]
totient(n) returns the Euler totient function	totient(30)	8
factorise(n, maxBrentTime=1.0) returns a list of the prime factors of n maxBrentTime determines how many seconds the Brent algorithm should run before reverting to ECM (if available). The function chooses whether to use the Python or C++ version depending on the size of n.	factorise(12)	[2,2,3]

A DLL, FactorsDLL.dll, built on Windows 7 64 bit, is included, together with a Python module factorsDLL.py to call the DLL from Python. I'm not sure whether the DLL will work on other machines. In case it doesn't, the source code to build the DLL is also included.

Implementation notes

The Miller Rabin test can occasionally derive a factor of a composite number, when testing witness $a$ for the primality of $n$ encouters the situation

$a^{2^rd} \ne \pm 1$ mod $n$ and $a^{2^{r+1}d} =1$ mod $n$

In this situation $gcd(a^{2^rd}-1,n)$ produces a factor of $n$. I have included this test, even though the condition rarely occurs in practice.

The Brent Pollard Rho algorithm occasionally fails to find a factor of a composite, so I simply run the algorithm again. The random selection of parameters seems to ensure that the algorithm does not have to be run too many times to obtain a factor.

Postscript

Even though the program vastly reduced the time to factorise a number, I still haven't solved the Contest Center problem that motivated me to write it.

Pages