## Wednesday, 13 May 2020

### Pi Calculator

The Covid19 lockdown has given me the opportunity to do a few projects, one of which is an interactive  $\pi$ calculator, which I have placed in a separate tab on this blog.

Here is a link to the calculator.

## Friday, 15 June 2018

### Sunday Times Teaser 2907, Combinatorial Cards

Sunday Times Teaser 2907 asks how many cards there would be in a set with eight images per card, each image appearing on exactly eight cards and the total number of images being the same as the number of cards.

Astute readers will recognise this as an example of a finite projective plane, and may also recognise the set of cards as being similar to the game Doppel or Spot It!

Jim Randell produced a Python module to generate a set of cards, and a similar Python program by Joel Grus can be found here

I thought I would try to produce a Python version that allows sets of cards to be produced based on a projective plane associated with a finite field of the form GF(p^n) for arbitrary prime p an positive integer n.

The tricky bit if being able to do multiplication in the finite field, which requires an irreducible polynomial of degree n over the field. My code achieves this for values of n up to 3, and for a few other combinations of p and n that have pre-calculated irreducibles.

The code can be found here. It consists of
• a general purpose finite field module (adapted from the one I created for Sunday Times Teaser 777 )
• a module to generate sets of cards based on the projective plane of GF(p^n)

## Tuesday, 9 May 2017

### Three Pandigitals

A problem in Contest Center asks:

Find the smallest whole number N=ABC such that A, B and C each contain all of the digits 0 to 9 at least once,
and N contains all of the digits 0 to 9 at least three times.

Here is a description of my code to solve the problem:

Any solution, N, is at least a 30 digit number, and therefore too large to be represented by 64 bits, but by using a C++ 128 bit unsigned integer class, all numbers up to 38 decimal digits can be represented, which is hopefully large enough to search for solutions.

I used the 128 bit C++ class https://github.com/calccrypto/uint128_t developed by Jason Lee, adding some performance and functional improvements to help solve the three pandigitals puzzle, notably:
• Add  a function decimal_string to construct the decimal representation of a 128 bit number more quickly.
• Using the intrinsic _umul128 to improve performance of 128 bit number multiplication

The structure of the solution is as follows.

Pre-calculate P10 ,  a sorted list of the 10-digit pandigitals from 1023456789 to 9876543210

Define M to be the smallest triple pandigital, $M = 100011222333444555666777888999$

Define $\hat{N}$ to be the best solution found so far. Initially set $\hat{N} = 999888777666555444333222111000$ (so assuming there is a 30 digit solution)

for A in P10
for B in P10  in the range A to $\left \lfloor \sqrt{\frac{\hat{N}}{A}} \right \rfloor$
for C in the range $\left \lceil \frac{M}{AB} \right \rceil$ to  $\left \lfloor \frac{\hat{N}}{AB}\right \rfloor$
if C is pandigital and ABC is triple-pandigital
set $\hat{N} = ABC$

This has been implemented as a Visual Studio C++ solution Three Pandigitals.zip

The code contains a function to quickly determine whether a value of C is pandigital, and a function added to the uint128_t class to more quickly convert a 128 bit number to a decimal string, used in the test to determine whether ABC is a triple pandigital.

As the search takes several hours, the main program can be run with two arguments: the starting value of A and the best value $\hat{N}$ found so far, allowing the program to be stopped and then restarted where it left off.

Running the program without any input arguments, this value of $\hat{N}$

$\hat{N} = 1023456789 * 7694082153 * 12700546398 = 100011222449737853847658995366$

is found almost immediately, vastly reducing the range of values of C that need to be subsequently searched. As smaller values of $\hat{N}$ are found, the range of C values is reduced further.

The program avoids performing 128 bit multiplication as much as possible, and avoids 128 bit division completely by pre-calculating the inverse of all 10 digit pandigitals, stored as doubles, which are then used to calculate a double that approximates $\frac{M}{AB}$. A uint128_t cast of this approximation is then refined by addition or subtraction.

## Friday, 17 February 2017

### IBM Ponder This, January 2017

The IBM Ponder This puzzle for January 2017 asked for a rational solution for xy=yx, where the last eight digits of x are different from each other, and for a bonus asked for a solution where the last eight digits of x are a permutation of 1,2,3,4,5,6,7,8.

It is possible to do better than this, and construct a solution where the last nine digits of x are are a permutation of 1,2,3,4,5,6,7,8,9 , and a solution where the last ten digits of x are are a permutation of  0,1,2,3,4,5,6,7,8,9.

### Original Problem and Bonus

Goldbach and Euler proved that a set of 2 rationals {x,y} satisfies xy=yx    iff   $\{x,y\} = \{ (\frac{n+1}{n})^n , (\frac{n+1}{n})^{n+1} \}$  for integer n. [1], [2]

If these numbers are to have last digits, n must be of the form $n = 2^t 5^f$

$\frac{n+1}{n}$ has $d = max(t,f)$ decimal places.

$\frac{(n+1)10^d}{n}$ is an integer whose last digits are the last decimal digits of $\frac{n+1}{n}$

The last $D$ digits of $x,y$ are
$(\frac{(n+1)10^d}{n})^{n+k}$ mod $10^D$,  for $k = 0,1$
Which can be calculated as
$((n+1)2^{d-t}5^{d-f})^{n+k}$ mod $10^D$,  for $k = 0,1$
or
$(10^d +2^{d-t}5^{d-f})^{n+k}$ mod $10^D$,  for $k = 0,1$                            ...... (1)
For modest values of n, it is easy to calculate these residues using modular exponentiation, for example in C or Python.  This Python program searches for solutions to both problems.

The first problem has a solution for $n=2^4 \cdot 5^1 =80$, $x =( \frac{81}{80})^{80}$

The bonus problem has a solution for $n= 5^5 = 3125$, $x =( \frac{3126}{3125})^{3126}$

### Extending the Problems

Now the more interesting problems: find a solution where the last nine digits of x are are a permutation of 1,2,3,4,5,6,7,8,9 , and a solution where the last ten digits of x are are a permutation of  0,1,2,3,4,5,6,7,8,9.

Here we need to examine larger values of $n$,  so expression (1) becomes impractical to use and we have to find a better way of determining the last digits of $(\frac{(n+1)10^d}{n})^{n+k}$

If $d \ge D$, then (1) can be reduced to
$(2^{d-t}5^{d-f})^{n+k}$ mod $10^D$,  for $k = 0,1$

There are two cases to consider:

### $f>t$

$(2^{d-t}5^{d-f})^{n+k}$ mod $10^D = ( 2^{f-t})^{n+k}$ mod $10^D = 2^{(f-t)(n+k)}$ mod $10^D$

Now [3] shows that powers of 2 mod $10^D$ cycle with a period of $P = 4 \cdot 5^{D-1}$, starting at $2^D$
so if $(f-t)(n+k) = (f-t)(2^t 5^f+k) \ge D$,
$2^{(f-t)(n+k)}$ mod $10^D$ =  $2^{ D + ((f-t)(2^t 5^f+k) - D) \: mod \: P }$ mod $10^D$                ..... (2)

### $t >f$

$(2^{d-t}5^{d-f})^{n+k}$ mod $10^D = ( 5^{t-f})^{n+k}$ mod $10^D = 5^{(t-f)(n+k)}$ mod $10^D$

[4] shows that powers of 5 mod $10^D$ cycle with a period of $P=2^{D-2}$, starting at $5^D$

so if $(t-f)(n+k) = (t-f)(2^t 5^f+k) \ge D$,

$5^{(t-f)(n+k)}$ mod $10^D$ =  $5^{ D + ((t-f)(2^t 5^f+k) - D) \: mod \: P }$ mod $10^D$                ..... (3)

The residues in (2) and (3) can be calculated using modular exponentiation without the need to handle excessively large numbers. This Python program searches for solutions to both problems, finding the following solutions:

For $n = 5^{267}$, the last ten digits of $( \frac{n+1}{n})^{n+1}$ are 3628975104

For $n = 2^2 \cdot 5^{957}$, the last nine digits of $( \frac{n+1}{n})^{n+1}$ are 247315968

[1] Cut The Knot, Rational Solutions to x^y = y^x

[2] Sved, Marta. “On the Rational Solutions of xy = yx.” Mathematics Magazine, vol. 63, no. 1, 1990, pp. 30–33., http://www.jstor.org/stable/2691508.

[3] Exploring Binary, Cycle Length of Powers of Two Mod Powers of Ten, Rick Regan, November 6th, 2009

[4]  Exploring Binary, Cycle Length of Powers of Five Mod Powers of Ten,  Rick Regan, December 22nd, 2009

## Sunday, 10 May 2015

### Cheryl's Birthday

The puzzle, originally set in the 2015 Singapore and Asian Schools Math Olympiad, know as Cheryl's Birthday, has generated a lot of interest over the past few weeks.

A discussion of the puzzle here queried the meaning of the term "day" in the statement "Cheryl then tells Albert and Bernard separtely the month and day of her birthday respectively".

The generlly accepted interpretation is that it is the day of the month, but I wondered whether the puzzle could be solved if "day" were interpreted as the day of the week, as suggested in the discussion.

The solution, assuming the "day of the month" interpretation, looks like this:

The days of the week associated with each date in 2015 are:
Now suppose Albert and Bernard are told all these dates, Albert is told the month of Cheryl's birthday and Bernard is told the day of the week. A solution can still be derived, as follows:

A solution with this structure would work for any other year, as the days of the week would all be offset by the same amount.

## Tuesday, 27 January 2015

### Factoring numbers in Python and C++

While trying to solve the "Squares on a Cube" puzzle on Contest Center, I became sidetracked into writing an efficient program to produce the prime factors of a number, an exercise that most programmers will have attempted at some time.

The hybrid Python / C++ code presented here combines well known techniques, Miller Rabin and Brent's variation of Pollard Rho, together with a modern fast technique for modular reduction developed by Rutten and Eekelen.

## Python implementation

After a bit of experimenting, I settled on this mixture for a Python implementation:
• Lookup tables for numbers below 106
• Trial division of primes below 300
• Miller-Rabin primality test (deterministic up to the published limit)
• Brent's variant of the Pollard Rho algorithm, by default running for up to 1 second
• Elliptic Curve Method (ECM) if the Brent algorithm fails to find a factor within its time limit.

## C++ implementation

While the Python version is reasonably fast, a C++ version  is much faster. My C++ code factors numbers below 264, where Brent Pollard Rho can be used without having to use ECM or Multiple Polynomial Quadratic Sieve.

The performance of Brent Pollard Rho and Miller Rabin is ultimately dominated by modular multiplication, so any improvement of modular multiplication time is reflected in the overall performance of the factorisation algorithm. Here are my attempts to speed up modular multiplication:

To calculate $a \times b$ mod $m$  I initially used an intricate, and relatively slow, shift and add method. Several implementations of this can be found on the internet. The version I used was stolen from Craig McQueen's post on stackexchange.  This version is valid for any $m < 2^{64}$

Function modmulSA implements this technique.

### II, Repeated reduction of most significant 64 bits

My first attempt to speed up modular multiplication, valid for a modulus $m < 2^{63}$,  uses the following techniques:

#### a) Division by multiplication

If a modulus of the same denominator needs to be obtained many times, as in the Brent algorithm and the Miller Rabin test, an efficient  technique is to construct an inverse $M=2^k/m$ of the denominator $m$ once (M is often termed a "magic number"), then perform division by $m$ by multiplying the numerator by M and bit shifting to divide by $2^k$.  Of course it's a bit more complicated than that, and is described comprehensively by Henry S. Warren Jr in Hackers Delight, (Chapter 10, Integer Division by Constants), and explained in a readable form by Ridiculous Fish in Labor of Division (Episode I).

C++ functions magicu2 and magicdiv (stolen from Hackers Delight) and magicmod implement these techniques.

#### b) Modular multiplication

1. Reduce a and b modulo m (using the division by multiplication technique)
2. Construct the product $c = a \times b$. If $a$ and $b$ are 64 bit numbers, $c$ is a 128 bit number. I used the MS VC++ intrinsic function for this calculation, but if this is not available, Hackers Delight function mulul64 can be used.
3. Repeatedly find the most significant 64 bits of $c$ and reduce them modulo $m$ (again using division by multiplication), until c is reduced to a number less than $m$
Function modmulRR implements this technique.

### III, Assembler

On 64 bit Intel processors, the multiply instruction multiplies two 64 bit numbers and stores the 128 bit result in two registers, rdx:rax. The divide instruction divides a 128 bit number stored in rdx:rax by a 64 bit number and stores the 64 bit quotient in rax and the 64 bit remainder in rdx.

So these instruction are ideal for modular multiplication. However, the intrinsic function to perform division of a 128 bit number is not available in MS Visual Studio, so a few lines of assembler (mod64.asm) have to be written.

The assembler version has the advantage of being valid for a modulus up to $2^{64}-1$, and is noticeably faster than repeated reduction, and of course much faster than the shift and add method.

Function modmulAS is the C interface to this function.

### IV. Rutten-Eekelen

An even faster method of modular multiplication than assembler, for a modulus $m<2^{63}$, uses the algorithm described in Efficient and formally proven reduction of large integers by small moduli by Luc M. W. J. Rutten and Marko C. J. D. van Eekelen. This method requires the pre-calculation of a similar, but slightly different, magic number to the one described above, and then makes use of this number in an intricate algorithm whose correctness has been verified using a computer generated proof.

Function modmulRE implements this function.

## Performance

 This graph shows the average time to factor numbers in the range $10^n$ to $10^n+10000$ (giving a representative mix of easy and hard factorisations) for all my C++ versions, and for reference, a Python implementation of Brent Pollard and Miller Rabin. The graph shows that the shift and add version has performance no better than the Python version, and that the Rutten-Eekelen version beats the Assembler version, for numbers below $2^{63}$

## C++ / Python integration

The ctypes module integrates Python with a DLL of the C++ implementation. Calling a DLL from Python  incurs an overhead of about 1.5µs, so it is faster to use the Python implementation for small numbers that are factored using a lookup table. For larger numbers, the overhead soon becomes negligible compared to the time to factorise the number.

Rutten-Eekelen has the best performance for a modulus $m<2^{63}$, even beating the assembler version. For a modulus $2^{63} \le m \lt 2^{64}$ the assembler version is fastest, so the final C++ version called from Python uses a hybrid of these two methods.

 To verify that the Brent Pollard Rho algorithm copes with difficult cases, this graph shows the average and worst case times to factorise numbers within $\pm 5000$ of various baselines. Note the jump in worst case factorisation times for numbers $\ge 2^{63}$, where the algorithm switches from Rutten-Eekelen to Assembler.
The worst cases are listed here.

## Source Code

All the source code is located here, except for the optional ECM algorithm, for which I used the Sourceforge implementation pyecm.py.

A stand-alone Python module, factors.py can be run without the C++ DLL or the ECM module being present.
This module has the following principal functions:

 Description Example call Return value all_factors(n) returns a list of all the factors of n The function chooses whether to use the Python or C++ version depending on the size of n. all_factors(12) [1,2,3,4,6,12] totient(n) returns the Euler totient function totient(30) 8 factorise(n, maxBrentTime=1.0) returns a list of the prime factors of n maxBrentTime determines how many seconds the Brent algorithm should run before reverting to ECM  (if available). The function chooses whether to use the Python or C++ version depending on the size of n. factorise(12) [2,2,3]

A DLL, FactorsDLL.dll,  built on Windows 7 64 bit, is included, together with a Python module factorsDLL.py to call the DLL from Python. I'm not sure whether the DLL will work on other machines. In case it doesn't, the source code to build the DLL is also included.

## Implementation notes

The Miller Rabin test can occasionally derive a factor of a composite number, when testing witness $a$ for the primality of $n$ encouters the situation
$a^{2^rd} \ne \pm 1$ mod $n$  and  $a^{2^{r+1}d} =1$ mod $n$

In this situation  $gcd(a^{2^rd}-1,n)$ produces a factor of $n$. I have included this test, even though the condition rarely occurs in practice.

The Brent Pollard Rho algorithm occasionally fails to find a factor of a composite, so I simply run the algorithm again. The random selection of parameters seems to ensure that the algorithm does not have to be run too many times to obtain a factor.

## Postscript

Even though the program vastly reduced the time to factorise a number, I still haven't solved the Contest Center problem that motivated me to write it.

## Tuesday, 7 October 2014

### Calculating square roots on a pinwheel calculator

Before electronic calculators became affordable, complex calculations were often done using pinwheel calculators. These were manufactured by several companies, for example, Odhner, Brunsviga, Thales, Schubert, all based on designs of Frank Baldwin and Willgodt Odhner in the 19th century.

 Original Odhner 127
 Original Odhner 227
 Brunsviga 20

Having recently acquired three of these calculators, I set about investigating what they were capable of. As well as the four basic arithmetic operations, addition, subtraction, multiplication and division, square roots can also be calculated.

The instruction manual for the Original Odhner 227 describes the procedure to extract the square root of 966289.

At first glance, the instructions look like an alchemist's spell, but they are performing the following calculations to derive 966289 = 983:

 From Odhner 227 instructions (click to enlarge)

 c) 966289  - 10000 - 30000 - 50000 - 70000 - 90000 -110000 - 130000 - 150000 - 170000 = 156289 ( 9 subtractions) d) & e) 156289  - 18100 - 18300 - 18500 - 18700 - 18900 - 19100 - 19300 - 19500 = 5889 ( 8 subtractions) f) 5889 -  1961 - 1963 -1965 = 0 ( 3 subtractions)

The algorithm is based on the fact that the square of n is the sum of the first n odd numbers:
 12 = 1     =     1 22 = 4     =     1+3 32 = 9     =     1+3+5 42 = 16    =     1+3+5+7 52 = 25    =     1+3+5+7+9 etc.

 The square root of a number can therefore be calculated by subtracting successive odd numbers until the sum is zero. The square root is the number of subtractions performed. For example, to find the square root of 64, successively subtract 1,3,5,... until zero is reached. The number of subtractions required, in this case 8, is the square root of the number. For large numbers, e.g. 966289, this would obviously be impractical, but we can attack large numbers one pair of digits at a time, to produce one digit of the square root at a time: 64 -  1 = 63 63 -  3 = 60 60 -  5 = 55 55 -  7 = 48 48 -  9 = 39 39 - 11 = 28 28 - 13 = 15 15 - 15 =  0

To find the first digit of the square root of 966289, subtract odd ten thousands: 10000, 30000, 50000, ...  until the point just before the result would turn negative, so subtract 10000+30000+...+170000.  Now, 1+3+...+17 = 92 , so 10000+30000+...+170000 = 900. The first digit of the square root is 9.

To find the first two digits of the square root, we could start with 966289 and subtract odd hundreds: 100, 300, 500,... up to 19500. However, we have already subtracted 9002, which is equal to 100+300+500+...+17700+17900, so we can start subtracting at 18100, subtracting only 18100+18300+...+19500.  We have now subtracted a total of 9802.

Similarly to find the first three digits of the square root (i.e. the complete square root), we could start with 966289 and subtract 1, 3, 5,...up to 1965. But we have already subtracted 9802 = 1+3+5+...+1959, so we can start subtracting at 1961, only needing to subtract 1961+1963+1965

 10000+30000+...+170000 = 9002 18100+18300+...+19500 = (100+300+...+19500) - (100+300+...+17900) = 9802 - 9002 1961+1963+1965 = (1+3+...+1965) - (1+3+...+1959) = 9832 - 9802

The algorithm can also be used to calculate approximations to the square root of numbers that are not perfect squares. Here is a video by James Grime (singingbanana) and Hugh Hunt that demonstrates an Original Odhner being used to calculate 2.