Introduction
A simple way to test whether a whole number is a perfect square is to take its square root, square the integer part and compare the result to the original number. In Python this might be coded as follows (the +0.5 is to avoid rounding errors)from math import sqrt def issquare(n): return int(sqrt(n) + 0.5) ** 2 == n
However, a preliminary test can avoid the need to calculate
the square root for some numbers. Perfect squares always end in
0,1,4,5,6 or 9, so numbers ending in 2,3,7 or 8 could
immediately be eliminated by looking at the remainder after
dividing by 10.
Similarly, a perfect square divided by 16 always leaves a
remainder of 0,1,4 or 9. Numbers with remainders
2,3,5,6,7,8,10,11,12,13,14,15 (75% of numbers) can be eliminated. As 16 is a power of 2, $16=2^4$, the remainder can be obtained in two ways: by division or by finding the last 4 bits by masking. In Python these could be implemented as:
Defining $r(n) = $ $\frac{s(n)}{n}$,
we are interested in numbers with small values of $r(n)$. In the examples above, s(10)=6, r(10)=0.6, s(16)=4, r(16)=0.25.
Stangl shows that $s(n)$ is
multiplicative, i.e $s(ab)=s(a)s(b)$ if $a$ and $b$ are
co-prime, so $r(n)$ is also multiplicative, and is therefore
defined by its values for $p^n$ where $p$ is prime. The
following graph shows the value of $r(p^n)$ for primes up to
29
Values of $r(2^n)$ look much more promising than powers of other
primes, but Stangl's formula shows that $r(2^n)$ tends towards $\frac{1}{6}$
as n increases, so there are diminishing returns
in choosing larger values of $2^n$ as a pre-test divisor.
For comparison, the time taken by the simplest issquare function is shown in yellow
The performance of the best composites (red points) is noticably better than the performance of powers of 2 of similar magnitude. The best performance is for D= 55,440 = 24.32.5.7.11, after which the performance degrades.
Pre calculating the residues of squares modulo D
The overhead of pre-calculating the residues of square
numbers modulo D needs to considered when choosing the
pre-test divisor.
Using the following algorithm, the table shows the time taken to generate the set of residues of squares modulo n. The table shows that the set of square residues mod 55,440 can be calculated in 2ms
Obtaining remainder by division
|
Obtaining remainder by masking
|
Which divisor produces the most efficient pre-test?
Defining $s(n)$ to be the number of possible residues of square numbers modulo n, a formula for $s(n)$ is derived by Stangl, W. D. "Counting Squares in ℤn, Math. Mag. 69, 285-289, 1996. The formula is summarised in Wolfram Mathworld - Square Number.Defining $r(n) = $ $\frac{s(n)}{n}$,
we are interested in numbers with small values of $r(n)$. In the examples above, s(10)=6, r(10)=0.6, s(16)=4, r(16)=0.25.
Can other composite divisors produce smaller values of $r(n)$?
The above graph shows that for primes greater than 2, there is no benefit in including a factor of $p$ higher than $p^2$ in $n$ to reduce $r(n)$. A few powers of 2 may have a benefit in reducing $r(n)$.Testing different divisors
The following graph tests the divisors listed above and divisors of the form $2^n$, by calculating the time to test all numbers in the range 1 to 10,000,000. For divisors of the form $2^n$ , two forms of Python function are tested, using division and masking to obtain the remainder.For comparison, the time taken by the simplest issquare function is shown in yellow
The graph shows that the performance improves for divisors
of the form $2^m$ up to $2^7$, and then levels out. The
graph also shows that there is no significant advantage in
using masking to calculate a number modulo $2^m$ (blue
points) compared to straightforward division (green points).
The performance of the best composites (red points) is noticably better than the performance of powers of 2 of similar magnitude. The best performance is for D= 55,440 = 24.32.5.7.11, after which the performance degrades.
Pre calculating the residues of squares modulo D
The overhead of pre-calculating the residues of square
numbers modulo D needs to considered when choosing the
pre-test divisor.Using the following algorithm, the table shows the time taken to generate the set of residues of squares modulo n. The table shows that the set of square residues mod 55,440 can be calculated in 2ms
SR={0}
D=1
for b in [2**4,3**2,5,7,11]:
squares_mod_b = set([(x*x)%b for x in range(b)])
SR = set([ r + n*D for r in SR for n in range(b)
if (r+n*D)%b in squares_mod_b ])
D*=b
|
|
We can now define algorithms with the best pre-test. Two options are presented, one with a hard coded set of square residues, and one with a calculated set.
$s(720)=48$, which is a reasonable number of square residues to hard-code into an algorithm. An implementation of issquare using D=720 is:
from math import sqrt SR = frozenset([0,1,4,385,324,649,256,144,145,121,580,409,576,25,640,265,\ 544,289,180,676,369,169,304,49,436,9,441,36,64,160,196,481,\ 585,484,81,340,625,601,225,100,16,529,361,496,241,400,244,505]) def issquare(n): if n%720 in SR : return int(sqrt(n) + 0.5) ** 2 == n return False
An implementation using D=55,440, with the square residues being constructed dynamically:
For a further increase in speed of approximately 33%, at the expense of readability, function calls could be replaced with in-line code, for example:
from math import sqrt SR={0} D=1 for b in [2**4,3**2,5,7,11]: squares_mod_b = set([(x*x)%b for x in range(b)]) SR = set([ r + n*D for r in SR for n in range(b) if (r+n*D)%b in squares_mod_b ]) D*=b def issquare(n): if n%D in SR: return int(sqrt(n) + 0.5) ** 2 == n return False
Function calls
count=0 for n in xrange(10000000): if issquare(n): count+=1 |
In-line code
count=0 for n in xrange(10000000): if n%D in SR and int(sqrt(n) + 0.5) ** 2 == n: count+=1 |