Today we're going to go over Unfriendly Numbers, an interesting InterviewStreet problem involving a little number theory.

Review

Prerequisites

This article assumes that you are familiar with the Euclidean algorithm, as well as the naive $ O(\sqrt{N}) $ algorithms for integer factorisation and prime sieving.

Bounding the number of prime factors

Suppose we are given some number $N$ between $1$ and $MAX$, inclusive, and we want to know the maximum number of prime factors that $N$ can have.

We define the function $primorial(K)$ (see here) as the product of the first $K$ primes. It should be clear that $primorial(K)$ is the smallest number with exactly $K$ distinct prime factors. Thus, the maximum number of prime factors that $N$ can have is simply the largest value of $K$ such that $primorial(K) \leq MAX$.

Since $primorial(K)$ grows more quickly than $K!$, then it should be clear that the bound on the number of unique prime factors grows extremely slowly. I.e., we can expect very large numbers to have relatively few numbers of prime factors.

Bounding the number total divisors

Now let us look at a harder problem: suppose we are given a number $N$ between $1$ and $MAX$, and we want to determine the maximum number of total divisors that $N$ can have. Define the divisor function, or $d(N)$ as the number of divisors of $N$. According to this Wikipedia article, the divisor function grows more slowly than $ N^\epsilon $ for all $ \epsilon > 0 $.

However, in programming competitions, this may not be a good enough estimate: more specifically, can we actually compute $ max \{ d(N) \} $ for all $N$ in our range? Let us begin by introducing some more terminology: a highly composite number is an integer which has more factors than any smaller integer. We can then restate problem as computing the number of factors of $M$, where $M$ is the largest highly composite number between $1$ and $MAX$.

As it turns out, there is a fast algorithm for computing the largest highly composite number. We begin by first examining a few lemmata. Here we adopt the convention that $p_i$ refers to the $i$-th prime. I.e. $ p_1 = 2 $, $ p_2 = 3 $, $ p_3 = 5 $, and so forth.

Lemma 1: Suppose $M$ is an integer with factorisation $ p_1^{e_1} p_2^{e_2} p_3^{e_3} \ldots $. Then $$ d(M) = \prod_i ( e_i + 1 ) $$

Proof: If $K$ is a divisor of $M$ with factorisation $ p_1^{f_1} p_2^{f_2} p_3^{f_3} \ldots $, then for all $i$, we have $ 0 \leq f_i \leq e_i $. There are thus $ e_i + 1 $ possibilities for each $ f_i $, so a total of $ \prod_i ( e_i + 1 ) $ possible factors.
Lemma 2: Suppose $M$ is a highly composite number with factorisation $ p_1^{e_1} p_2^{e_2} p_3^{e_3} \ldots $. Then $ e_i \geq e_{i+1} $ for all $ i $.

Proof: Suppose not: i.e. $ e_i < e_{i+1} $ for some $i$. Swap the exponents and let $K = p_1^{e_1} p_2^{e_2} \ldots p_i^{e_{i+1}} p_{i+1}^{e_i} \ldots $. Then $ K < M $, but by our previous lemma, $ d(K) = d(M) $. Thus, $M$ is not highly composite.

By Lemma 2, we know that all highly composite numbers, when factored, have non-increasing sequences of exponents. Could we, then, take a dual approach and search over all non-increasing sequences of exponents to find highly composite numbers? The following recursive algorithm is an implementation of that idea in C++:

// algorithm for finding highly composite numbers

pair< long long, int > rec( long long bound, int index, long long prod, int max_pow )
{
 long long best = prod;
 int num_div = 1;
 prod *= primes[index];
 for( int i = 1; i <= max_pow && prod <= bound; i++, prod *= primes[index] )
 {
  pair< long long, int > next = rec( bound, index + 1, prod, i );
  if( ( i + 1 ) * next.second > num_div )
  {
   best = next.first;
   num_div = ( i + 1 ) * next.second;
  }
 }
 return make_pair( best, num_div );
}

pair< long long, int > largest_highly_composite( long long bound )
{
 return rec( bound, 0, 1, INF );
}

Here, the first term returned, $M$, is the largest highly composite number less than or equal to bound, and the second term returned is $d(M)$. The array primes stores a pre-generated array of primes.

The recursion depth is bounded by the number of unique prime factors of any number less than or equal to bound, which we determined earlier was very small. Furthermore, the branching factor is bound by the logarithm of bound, as well as some other factors. In practice, the algorithm runs almost instantaneously for any long long, though there are some overflow issues with the larger numbers.

InterviewStreet Challenge - Unfriendly Numbers

Problem Link

The problem statement is simple enough: count the number of factors of $K$ that are not factors of any number in $B_1, B_2, B_3, \ldots, B_N$. We are given the following constraints:

$1 \leq N \leq 10^6$
$1 \leq K \leq 10^{13}$
$1 \leq B_i \leq 10^{18}$ for all $i$

First, as always, we consider the naive solution: simply generate all the factors of $K$ and for each factor $F_i$, we check whether or not it is a factor of each $B_i$. Unfortunately, there can be up to a million $B_i$, so checking each $F_i$ is going to be relatively slow. Let us try using our previous code to estimate the maximum number of $F_i$. Plugging in largest_highly_composite(1e13) reveals that the largest highly composite number in the range is $9316358251200$, which has $10752$ factors. This is a small number, but certainly not small enough to check every pair of $F_i$ and $B_j$. We have to try something smarter.

A digression: numbers as vectors

There are many ways to see the next step of the solution; in my case, I relied on geometric intuition, though I'm sure other people did it differently. We begin by considering each integer $ N = p_1^{e_1} p_2^{e_2} \ldots $ as a vector, where the $i$-th entry of the vector is $ e_i $. For example, consider numbers that are products of powers of $2$ and $3$ only. The vectors corresponding to these numbers have only two nonzero components, and thus can be plotted on the coordinate plane:

Here $3$ is plotted at $ (0,1) $ indicating that $ 3 $ has the factorisation $ 2^0 3^1 $. Likewise, $ 12 = 2^2 3^1 $ is plotted at $ (2,1) $, and so on. If we include more prime factors, then we'd need more dimensions to represent the numbers. Unfortunately, I'm not too good at drawing things in 12-dimensional space, so I'll leave it as an exercise for you, the reader.

Using this geometric representation, saying that $a$ is a divisor of $b$ is equivalent to saying that $a$ is contained in the box with one corner at $0$, and another corner at $b$. That is, if $F(n)$ is the set of factors of $n$ then $F(n)$ appears on the grid as the box bounded by $0$ and $n$. Similarly, the greatest common divisor of $a$ and $b$ is the `largest' point which appears in the boxes $F(a)$ and $F(b)$, and the least common multiple is the `smallest' point whose box contains $a$ and $b$.

Now let's consider the problem statement again: we need to find the number of points in $K$'s box that are completely outside of the box of any $B_i$. This is still a hard problem, but now we have a way of visualising it. Let us return to the diagram; suppose $ K = 12 $ and $ B_1 = 18 $:

We want to count the number of points in $F(12)$ (the green region) that are not in $F(18)$ (the blue region). In this case, the valid points are $ 12 $ and $ 4 $, and the invalid points are $1$, $2$, $3$, $6$, $9$, and $18$.

Note, however, that we didn't really need to consider all the points in $F(18)$, as some of them are outside of $F(12$) already. Instead, it would have been equivalent to count the points inside $F(12)$ which were outside of $ F(18) \cap F(12) = F(6) $. More generally, this means that we need to count the points in $F(K)$ that are not in $ F(K) \cap F(B_i) $ for any $i$. How does this help us? Well, we showed earlier that $F(K)$ contained around $10000$ points. Thus, all the one million $F(B_i)$ get mapped into at most $10000$ distinct $ F(K) \cap F(B_i) $.

By this point, you, the astute reader, may have already realised that $ F(K) \cap F(B_i) $ is simply $ F( gcd( K, B_i ) ) $. This suggests the following algorithm:

First, generate all the factors of $K$. This can be done naively in $O(\sqrt{K})$ time.
For each $ i $, compute $ gcd( K, B_i ) $ and store it in a set called $ bad $. This takes $ O(N \log K ) $ for $ N $ bad numbers.
Lastly, we compute the answer by finding the number of factors in $ F(K) $ that are factors of no numbers in $ bad $. Since both sets contain at most $ F(K) $ numbers, then this takes $ O( |F(K)|^2 ) $ time.

And there you have it! As usual, being able to solve problems like these requires a solid understanding of the background material, as well as a little bit of insight or elbow grease. A little bit of number-theoretical intuition helps a great deal in programming contests, and as always, the best way to gain said intuition is just to solve more problems.

Homework

Originally I intended to also do a writeup of another excellent problem, EllysNumbers, the Division 1, Level 2 problem from TopCoder SRM 534. However, I am lazy and the people at TopCoder have already written up a good tutorial here. Try solving it yourself!

bababadalgharaghtakamminarronnkonnbro

Thursday, May 31, 2012

Numbers Are Friends, Not Food: Unfriendly Numbers from InterviewStreet