Thursday, May 31, 2012

Numbers Are Friends, Not Food: Unfriendly Numbers from InterviewStreet

Today we're going to go over Unfriendly Numbers, an interesting InterviewStreet problem involving a little number theory.


Review

Prerequisites

This article assumes that you are familiar with the Euclidean algorithm, as well as the naive \( O(\sqrt{N}) \) algorithms for integer factorisation and prime sieving.

Bounding the number of prime factors

Suppose we are given some number \(N\) between \(1\) and \(MAX\), inclusive, and we want to know the maximum number of prime factors that \(N\) can have.

We define the function \(primorial(K)\) (see here) as the product of the first \(K\) primes. It should be clear that \(primorial(K)\) is the smallest number with exactly \(K\) distinct prime factors. Thus, the maximum number of prime factors that \(N\) can have is simply the largest value of \(K\) such that \(primorial(K) \leq MAX\).

Since \(primorial(K)\) grows more quickly than \(K!\), then it should be clear that the bound on the number of unique prime factors grows extremely slowly. I.e., we can expect very large numbers to have relatively few numbers of prime factors.

Bounding the number total divisors

Now let us look at a harder problem: suppose we are given a number \(N\) between \(1\) and \(MAX\), and we want to determine the maximum number of total divisors that \(N\) can have. Define the divisor function, or \(d(N)\) as the number of divisors of \(N\). According to this Wikipedia article, the divisor function grows more slowly than \( N^\epsilon \) for all \( \epsilon > 0 \).

However, in programming competitions, this may not be a good enough estimate: more specifically, can we actually compute \( max \{ d(N) \} \) for all \(N\) in our range? Let us begin by introducing some more terminology: a highly composite number is an integer which has more factors than any smaller integer. We can then restate problem as computing the number of factors of \(M\), where \(M\) is the largest highly composite number between \(1\) and \(MAX\).

As it turns out, there is a fast algorithm for computing the largest highly composite number. We begin by first examining a few lemmata. Here we adopt the convention that \(p_i\) refers to the \(i\)-th prime. I.e. \( p_1 = 2 \), \( p_2 = 3 \), \( p_3 = 5 \), and so forth.

  1. Lemma 1: Suppose \(M\) is an integer with factorisation \( p_1^{e_1} p_2^{e_2} p_3^{e_3} \ldots \). Then $$ d(M) = \prod_i ( e_i + 1 ) $$

    Proof: If \(K\) is a divisor of \(M\) with factorisation \( p_1^{f_1} p_2^{f_2} p_3^{f_3} \ldots \), then for all \(i\), we have \( 0 \leq f_i \leq e_i \). There are thus \( e_i + 1 \) possibilities for each \( f_i \), so a total of \( \prod_i ( e_i + 1 ) \) possible factors.

  2. Lemma 2: Suppose \(M\) is a highly composite number with factorisation \( p_1^{e_1} p_2^{e_2} p_3^{e_3} \ldots \). Then \( e_i \geq e_{i+1} \) for all \( i \).

    Proof: Suppose not: i.e. \( e_i < e_{i+1} \) for some \(i\). Swap the exponents and let \(K = p_1^{e_1} p_2^{e_2} \ldots p_i^{e_{i+1}} p_{i+1}^{e_i} \ldots \). Then \( K < M \), but by our previous lemma, \( d(K) = d(M) \). Thus, \(M\) is not highly composite.

By Lemma 2, we know that all highly composite numbers, when factored, have non-increasing sequences of exponents. Could we, then, take a dual approach and search over all non-increasing sequences of exponents to find highly composite numbers? The following recursive algorithm is an implementation of that idea in C++:

// algorithm for finding highly composite numbers

pair< long long, int > rec( long long bound, int index, long long prod, int max_pow )
{
 long long best = prod;
 int num_div = 1;
 prod *= primes[index];
 for( int i = 1; i <= max_pow && prod <= bound; i++, prod *= primes[index] )
 {
  pair< long long, int > next = rec( bound, index + 1, prod, i );
  if( ( i + 1 ) * next.second > num_div )
  {
   best = next.first;
   num_div = ( i + 1 ) * next.second;
  }
 }
 return make_pair( best, num_div );
}

pair< long long, int > largest_highly_composite( long long bound )
{
 return rec( bound, 0, 1, INF );
}
Here, the first term returned, \(M\), is the largest highly composite number less than or equal to bound, and the second term returned is \(d(M)\). The array primes stores a pre-generated array of primes.

The recursion depth is bounded by the number of unique prime factors of any number less than or equal to bound, which we determined earlier was very small. Furthermore, the branching factor is bound by the logarithm of bound, as well as some other factors. In practice, the algorithm runs almost instantaneously for any long long, though there are some overflow issues with the larger numbers.


InterviewStreet Challenge - Unfriendly Numbers

Problem Link

The problem statement is simple enough: count the number of factors of \(K\) that are not factors of any number in \(B_1, B_2, B_3, \ldots, B_N\). We are given the following constraints:

  • \(1 \leq N \leq 10^6\)
  • \(1 \leq K \leq 10^{13}\)
  • \(1 \leq B_i \leq 10^{18}\) for all \(i\)

First, as always, we consider the naive solution: simply generate all the factors of \(K\) and for each factor \(F_i\), we check whether or not it is a factor of each \(B_i\). Unfortunately, there can be up to a million \(B_i\), so checking each \(F_i\) is going to be relatively slow. Let us try using our previous code to estimate the maximum number of \(F_i\). Plugging in largest_highly_composite(1e13) reveals that the largest highly composite number in the range is \(9316358251200\), which has \(10752\) factors. This is a small number, but certainly not small enough to check every pair of \(F_i\) and \(B_j\). We have to try something smarter.

A digression: numbers as vectors

There are many ways to see the next step of the solution; in my case, I relied on geometric intuition, though I'm sure other people did it differently. We begin by considering each integer \( N = p_1^{e_1} p_2^{e_2} \ldots \) as a vector, where the \(i\)-th entry of the vector is \( e_i \). For example, consider numbers that are products of powers of \(2\) and \(3\) only. The vectors corresponding to these numbers have only two nonzero components, and thus can be plotted on the coordinate plane:

Here \(3\) is plotted at \( (0,1) \) indicating that \( 3 \) has the factorisation \( 2^0 3^1 \). Likewise, \( 12 = 2^2 3^1 \) is plotted at \( (2,1) \), and so on. If we include more prime factors, then we'd need more dimensions to represent the numbers. Unfortunately, I'm not too good at drawing things in 12-dimensional space, so I'll leave it as an exercise for you, the reader.

Using this geometric representation, saying that \(a\) is a divisor of \(b\) is equivalent to saying that \(a\) is contained in the box with one corner at \(0\), and another corner at \(b\). That is, if \(F(n)\) is the set of factors of \(n\) then \(F(n)\) appears on the grid as the box bounded by \(0\) and \(n\). Similarly, the greatest common divisor of \(a\) and \(b\) is the `largest' point which appears in the boxes \(F(a)\) and \(F(b)\), and the least common multiple is the `smallest' point whose box contains \(a\) and \(b\).

Now let's consider the problem statement again: we need to find the number of points in \(K\)'s box that are completely outside of the box of any \(B_i\). This is still a hard problem, but now we have a way of visualising it. Let us return to the diagram; suppose \( K = 12 \) and \( B_1 = 18 \):

We want to count the number of points in \(F(12)\) (the green region) that are not in \(F(18)\) (the blue region). In this case, the valid points are \( 12 \) and \( 4 \), and the invalid points are \(1\), \(2\), \(3\), \(6\), \(9\), and \(18\).

Note, however, that we didn't really need to consider all the points in \(F(18)\), as some of them are outside of \(F(12\)) already. Instead, it would have been equivalent to count the points inside \(F(12)\) which were outside of \( F(18) \cap F(12) = F(6) \). More generally, this means that we need to count the points in \(F(K)\) that are not in \( F(K) \cap F(B_i) \) for any \(i\). How does this help us? Well, we showed earlier that \(F(K)\) contained around \(10000\) points. Thus, all the one million \(F(B_i)\) get mapped into at most \(10000\) distinct \( F(K) \cap F(B_i) \).

By this point, you, the astute reader, may have already realised that \( F(K) \cap F(B_i) \) is simply \( F( gcd( K, B_i ) ) \). This suggests the following algorithm:

  1. First, generate all the factors of \(K\). This can be done naively in \(O(\sqrt{K})\) time.
  2. For each \( i \), compute \( gcd( K, B_i ) \) and store it in a set called \( bad \). This takes \( O(N \log K ) \) for \( N \) bad numbers.
  3. Lastly, we compute the answer by finding the number of factors in \( F(K) \) that are factors of no numbers in \( bad \). Since both sets contain at most \( F(K) \) numbers, then this takes \( O( |F(K)|^2 ) \) time.

And there you have it! As usual, being able to solve problems like these requires a solid understanding of the background material, as well as a little bit of insight or elbow grease. A little bit of number-theoretical intuition helps a great deal in programming contests, and as always, the best way to gain said intuition is just to solve more problems.


Homework

Originally I intended to also do a writeup of another excellent problem, EllysNumbers, the Division 1, Level 2 problem from TopCoder SRM 534. However, I am lazy and the people at TopCoder have already written up a good tutorial here. Try solving it yourself!

Tuesday, May 22, 2012

"Мне до лампочки": Cold War Propaganda, Language Learning, and Light Bulbs

In learning a language, it's often the case that we encounter new idioms or expressions that make little sense without historical background or further context. (E.g. this expression, and for a more extreme example, this episode of Star Trek: TNG.) Such was the case when I was learning Russian a few years ago, and I happened upon the expression "(это) мне до лампочки", lit. "it's just like the light bulb to me". Figuratively, this is taken to mean that the person does not care much about the thing to which he is referring. This idiom is apparently relatively new, taking hold within the past century or so, and two of my professors have given me rather differing information about the etymology of this expression.
  1. The first professor, an American, explained to me that during one of the many Soviet industrialisation programmes, there was a drive to provide modern lighting systems to the general populace. However, because of inadequacies in the Soviet system, this programme took far longer to complete than originally envisioned, so the phrase "мне до лампочки" entered Russian vernacular as an expression describing something pointless or worthless.
  2. In a later encounter, a second professor, a Russian, refuted the first professor's claim as American propaganda leftover from the Cold War. According to him, the true origin of the phrase is the following: In earlier times, miners often fastened mining lamps to themselves via straps or harnesses. In Russia, one method of wearing the lamp involved fastening the lamp near one's crotch. Thus, "мне до лампочки" actually arose as a miners' euphemism for a more vulgar phrase, later being adopted by the Russian-speaking population as a whole.
While both explanations are plausible, a quick Google search for "мне до лампочки этимология [etymology]" lends more support to the latter theory. This naturally leads to the question: supposing the first explanation was indeed false, how and why did it arise? This may be hard to determine without much more research, but it seems clear that even in fields such as etymology, political and cultural biases can surface.