Monday, November 4, 2013

Mid-Atlantic Regional ACM Programming Contest 2013, problem F

In the Mid-Atlantic Regional ACM Programming Contest this year, one of the more interesting problems was the following:

Suppose there are \( N \leq 10^5 \) stars such that star \( i \) has coordinates \( x_i, y_i, z_i \), where \( |x_i|, |y_i|, |z_i| \leq 10^9 \). Given \( K \leq 10^9 \), count the number of pairs of stars which are within distance \( K \) from each other. The cluster of stars is sparse -- that is, there are at most \( M = 10^5 \) pairs of stars within \( K \) of each other.

One simple solution is to think about partitioning the space up into cubes of side length \( K \). Then, for each star, it suffices to check the number of stars in the same cube and adjacent cubes and see whether they're within \( K \). Surprisingly enough, this actually runs in \( O(M) \) operations.

First we consider the case where space is one-dimensional, and our cubes are just intervals of length \( K \). Let the intervals contain \( a_1, a_2, \ldots, a_n \) elements. Note that two stars within the same interval are guaranteed to be within \( K \) of each other. Thus, \[\sum_{i=1}^n a_i^2 \leq M\] What is the number of operations that this will take? Well, for each star in interval \(i\), we examine all stars in intervals \(i - 1\), \(i\), and \(i + 1\). This is just \[\sum_{i=1}^n a_i (a_{i-1} + a_i + a_{i+1})\] Note that the \(a_i^2\) terms are \(\leq M\); furthermore, the \(a_ia_{i-1}\) terms and the \(a_ia_{i+1}\) terms are symmetric. Thus, it suffices to show that \[\sum_{i=1}^n a_i a_{i-1} \leq M\] Here we apply something known as the rearrangement inequality*, which roughly says that if you have two sequences of numbers \(x_i\) and \(y_i\), and you want to rearrange the \(x_i\) and \(y_i\) to maximise \(\sum_i x_i y_i\), then it's best to rearrange them in ascending order. In this case, our sequences are \(a_i\) and \(a_{i-1}\), and the sequences in ascending order is just \(\sum_{i=1}^n a_i^2\), which we already showed was \(\leq M\). This means that the whole algorithm takes \[\sum_{i=1}^n a_i (a_{i-1} + a_i + a_{i+1}) \leq 3 M\]

The same argument can be extended to cubes -- however, in order to guarantee that \( \sum_{i=1}^n a_i^2 \leq M \), we need to make our cubes be of size \(K / \sqrt3\) so that all points within them are guaranteed to be at most \(K\) from each other. This implies that if we instead use cubes of size \(K\), then \(\sum_{i=1}^n a_i^2 \leq 3^3 M\). We can then apply the rearrangement argument as before, except with \(27\) adjacent cubes instead of \(3\). This gives us an algorithm which runs in \(729 M\).

In general, for dimension \(D\) we'll need \((3D)^DM\) operations for the same algorithm, so in high dimensions we'd likely need to exploit other tricks, like the fact that the volume of an \(n\)-ball disappears as \(n\) increases. We could then (maybe?) approximate the sphere with small cubes of size \(\epsilon K\) instead of simply summing over \(3^D\) adjacent, larger cubes of size \(K\).

* -- You could also prove this using Cauchy-Schwarz or some other famous inequality. Rearrangement just happened to be the first one I thought of.