Upper Bounds in Minimizing Constructions

Timothy Mou - Wed 18 January 2023

I will describe a type of constructive problem that comes up fairly regularly on programming contests, especially Codeforces. We are given what I will call a "minimizing construction" problem, one where we are asked to construct a smallest object satisfying certain properties. One common prototype is that we are given a structure (e.g., a tree, an array, etc.) and a set of operations we can perform on it, and we wish to find a sequence of operations of minimal length after which the structure satisfies a property $P$. There are other common variants, but we'll use this language of operations for now. Sometimes the problem asks us to output the entire construction (i.e., the list of operations to perform); other times we just need to print the size of the construction.

This is clearly a very general class of problems, but I want to share a common theme that shows up in the solutions of some of these problems and that makes them particularly difficult to approach if you're not already aware of it. The idea is to impose an upper bound on the size of a construction by demonstrating that the problem can always be solved in at most $k$ operations, where $k$ is typically a very small number. This reduces the problem to checking whether it can be solved in $0, 1, \dots, k-1$ operations, which typically requires some additional optimizations to carry out efficiently.

I've compiled an (incomplete) list of problems that demonstrate this idea. You are welcome to try to solve them now, with the mindset that you are searching for these upper-bound constructions. I will outline solutions to some of these and provide additional comments below.

ANDfinity

This problem is a prototypical example of this idea. The first step in these kinds of problems is to play around with the operations and gain some intuition for how they work. In this case, we can observe that it's quite "easy" for nodes to be connected--in other words, in order for $a$ and $b$ to be in different components, not only must they have entirely disjoint sets of bits, their respective components must also have disjoint sets of bits. Since there are only up to $30$ distinct bits, for sufficiently large $n$, most random arrays are likely to form connected graphs. [1]

[1] Okay, citation needed. But this claim is not hard to believe.

From here, it's relatively simple to come up with various constructions that place upper bounds on the number of operations required. For example, one approach is to ensure that all numbers have a common bit, and this is most easily done by making all of the numbers odd. Clearly, this takes at most $n$ operations.

However, we can do better. A simple improvement is to observe that since there are at most $30$ distinct bits, there are at most $30$ connected components, and to make the graph connected, it suffices to ensure that at least one number in each component is odd (since making an even number odd doesn't disturb any of the other bits). For example, consider the array $[1, 2, 4, 8, \dots, 2^{29}]$. Thus we can reduce this upper bound to $30$.

We can observe that in many cases, one operation can connect more than just $2$ components. In particular, in our example above with powers of $2$, if we subtract $1$ from $2^{29}$, we get the binary number $111\dots111_2$, which connects all the other numbers to form a single connected component. This observation mostly works in the general case--for each number, compute its least significant bit (LSB), and for each component, find the vertex with the minimum LSB. Then if we take the component $C$ with the maximum of all minimal LSBs and subtract $1$ from any number in this component, the result will be of the form $\dots 0111\dots1_2$ (its LSB cleared and all lower bits set) and connect all the other components, since every other component has some number with a strictly smaller LSB.

The only problem with this construction is that subtracting $1$ may disconnect the vertex from its own component $C$--for example, if we subtract $1$ from $100_2$, it is no longer connected to $1100_2$. This can only happen if $C$ has more than one vertex and all the other values in $C$ are even. In that case, we can fix it by adding $1$ to a vertex in the piece of $C$ that was cut off: that vertex becomes odd and reconnects through bit $0$.
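Here is a sketch of this construction in code, assuming $1 \le a_i < 2^{30}$; the helper names (`bitRoot`, `makeConnected`) are my own, and the second operation is applied only when subtracting $1$ actually cuts a piece off of $C$:

```cpp
#include <bits/stdc++.h>
using namespace std;

// bitRoot: a DSU over bit positions. All bits of one number are linked, so
// two elements lie in the same graph component iff their roots coincide.
vector<int> bitRoot(const vector<long long>& a) {
    vector<int> p(31);
    iota(p.begin(), p.end(), 0);
    function<int(int)> find = [&](int x) { return p[x] == x ? x : p[x] = find(p[x]); };
    for (long long v : a) {
        int f = __builtin_ctzll(v);
        for (int b = f + 1; b < 31; b++)
            if (v >> b & 1) p[find(b)] = find(f);   // bits of one number share a root
    }
    vector<int> r(a.size());
    for (size_t i = 0; i < a.size(); i++) r[i] = find(__builtin_ctzll(a[i]));
    return r;
}

// Returns at most two (index, +1/-1) operations that connect the graph.
vector<pair<int,int>> makeConnected(vector<long long> a) {
    int n = a.size();
    vector<int> root = bitRoot(a);
    map<int,int> minLsb;                            // component -> minimal LSB
    for (int i = 0; i < n; i++) {
        int l = __builtin_ctzll(a[i]);
        if (!minLsb.count(root[i]) || l < minLsb[root[i]]) minLsb[root[i]] = l;
    }
    if (minLsb.size() == 1) return {};              // already connected
    int best = -1;                                  // component C: max minimal LSB
    for (auto& [r, l] : minLsb) if (best == -1 || l > minLsb[best]) best = r;
    int x = 0;
    while (root[x] != best) x++;                    // any vertex of C
    vector<pair<int,int>> ops = {{x, -1}};
    a[x] -= 1;                                      // clears its LSB, sets lower bits
    root = bitRoot(a);
    for (int i = 0; i < n; i++)
        if (root[i] != root[x]) {                   // a piece of C was cut off
            ops.push_back({i, +1});                 // make it odd: reconnects via bit 0
            break;
        }
    return ops;
}
```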

Thus, we can for sure solve this problem in just $2$ operations. To check if we can do any better, we can first check if we don't need any operations at all (i.e., the graph is connected to begin with), or if we can connect the graph in just $1$ operation. Checking if the graph is originally connected is trivial, but checking if just $1$ operation is sufficient is trickier, since there are up to $2n$ operations to check, and performing a floodfill each time costs $O(n^3)$ in total. One approach is to observe that we can convert the connectivity requirement to a graph on $30$ bit-vertices, where we connect $i$ to $j$ if there is a number that contains both $2^i$ and $2^j$. This graph is small enough to rebuild and floodfill for each operation.
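A sketch of this check, again with my own function names; it rebuilds the small bit graph for each of the $2n$ candidate operations and uses a DSU in place of an explicit floodfill:

```cpp
#include <bits/stdc++.h>
using namespace std;

// Connectivity test on the bit graph: link all bits of each element, then
// the array's graph is connected iff every element maps to the same root.
bool connectedViaBits(const vector<long long>& a) {
    vector<int> p(31);
    iota(p.begin(), p.end(), 0);
    function<int(int)> find = [&](int x) { return p[x] == x ? x : p[x] = find(p[x]); };
    for (long long v : a) {
        if (v == 0) return a.size() == 1;      // 0 has no bits: isolated vertex
        int f = __builtin_ctzll(v);
        for (int b = f + 1; b < 31; b++)
            if (v >> b & 1) p[find(b)] = find(f);
    }
    for (long long v : a)
        if (find(__builtin_ctzll(v)) != find(__builtin_ctzll(a[0]))) return false;
    return true;
}

// Try all 2n single operations, rebuilding the bit graph each time.
bool canSolveInOne(vector<long long> a) {
    for (size_t i = 0; i < a.size(); i++)
        for (int d : {+1, -1}) {
            a[i] += d;
            if (a[i] > 0 && connectedViaBits(a)) return true;
            a[i] -= d;                         // undo and try the next candidate
        }
    return false;
}
```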

Observe that the solution decomposes into two fairly disjoint parts--first creating a construction that always requires at most $2$ operations, and then coming up with an efficient method of checking whether just $1$ operation is sufficient. This kind of separation of concerns is quite common in these types of problems.

Misha and Paintings

Let $D$ be the initial number of distinct colors. First, if we have too few colors to begin with (i.e., $k \geq D$), clearly the best we can do is add new colors one by one until we have reached the desired amount $k$, so our answer is simply $k - D$. Otherwise, our goal is to reduce the total number of distinct colors by $D - k$.

It turns out that $2$ operations are always enough. The general idea is that we can imagine creating a submatrix anchored at the top-left corner and gradually increasing its side length. At some length $L$, it will have covered just under $D - k$ colors, and increasing the length any further would cover more than $D - k$ colors. Then we can always add a second square with its bottom-right corner at $(L+1, L+1)$ and increase its length until the desired number of colors is covered. This comment helps give some visual intuition for why this construction works.

This line of reasoning is a bit delicate, but fortunately, we only have to output the minimum number of operations needed. Now all that is left is to check whether we can get $k$ distinct colors in just $1$ operation (since $k < D$, clearly at least one operation is needed). Since $n \leq 500$ is fairly small, the constraints admit a cubic solution, where we iterate over all side lengths $s$ and check whether there is a valid submatrix of side $s$ in $O(n^2)$. For each color, we can mark out the region where submatrices of side $s$ completely cover all squares of this color, and then use prefix sums to check if a submatrix covers exactly $D - k$ or $D - k + 1$ colors (we can cover one extra color since we can choose whether to give the submatrix an old or a new color).
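Below is a sketch of this check (`oneOpSuffices` is a hypothetical name; I assume $k < D$ and that painting with a brand-new color is always allowed). For each color, a 2D difference array marks the top-left corners whose $s \times s$ square fully covers that color's bounding box, and prefix sums then give the covered-color count at every corner:

```cpp
#include <bits/stdc++.h>
using namespace std;

bool oneOpSuffices(const vector<vector<int>>& a, int k) {
    int n = a.size();
    map<int, array<int,4>> box;                 // color -> bounding box {r1,c1,r2,c2}
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            auto it = box.find(a[i][j]);
            if (it == box.end()) box[a[i][j]] = {i, j, i, j};
            else {
                auto& b = it->second;
                b[0] = min(b[0], i); b[1] = min(b[1], j);
                b[2] = max(b[2], i); b[3] = max(b[3], j);
            }
        }
    int D = box.size();
    for (int s = 1; s <= n; s++) {
        // cnt[i][j] = #colors lying entirely inside the s x s square at (i,j)
        vector<vector<int>> cnt(n + 1, vector<int>(n + 1, 0));
        for (auto& kv : box) {
            auto& b = kv.second;
            int ilo = max(0, b[2] - s + 1), ihi = min(b[0], n - s);
            int jlo = max(0, b[3] - s + 1), jhi = min(b[1], n - s);
            if (ilo > ihi || jlo > jhi) continue;   // side s can't cover this color
            cnt[ilo][jlo]++; cnt[ilo][jhi + 1]--;   // 2D difference update
            cnt[ihi + 1][jlo]--; cnt[ihi + 1][jhi + 1]++;
        }
        for (int i = 0; i + s <= n; i++)
            for (int j = 0; j + s <= n; j++) {
                if (i > 0) cnt[i][j] += cnt[i - 1][j];
                if (j > 0) cnt[i][j] += cnt[i][j - 1];
                if (i > 0 && j > 0) cnt[i][j] -= cnt[i - 1][j - 1];
                int cov = cnt[i][j];
                // old surviving color: D - cov == k; fresh color: D - cov + 1 == k
                if (cov == D - k || cov == D - k + 1) return true;
            }
    }
    return false;
}
```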

In this case, I think coming up with this construction is substantially harder, and part of the reason is that there is not much indication that the answer when $k < D$ should be at most $2$, except for possibly a vague notion that reducing the number of colors should be "easier" than introducing new colors. This is why you should keep this idea in mind when approaching problems that seem to give you a lot of freedom in how you select your operations.

Quadratic Set

We can think of this problem as trying to minimize the size of a subset of $\{1, 2, \dots, n\}$ to delete so that the resulting set is quadratic. It is useful to write a brute force to examine small cases, and from here it is not hard to conjecture that the minimum size of the removed set seems to be small, usually at most $3$.
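For instance, here is one possible brute force (names are my own). Bit $j$ of $\mathrm{sig}(x)$ is the parity of the exponent of the $j$-th prime in $x$, so a set of factorials is quadratic iff the xor of their signatures is $0$; for small $n$ this is exact, with no hashing needed:

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    vector<int> primes = {2, 3, 5, 7, 11, 13};     // enough for n <= 16
    for (int n = 1; n <= 16; n++) {
        vector<int> sigFact(n + 1, 0);             // signature of i!
        int total = 0;                             // signature of 1! 2! ... n!
        for (int i = 1; i <= n; i++) {
            int s = 0, x = i;
            for (int j = 0; j < (int)primes.size(); j++)
                while (x % primes[j] == 0) { x /= primes[j]; s ^= 1 << j; }
            sigFact[i] = sigFact[i - 1] ^ s;       // since i! = (i-1)! * i
            total ^= sigFact[i];
        }
        int best = n;
        for (int mask = 0; mask < (1 << n); mask++) {   // try every removed subset
            int s = total;
            for (int i = 0; i < n; i++)
                if (mask >> i & 1) s ^= sigFact[i + 1];
            if (s == 0) best = min(best, __builtin_popcount(mask));
        }
        printf("n = %2d: minimum removals = %d\n", n, best);
    }
}
```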

We can prove this by considering $X = 1!\, 2! \cdots n!$ and trying to rewrite it in the form $X = y^2 z$, where $z$ is a product of a small number of factorials. If $n$ is a multiple of $4$, then we can always make the set quadratic by removing $n/2$. Similarly, if $n \equiv 2 \pmod{4}$, then we can remove $2$ and $n/2$. So if $n$ is even, we have to remove at most $2$ numbers. If $n$ is odd, we have to remove at most $3$ numbers, since we can always reduce to the even case by removing $n$. Therefore, this problem reduces to checking whether we can remove $0$, $1$, or $2$ numbers to make the set quadratic.
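To see where these removals come from, pair up consecutive factorials when $n = 2m$ is even:

$$\prod_{i=1}^{2m} i! \;=\; \prod_{j=1}^{m} (2j-1)!\,(2j)! \;=\; \prod_{j=1}^{m} \big((2j-1)!\big)^2 \cdot 2j \;=\; \left(\prod_{j=1}^{m} (2j-1)!\right)^{2} \cdot 2^m \cdot m!$$

If $n \equiv 0 \pmod 4$, then $m = n/2$ is even, so $2^m$ is already a perfect square and removing $m! = (n/2)!$ leaves a square. If $n \equiv 2 \pmod 4$, then $m$ is odd, so $2^m = 2 \cdot 2^{m-1}$ with $2^{m-1}$ a square, and removing $2!$ and $(n/2)!$ works.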

To check whether a set is quadratic, we can consider the prime factorization of the product of the set and check that each prime occurs an even number of times. This parity condition leads us to think of using xor to count the parity of the number of occurrences of each prime. Let $H(x)$ be the xor of the primes in the factorization of $x$, taken with multiplicity. Then if $x$ is a square, $H(x) = 0$. However, the converse does not necessarily hold, since a nonempty set of primes can also have an xor of $0$. This issue is a common pitfall when working with xor (see this problem).

One trick to work around this issue is to assign each prime a random hash, and xor these hashes instead of the primes themselves. Then the probability of a collision becomes vanishingly small. We can then precompute $H(i!)$ for $i \in \{1, 2, \dots, n\}$, and use these values to check whether $H(X) = H(i!)$ or $H(X) = H(i!) \oplus H(j!)$.
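Here is a sketch of the whole pipeline (variable names and the output format are my own; the real problem has its own I/O). It precomputes $H(i!)$ with a smallest-prime-factor sieve and checks $0$, $1$, and $2$ removals as described:

```cpp
#include <bits/stdc++.h>
using namespace std;

int main() {
    int n = 1000000;                        // example size
    mt19937_64 rng(chrono::steady_clock::now().time_since_epoch().count());
    vector<int> spf(n + 1, 0);              // smallest prime factor sieve
    vector<unsigned long long> hsh(n + 1, 0);
    for (int i = 2; i <= n; i++)
        if (spf[i] == 0) {                  // i is prime: give it a random hash
            hsh[i] = rng();
            for (long long j = i; j <= n; j += i)
                if (spf[j] == 0) spf[j] = i;
        }
    // H(i!) = H((i-1)!) xor H(i), where H(i) is computed via the SPF sieve.
    vector<unsigned long long> Hfact(n + 1, 0);
    unsigned long long total = 0;           // H of the whole product 1! 2! ... n!
    for (int i = 1; i <= n; i++) {
        unsigned long long h = 0;
        for (int x = i; x > 1; x /= spf[x]) h ^= hsh[spf[x]];
        Hfact[i] = Hfact[i - 1] ^ h;
        total ^= Hfact[i];
    }
    if (total == 0) { puts("remove 0"); return 0; }
    for (int i = 1; i <= n; i++)            // one removal: H(X) == H(i!)
        if (Hfact[i] == total) { printf("remove 1: %d\n", i); return 0; }
    unordered_map<unsigned long long, int> seen;   // two removals, via hash table
    for (int i = 1; i <= n; i++) {
        auto it = seen.find(total ^ Hfact[i]);
        if (it != seen.end()) { printf("remove 2: %d %d\n", it->second, i); return 0; }
        seen[Hfact[i]] = i;
    }
    puts("remove 3");                       // three removals always suffice
}
```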

You can refer to the editorial for more details.

Conclusion

These problems are just a few instances of this technique; I'm sure you can find many more examples in other contests as well. I hope you found this useful!