From f77cf654d36a677caba14f1dcb809dc227f3ab23 Mon Sep 17 00:00:00 2001
From: Kevin Atienza
Anyway, the most straightforward solution would be to just do it as stated, a.k.a., brute force: enumerate all \(2^{rc}\) grids, compute \(B(G)\) for each of them, and then sum up all these \(B(G)^3\). Enumerating grids is relatively straightforward with backtracking, and for the first subtask, \(2^{rc} = 2^{25} = 33554432\) which is quite manageable for a computer. The only missing ingredient to fully implement this solution is being able to compute \(B(G)\) for a given grid \(G\).
We are given a grid with \(r\) rows and \(c\) columns, and we want to make it based, i.e., change it so that every row and every column has an even number of cringe memes. Let’s convert a cool meme (🗿) into a \(0\) and a cringe meme (😬) into a \(1\), so the condition translates to: the sum of every row and every column is even.
Now, the effect of flipping a cell is to flip the parity of exactly one row and exactly one column, namely the row and column containing the cell. Thus, if there are \(R\) odd rows, then you need at least \(R\) flips to make all these odd rows even. Similarly, if there are \(C\) odd columns, then you need at least \(C\) flips. Combining these tells us that we need \(\max(R, C)\) or more moves to make the grid based.
Remark: You could also just compute the Lucas numbers modulo \(2m\); that way, reducing them modulo \(m\) and \(2\) is still valid. (Can you see why?)
A straightforward approach
Computing \(B(G)\)
Generate the parities separately
if parity == 0:
    total = (total + value**2) % m
Pen-and-paper insight
It turns out that there’s a simple criterion that gives us the parity of a Lucas number given only its index.
diff --git a/2023/tama/lucas/index.md b/2023/tama/lucas/index.md
index d543536..bfe0392 100644
--- a/2023/tama/lucas/index.md
+++ b/2023/tama/lucas/index.md
@@ -61,7 +61,7 @@ for value, parity in zip(L, L_parity):
```
Solution Writeup
Solution Writeup: Kevin Atienza
In this editorial, I'll pretend that we're computing the full answer (a rational number) instead of the answer modulo \(998244353\). It turns out that many exact solutions can be adapted to compute the answer modulo something instead. A bonus section below describes how to do it.
+In this editorial, I’ll pretend that we’re computing the full answer (a rational number) instead of the answer modulo \(998244353\). It turns out that many exact solutions can be adapted to compute the answer modulo something instead. A bonus section below describes how to do it.
Subtask 1
For Subtask 1, I'll describe a solution that doesn't use a lot of insights and essentially only uses dynamic programming (DP) (aside from the definition of expected value). You could also solve this subtask with pen and paper by using the solution for Subtask 2, which is perfectly doable by hand (and easier to implement as well).
+For Subtask 1, I’ll describe a solution that doesn’t use a lot of insights and essentially only uses dynamic programming (DP) (aside from the definition of expected value). You could also solve this subtask with pen and paper by using the solution for Subtask 2, which is perfectly doable by hand (and easier to implement as well).
If you have some sort of “random variable” \(X\), then we say that the expected value of \(X\), denoted \(\operatorname{E}[X]\), is the weighted sum of the possible results of \(X\), weighted by their probabilities. More formally, if the possible results are \(\{x_1, x_2, \ldots, x_k\}\) with respective probabilities \(p_1, p_2, \ldots, p_k\), then \[\operatorname{E}[X] := p_1x_1 + p_2x_2 + \ldots + p_kx_k,\] or in summation notation, \[\operatorname{E}[X] := \sum_{i=1}^k p_ix_i.\] The expected value of \(X\) can be thought of as the average value of \(X\) when an experiment is performed many, many times and the value of \(X\) is averaged across the runs.
Here are some examples:
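As one concrete case, here is a minimal sketch that evaluates the definition for a fair six-sided die. The die is chosen purely for illustration and is not taken from the problem, and the helper name `expected_value` is hypothetical.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Weighted sum of results; `outcomes` is a list of (probability, value) pairs."""
    assert sum(p for p, _ in outcomes) == 1, "probabilities must sum to 1"
    return sum(p * x for p, x in outcomes)

# Fair six-sided die (illustrative assumption): each face 1..6 has probability 1/6.
die = [(Fraction(1, 6), face) for face in range(1, 7)]
print(expected_value(die))  # 7/2
```

Using `Fraction` keeps the result exact, in the same spirit as pretending we compute the full rational answer.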
So let's define a random variable \(T\) representing the result of the process outlined in the problem statement. The process chooses \(w\) numbers randomly1 between \(1\) and \(k\), and \(T\) is calculated as the sum of the \(n\) largest elements, so the possible results are between \(n\) and \(nk\). If we write the probability of obtaining the result \(t\) as \(p_t\), then the answer is \[\operatorname{E}[T] = \sum_{t=n}^{nk}\, p_t\,t.\] So we are done if we can compute \(p_t\) for each \(t\) from \(n\) to \(nk\).
+So let’s define a random variable \(T\) representing the result of the process outlined in the problem statement. The process chooses \(w\) numbers randomly1 between \(1\) and \(k\), and \(T\) is calculated as the sum of the \(n\) largest elements, so the possible results are between \(n\) and \(nk\). If we write the probability of obtaining the result \(t\) as \(p_t\), then the answer is \[\operatorname{E}[T] = \sum_{t=n}^{nk}\, p_t\,t.\] So we are done if we can compute \(p_t\) for each \(t\) from \(n\) to \(nk\).
Now, the process has \(k^w\) possible outcomes—namely all the sequences of length \(w\), each element of which is between \(1\) and \(k\)—and each of those outcomes is equally likely. Therefore, we can simply count the number of outcomes that result in a sum of \(t\), then divide by \(k^w\) to get the probability. If we write the number of sequences whose sum of \(n\) largest elements is \(t\) as \(c_t\), then we simply have \[p_t = \frac{c_t}{k^w}.\]
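For very small parameters, the counts \(c_t\) can be obtained by plain enumeration. The following is only a brute-force sketch for sanity-checking faster solutions; the function name `count_by_top_sum` and the tiny parameters \(n = 2\), \(w = 3\), \(k = 2\) are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def count_by_top_sum(n, w, k):
    """Return {t: c_t}, where c_t counts the length-w sequences over 1..k
    whose n largest elements sum to t. Brute force over all k**w sequences."""
    counts = Counter()
    for seq in product(range(1, k + 1), repeat=w):
        counts[sum(sorted(seq)[-n:])] += 1
    return counts

n, w, k = 2, 3, 2  # tiny illustrative parameters
c = count_by_top_sum(n, w, k)
p = {t: Fraction(ct, k ** w) for t, ct in c.items()}  # p_t = c_t / k^w
expected = sum(pt * t for t, pt in p.items())          # E[T] = sum of p_t * t
print(dict(c), expected)  # {2: 1, 3: 3, 4: 4} 27/8
```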
-So we've now reduced the problem to computing the \(c_t\)s. Now, a sum of \(t\) can arise in multiple ways. For example, if \(n = 3\) and \(t = 10\), then the top \(3\) values of the sequence (each in sorted order) could be \([2, 3, 5]\), or it could be \([2, 4, 4]\), or \([1, 1, 8]\), or something else. So, to count the number of sequences whose sum of \(n\) largest elements is \(t\), we need to enumerate all possible sequences of top \(n\) values whose sum is \(t\), and for each one, count the number of sequences of length \(w\) whose sequence of top \(n\) values is that sequence.
-If that's confusing, let's formalize a bit. Let's define a winner sequence as a sorted sequence of \(n\) values, each of which is between \(1\) and \(k\). Winner sequences are exactly the possible “sequences of \(n\) largest values”. Now, if \(W\) is a winner sequence, let's define \(c(W, w)\) as the number of length-\(w\) sequences whose sequence of \(n\) largest values is \(W\). Then you may check that the following equation holds \[c_t = \sum_{\substack{\text{$W$ is a winner sequence} \\ \mathit{sum}(W) = t}} c(W, w).\] Thus, we've further reduced the problem to that of computing \(c(W, w)\) across all winner sequences \(W\). And as it turns out, for Subtask 1, there aren't that many winner sequences. We can see this by simply enumerating them all (say with a computer). Finding a formula for the number of them isn't that hard either:
+So we’ve now reduced the problem to computing the \(c_t\)s. Now, a sum of \(t\) can arise in multiple ways. For example, if \(n = 3\) and \(t = 10\), then the top \(3\) values of the sequence (each in sorted order) could be \([2, 3, 5]\), or it could be \([2, 4, 4]\), or \([1, 1, 8]\), or something else. So, to count the number of sequences whose sum of \(n\) largest elements is \(t\), we need to enumerate all possible sequences of top \(n\) values whose sum is \(t\), and for each one, count the number of sequences of length \(w\) whose sequence of top \(n\) values is that sequence.
+If that’s confusing, let’s formalize a bit. Let’s define a winner sequence as a sorted sequence of \(n\) values, each of which is between \(1\) and \(k\). Winner sequences are exactly the possible “sequences of \(n\) largest values”. Now, if \(W\) is a winner sequence, let’s define \(c(W, w)\) as the number of length-\(w\) sequences whose sequence of \(n\) largest values is \(W\). Then you may check that the following equation holds \[c_t = \sum_{\substack{\text{$W$ is a winner sequence} \\ \mathit{sum}(W) = t}} c(W, w).\] Thus, we’ve further reduced the problem to that of computing \(c(W, w)\) across all winner sequences \(W\). And as it turns out, for Subtask 1, there aren’t that many winner sequences. We can see this by simply enumerating them all (say with a computer). Finding a formula for the number of them isn’t that hard either:
Exercise: Show that the number of winner sequences is exactly \(\binom{n + k - 1}{n}\).
For Subtask 1, \(n = 5\) and \(k = 5\), so \(\binom{n + k - 1}{n} = 126\), so there are indeed only a few of them.
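The count can also be confirmed numerically. This short sketch enumerates the winner sequences as sorted tuples and compares their number with the binomial formula; the helper name `winner_sequences` is an illustrative choice, and the check does not replace the proof asked for in the exercise.

```python
from itertools import combinations_with_replacement
from math import comb

def winner_sequences(n, k):
    """All sorted length-n sequences with entries in 1..k."""
    return list(combinations_with_replacement(range(1, k + 1), n))

n, k = 5, 5  # Subtask 1 values quoted in the text
seqs = winner_sequences(n, k)
print(len(seqs), comb(n + k - 1, n))  # 126 126
```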
Thinking “DP-cally”, we now attempt to build the length-\(w\) sequence element by element. As we build the sequence, its “sequence of \(n\) largest elements” changes as well.
-Let's be more precise. For a sequence \(S\), let's call the “sequence of \(n\) largest elements of \(S\)” its winning sequence, and denote it by \(W_S\). Now, suppose we insert the value \(v\) to \(S\). Let's denote the updated sequence by \(S + [v]\). Then the winning sequence might change because of \(v\). Specifically, the new winning sequence is obtained by inserting \(v\) to \(W_S\) in its proper sorted location, and then dropping the lowest element. (Can you see why?) Let's denote the process of “inserting a value \(v\) to a sequence \(W\) in its proper sorted location, and then dropping the lowest element” as a pushpop operation, and denote it by \(\mathit{pushpop}(W, v)\). Then what we're saying is that the winning sequence of \(S + [v]\) is related to the winning sequence of \(S\) via a pushpop operation—specifically, \[W_{S + [v]} = \mathit{pushpop}(W_S, v).\]
-We can now think recursively, and find a recurrence for \(c(W, w)\), as follows. Every sequence of length \(w\) can be obtained by taking a sequence \(S\) of length \(w - 1\) and then appending some value \(v\) (between \(1\) to \(k\)) to it. And as described above, the new winning sequence \(W_{S + [v]}\) is just \(\mathit{pushpop}(W_S, v)\). Notice that this latter expression only depends on \(W_S\), not on \(S\) itself. Thus, for each possible winner sequence \(W'\), we could simply collect the sequences \(S\) with \(W'\) as their winning sequence, and notice that the new winning sequence must be \(\mathit{pushpop}(W', v)\). In other words, we have the equation \[c(W, w) = \!\!\!\!\sum_{\substack{W'\,\,\,\, \\ \text{$W'$ is a winner sequence}}} \sum_{\substack{1 \le v \le k \,\,\,\, \\ \mathit{pushpop}(W', v) = W}} \!\!\!\!(\text{number of sequences $S$ of length $w - 1$ whose winning sequence is $W'$}).\] But the summand is just \(c(W', w - 1)\) by definition! Therefore, we obtain the recurrence \[c(W, w) = \sum_{\substack{W'\,\,\,\, \\ \text{$W'$ is a winner sequence}}} \sum_{\substack{1 \le v \le k \,\,\,\, \\ \mathit{pushpop}(W', v) = W}} c(W', w - 1),\] and we can use this to compute all \(c(W, w')\) we need, via DP: we build a table of results, one for each winner sequence \(W\) and each \(w' \le w\). Each entry of the table can be computed using the summation above. Since our formula for \(c(W, w')\) only depends on \(c(W', w' - 1)\), i.e., those with a smaller \(w'\) value, if we compute the table in increasing order of \(w'\), those values have already been computed, and are already on the table. Thus, we'll be able to compute the final result all the way up to \(w\), which is what we wanted.
+Let’s be more precise. For a sequence \(S\), let’s call the “sequence of \(n\) largest elements of \(S\)” its winning sequence, and denote it by \(W_S\). Now, suppose we insert the value \(v\) to \(S\). Let’s denote the updated sequence by \(S + [v]\). Then the winning sequence might change because of \(v\). Specifically, the new winning sequence is obtained by inserting \(v\) to \(W_S\) in its proper sorted location, and then dropping the lowest element. (Can you see why?) Let’s denote the process of “inserting a value \(v\) to a sequence \(W\) in its proper sorted location, and then dropping the lowest element” as a pushpop operation, and denote it by \(\mathit{pushpop}(W, v)\). Then what we’re saying is that the winning sequence of \(S + [v]\) is related to the winning sequence of \(S\) via a pushpop operation—specifically, \[W_{S + [v]} = \mathit{pushpop}(W_S, v).\]
+We can now think recursively, and find a recurrence for \(c(W, w)\), as follows. Every sequence of length \(w\) can be obtained by taking a sequence \(S\) of length \(w - 1\) and then appending some value \(v\) (between \(1\) to \(k\)) to it. And as described above, the new winning sequence \(W_{S + [v]}\) is just \(\mathit{pushpop}(W_S, v)\). Notice that this latter expression only depends on \(W_S\), not on \(S\) itself. Thus, for each possible winner sequence \(W'\), we could simply collect the sequences \(S\) with \(W'\) as their winning sequence, and notice that the new winning sequence must be \(\mathit{pushpop}(W', v)\). In other words, we have the equation \[c(W, w) = \!\!\!\!\sum_{\substack{W'\,\,\,\, \\ \text{$W'$ is a winner sequence}}} \sum_{\substack{1 \le v \le k \,\,\,\, \\ \mathit{pushpop}(W', v) = W}} \!\!\!\!(\text{number of sequences $S$ of length $w - 1$ whose winning sequence is $W'$}).\] But the summand is just \(c(W', w - 1)\) by definition! Therefore, we obtain the recurrence \[c(W, w) = \sum_{\substack{W'\,\,\,\, \\ \text{$W'$ is a winner sequence}}} \sum_{\substack{1 \le v \le k \,\,\,\, \\ \mathit{pushpop}(W', v) = W}} c(W', w - 1),\] and we can use this to compute all \(c(W, w')\) we need, via DP: we build a table of results, one for each winner sequence \(W\) and each \(w' \le w\). Each entry of the table can be computed using the summation above. Since our formula for \(c(W, w')\) only depends on \(c(W', w' - 1)\), i.e., those with a smaller \(w'\) value, if we compute the table in increasing order of \(w'\), those values have already been computed, and are already on the table. Thus, we’ll be able to compute the final result all the way up to \(w\), which is what we wanted.
Now, as for the base case, you could just directly count the sequences for, say, \(w' = n\), since the winning sequence is then just the sorted version of the sequence itself. Alternatively, we can use \(w' = 0\) as our base case, though we need to think about what the winning sequence of a sequence with fewer than \(n\) elements should be. Well, it makes sense to say that the winning sequence is the whole sequence as well, just sorted. And instead of a pushpop operation, we could simply use a push operation, at least while the sequence still has fewer than \(n\) elements.
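The recurrence and the \(w' = 0\) base case can be sketched as follows. This is not the original implementation, only a minimal illustration under stated assumptions: it assumes \(w \ge n\), it pushes counts forward from each \(W'\) to \(\mathit{pushpop}(W', v)\) rather than pulling them into each \(W\) (the totals are the same), and the names `pushpop` and `expected_top_sum_dp` are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction

def pushpop(W, v, n):
    """Insert v into the sorted tuple W; while W has fewer than n elements this
    is a plain push, otherwise the lowest element is dropped afterwards."""
    new = sorted(W + (v,))
    if len(new) > n:
        new = new[1:]
    return tuple(new)

def expected_top_sum_dp(n, w, k):
    """E[T] as an exact fraction, via the DP over winner sequences (assumes w >= n)."""
    counts = Counter({(): 1})  # base case w' = 0: only the empty sequence
    for _ in range(w):
        nxt = Counter()
        for W_prev, c in counts.items():
            for v in range(1, k + 1):
                # Every sequence counted in c(W', w'-1) extends to pushpop(W', v).
                nxt[pushpop(W_prev, v, n)] += c
        counts = nxt
    total = sum(sum(W) * c for W, c in counts.items())
    return Fraction(total, k ** w)

print(expected_top_sum_dp(2, 3, 2))  # 27/8
```

Only the counts for the previous value of \(w'\) are kept, since the recurrence never looks further back.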
-With this, we now have a solution! What's the running time? Well, the table has an entry for each \((W, w')\) with \(W\) a winner sequence and \(w' \le w\). Recall that there are \(\binom{n + k - 1}{n}\) winner sequences, so there are \(\approx \binom{n + k - 1}{n}w\) entries. Each entry is computed with the sum above, which clearly has at most \(\binom{n + k - 1}{n}k\) summands (often much less). Therefore, the amount of steps is roughly proportional to \[\approx \binom{n + k - 1}{n}w\cdot \binom{n + k - 1}{n}k = \binom{n + k - 1}{n}^2 wk.\] For Subtask 1, this is good enough; my straightforward Python implementation computes the full answer in less than one second.
+With this, we now have a solution! What’s the running time? Well, the table has an entry for each \((W, w')\) with \(W\) a winner sequence and \(w' \le w\). Recall that there are \(\binom{n + k - 1}{n}\) winner sequences, so there are \(\approx \binom{n + k - 1}{n}w\) entries. Each entry is computed with the sum above, which clearly has at most \(\binom{n + k - 1}{n}k\) summands (often much less). Therefore, the amount of steps is roughly proportional to \[\approx \binom{n + k - 1}{n}w\cdot \binom{n + k - 1}{n}k = \binom{n + k - 1}{n}^2 wk.\] For Subtask 1, this is good enough; my straightforward Python implementation computes the full answer in less than one second.
Note: Understanding this implementation is not required to understand the following sections, so you may skip it.
Remark: The implementation tries to copy our formulas above as closely as possible. As a result, it's highly unoptimized, and there are definitely several improvements that can be made. But the main point is that even such unoptimized code is enough to solve the subtask.
+Remark: The implementation tries to copy our formulas above as closely as possible. As a result, it’s highly unoptimized, and there are definitely several improvements that can be made. But the main point is that even such unoptimized code is enough to solve the subtask.
Subtask 2
The first one is quite intuitive; after all, \(\alpha X\) is just \(X\) with all values scaled by \(\alpha\), so the average should just be scaled in the same way. However, the second property—additivity—may be surprising. The property could be intuitive in the case where \(X_1\) and \(X_2\) are independent, but linearity doesn't require them to be—it's simply always true!
-In a bonus section below, we'll explain why this is true, but for now, let's first try to apply this to the problem. Let \(T\) be the same random variable as before, so it denotes the sum of the \(n\) largest values of the sequence produced. Now, we define \(n\) new random variables \(T_1, T_2, \ldots T_n\), where \(T_i\) denotes the \(i\)th largest value of the sequence. Then clearly we have \[T = T_1 + T_2 + \ldots + T_n = \sum_{i=1}^n T_i.\] Now, the \(T_i\)'s are definitely not independent, e.g., knowing the largest value constrains the possible values of the second value, and vice versa. Regardless, expectation is always additive, so we have the equality \[\operatorname{E}[T] = \operatorname{E}[T_1] + \operatorname{E}[T_2] + \ldots + \operatorname{E}[T_n] = \sum_{i=1}^n \operatorname{E}[T_i].\] Thus, we've reduced the problem to computing \(\operatorname{E}[T_i]\) for \(1 \le i \le n\), which is potentially more manageable!
+The first one is quite intuitive; after all, \(\alpha X\) is just \(X\) with all values scaled by \(\alpha\), so the average should just be scaled in the same way. However, the second property—additivity—may be surprising. The property could be intuitive in the case where \(X_1\) and \(X_2\) are independent, but linearity doesn’t require them to be—it’s simply always true!
+In a bonus section below, we’ll explain why this is true, but for now, let’s first try to apply this to the problem. Let \(T\) be the same random variable as before, so it denotes the sum of the \(n\) largest values of the sequence produced. Now, we define \(n\) new random variables \(T_1, T_2, \ldots T_n\), where \(T_i\) denotes the \(i\)th largest value of the sequence. Then clearly we have \[T = T_1 + T_2 + \ldots + T_n = \sum_{i=1}^n T_i.\] Now, the \(T_i\)’s are definitely not independent, e.g., knowing the largest value constrains the possible values of the second largest, and vice versa. Regardless, expectation is always additive, so we have the equality \[\operatorname{E}[T] = \operatorname{E}[T_1] + \operatorname{E}[T_2] + \ldots + \operatorname{E}[T_n] = \sum_{i=1}^n \operatorname{E}[T_i].\] Thus, we’ve reduced the problem to computing \(\operatorname{E}[T_i]\) for \(1 \le i \le n\), which is potentially more manageable!
Let's now try to compute \(\operatorname{E}[T_i]\), the expected value of the \(i\)th largest element of the sequence. The possible values are between \(1\) and \(k\), so by definition, we have \[\operatorname{E}[T_i] = \sum_{v=1}^k \operatorname{P}[T_i = v]\cdot v,\] where \(\operatorname{P}[T_i = v]\) denotes the probability that \(T_i = v\). Next, we again turn probability into counting; noting that there are \(k^w\) equally likely possibilities, we have something like \[\operatorname{P}[T_i = v] = \frac{\mathit{count}_{=v}(i)}{k^w}\] where \(\mathit{count}_{=v}(i)\) denotes the number of sequences whose \(i\)th largest value is \(v\). Thus, we're done if we can compute \(\mathit{count}_{=v}(i)\).
+Let’s now try to compute \(\operatorname{E}[T_i]\), the expected value of the \(i\)th largest element of the sequence. The possible values are between \(1\) and \(k\), so by definition, we have \[\operatorname{E}[T_i] = \sum_{v=1}^k \operatorname{P}[T_i = v]\cdot v,\] where \(\operatorname{P}[T_i = v]\) denotes the probability that \(T_i = v\). Next, we again turn probability into counting; noting that there are \(k^w\) equally likely possibilities, we have something like \[\operatorname{P}[T_i = v] = \frac{\mathit{count}_{=v}(i)}{k^w}\] where \(\mathit{count}_{=v}(i)\) denotes the number of sequences whose \(i\)th largest value is \(v\). Thus, we’re done if we can compute \(\mathit{count}_{=v}(i)\).
We can compute \(\mathit{count}_{=v}(i)\) by noting that:
Thus, all in all, there are \[c(\ell, g, v) = \binom{w}{\ell} \cdot \binom{w - \ell}{g} \cdot (v-1)^{\ell} \cdot (k-v)^g\] such sequences.
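The formula can be checked against brute force for small inputs. In this sketch, \(\ell\) and \(g\) are read as the number of elements less than \(v\) and greater than \(v\), respectively; that reading is an assumption inferred from the factors \((v-1)^{\ell}\) and \((k-v)^g\). The function names are illustrative.

```python
from itertools import product
from math import comb

def c_formula(l, g, v, w, k):
    """Closed form: choose positions for the l smaller and g larger elements,
    then choose their values; the remaining w - l - g elements all equal v."""
    return comb(w, l) * comb(w - l, g) * (v - 1) ** l * (k - v) ** g

def c_brute(l, g, v, w, k):
    """Count the same sequences by enumerating all k**w of them."""
    return sum(
        1
        for seq in product(range(1, k + 1), repeat=w)
        if sum(x < v for x in seq) == l and sum(x > v for x in seq) == g
    )

w, k = 4, 3  # small illustrative parameters
assert all(
    c_formula(l, g, v, w, k) == c_brute(l, g, v, w, k)
    for v in range(1, k + 1)
    for l in range(w + 1)
    for g in range(w - l + 1)
)
```

Note that Python evaluates `0 ** 0` as `1`, which is exactly what the formula needs when \(v = 1\) or \(v = k\).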
We now have a complete solution! How fast does it run? Well, we need to compute \(\operatorname{E}[T_i]\) for \(1 \le i \le n\), which in turn require the values \(\mathit{count}_{=v}(i)\) for \(1 \le i \le n\) and \(1 \le v \le k\), which in turn require the values \(c(\ell, g, v)\) for \(0 \le \ell \le w - 1\), \(0 \le g \le n - 1\) and \(1 \le v \le k\).
Each \(c(\ell, g, v)\) value is a product of some binomial coefficients and powers. The powers can all be computed with fast exponentiation, or they could just be precomputed in a table at the beginning (since all powers we need have bases less than \(k\), and exponents less than \(w\)), and the binomial coefficients can also be precomputed in a table, either via Pascal's identity, or precomputing factorials and using \[\binom{a}{b} = \frac{a!}{(a - b)!b!}.\] Therefore, we could say that each \(c(\ell, g, v)\) can be computed in a constant amount of steps, and since there are \(\approx wnk\) of them, the total number of steps to compute them all is \(\approx wnk\).
Each \(c(\ell, g, v)\) value is a product of some binomial coefficients and powers. The powers can all be computed with fast exponentiation, or they could just be precomputed in a table at the beginning (since all powers we need have bases less than \(k\), and exponents less than \(w\)), and the binomial coefficients can also be precomputed in a table, either via Pascal’s identity, or precomputing factorials and using \[\binom{a}{b} = \frac{a!}{(a - b)!b!}.\] Therefore, we could say that each \(c(\ell, g, v)\) can be computed in a constant amount of steps, and since there are \(\approx wnk\) of them, the total number of steps to compute them all is \(\approx wnk\).
To compute the \(\mathit{count}_{=v}(i)\) values, note that there are \(kn\) such values, and each one is computed with a summation with \(\approx wn\) summands. Therefore, it takes \(\approx wn^2 k\) steps to compute them all.
The formula for \(\operatorname{E}[T]\) has \(n\) summands, each of which has a formula with \(k\) summands, so this takes \(\approx nk\) steps.
Finally, we also need to account for the precomputation of factorials and powers. There are \(\approx w\) factorials and \(\approx kw\) powers to precompute, so their precomputation takes \(\approx kw\) steps.
Thus, the running time is dominated by the computation of \(\mathit{count}_{=v}(i)\). For Subtask 2, we have \(wn^2 k = 6\cdot 10^9\), so the number of steps seems small enough for this to be waitable if you use a fast language and a highly optimized implementation. It may be slow though, so instead of that, let's just improve our algorithm further.
+Thus, the running time is dominated by the computation of \(\mathit{count}_{=v}(i)\). For Subtask 2, we have \(wn^2 k = 6\cdot 10^9\), so the number of steps seems small enough for this to be waitable if you use a fast language and a highly optimized implementation. It may be slow though, so instead of that, let’s just improve our algorithm further.
Let's look at \(\mathit{count}_{=v}(i)\) again. It denotes the number of sequences whose \(i\)th largest value is exactly \(v\). It turns out that it's easier to count the number of sequences whose \(i\)th largest value is at most \(v\). Even more nicely, it turns out that you can use the latter to compute the former!
-To see this, let's define \(\mathit{count}_{\le v}(i)\) to be the number of sequences whose \(i\)th largest value is at most \(v\). Then we easily have:
+Let’s look at \(\mathit{count}_{=v}(i)\) again. It denotes the number of sequences whose \(i\)th largest value is exactly \(v\). It turns out that it’s easier to count the number of sequences whose \(i\)th largest value is at most \(v\). Even more nicely, it turns out that you can use the latter to compute the former!
+To see this, let’s define \(\mathit{count}_{\le v}(i)\) to be the number of sequences whose \(i\)th largest value is at most \(v\). Then we easily have:
Claim: \(\mathit{count}_{=v}(i) = \mathit{count}_{\le v}(i) - \mathit{count}_{\le v - 1}(i)\).
Proof: Left as an exercise to the reader.
Theorem 2: The \(i\)th largest value of a sequence is at most \(v\) if and only if the sequence has \(< i\) elements greater than \(v\).
We can now use a similar counting argument as before. Let \(g\) be the number of elements greater than \(v\), so that \(g < i\), and we can again write \[\mathit{count}_{\le v}(i) = \sum_{g=0}^{i-1} c(g, v)\] where now, \(c(g, v)\) denotes the number of sequences with exactly \(g\) elements greater than \(v\). Then we can count \(c(g, v)\) similarly as before, except it’s even simpler:
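Putting the Claim and Theorem 2 together gives a compact exact solution, sketched here. It assumes \(c(g, v) = \binom{w}{g}(k - v)^g v^{w - g}\) (choose which \(g\) positions exceed \(v\), then fill them with values above \(v\) and the rest with values at most \(v\)), which matches the product appearing in Hint 2 further down. The function name `expected_top_sum_counting` is an illustrative choice, and \(n \le w\) is assumed.

```python
from fractions import Fraction
from math import comb

def expected_top_sum_counting(n, w, k):
    """E[T] via linearity of expectation, using
    count_{=v}(i) = count_{<=v}(i) - count_{<=v-1}(i). Assumes n <= w."""
    def count_le(v, i):
        # Sequences with fewer than i elements greater than v (Theorem 2).
        return sum(comb(w, g) * (k - v) ** g * v ** (w - g) for g in range(i))

    total = 0
    for i in range(1, n + 1):          # i-th largest element
        for v in range(1, k + 1):      # its possible values
            total += v * (count_le(v, i) - count_le(v - 1, i))
    return Fraction(total, k ** w)

print(expected_top_sum_counting(2, 3, 2))  # 27/8
```

This direct version recomputes each inner sum from scratch, so it is meant as a readable reference rather than a fast implementation.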
The main change from Subtask 2 to Subtask 3 is that \(w\) is vastly increased, which means the portion of our previous algorithm that takes \(\approx wk\) steps is now unacceptable. Let's recap what those steps are:
+The main change from Subtask 2 to Subtask 3 is that \(w\) is vastly increased, which means the portion of our previous algorithm that takes \(\approx wk\) steps is now unacceptable. Let’s recap what those steps are:
Among these, the second one clearly dominates the running time. But we can essentially get rid of the second one by simply not precomputing powers, and instead just using fast exponentiation to compute them when needed! This makes the running time slightly worse—fast exponentiation takes \(\mathcal{O}(\lg w)\) steps for an exponent the size of \(w\)—but that's a very worthwhile tradeoff, because you can check that the number of steps improves from \(\mathcal{O}(wk + n^2 k)\) to \[\mathcal{O}(w + n^2 k + nk \lg w).\] This is now acceptable for Subtask 3 🙂.
-Now, there's still that factor \(w\) in the running time, which in the current subtask is probably ok since \(w = 10^8\). However, in later subtasks, \(w = 10^{16}\), which suggests that that bit can still be improved further.
-How can we improve it? Well, the main reason for needing factorials up to \(w\) is so that we can compute binomial coefficients. But looking closer, notice that we actually only need binomial coefficients at exactly row \(w\). Furthermore, we actually only need the first \(n\) coefficients in it. And as it turns out, there's a way to compute a row of binomial coefficients one by one, starting from the leftmost one, by using the following recurrence (which is easy to prove using the factorial formula): \[\binom{w}{g} = \binom{w}{g - 1}\cdot \frac{w - g + 1}{g},\] with base case simply \(\binom{w}{0} = 1\). So now, instead of precomputing factorials, we may simply precompute the needed binomial coefficients using this recurrence with just \(\approx n\) steps! The running time then improves to \[\mathcal{O}(n^2 k + nk \lg w),\] which is really cool.
+Among these, the second one clearly dominates the running time. But we can essentially get rid of the second one by simply not precomputing powers, and instead just using fast exponentiation to compute them when needed! This makes the running time slightly worse—fast exponentiation takes \(\mathcal{O}(\lg w)\) steps for an exponent the size of \(w\)—but that’s a very worthwhile tradeoff, because you can check that the number of steps improves from \(\mathcal{O}(wk + n^2 k)\) to \[\mathcal{O}(w + n^2 k + nk \lg w).\] This is now acceptable for Subtask 3 🙂.
+Now, there’s still that term \(w\) in the running time, which in the current subtask is probably ok since \(w = 10^8\). However, in later subtasks, \(w = 10^{16}\), which suggests that that bit can still be improved further.
+How can we improve it? Well, the main reason for needing factorials up to \(w\) is so that we can compute binomial coefficients. But looking closer, notice that we actually only need binomial coefficients at exactly row \(w\). Furthermore, we actually only need the first \(n\) coefficients in it. And as it turns out, there’s a way to compute a row of binomial coefficients one by one, starting from the leftmost one, by using the following recurrence (which is easy to prove using the factorial formula): \[\binom{w}{g} = \binom{w}{g - 1}\cdot \frac{w - g + 1}{g},\] with base case simply \(\binom{w}{0} = 1\). So now, instead of precomputing factorials, we may simply precompute the needed binomial coefficients using this recurrence with just \(\approx n\) steps! The running time then improves to \[\mathcal{O}(n^2 k + nk \lg w),\] which is really cool.
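The row recurrence is short enough to sketch directly. This version works with exact integers; the function name `binomial_row_prefix` is an illustrative choice, and a version working modulo a prime would presumably replace the division by \(g\) with a multiplication by a modular inverse.

```python
from math import comb

def binomial_row_prefix(w, n):
    """Return [C(w, 0), ..., C(w, n - 1)] for n >= 1, using
    C(w, g) = C(w, g - 1) * (w - g + 1) / g."""
    row = [1]  # base case C(w, 0) = 1
    for g in range(1, n):
        # Multiply first: C(w, g-1) * (w-g+1) equals g * C(w, g), so // is exact.
        row.append(row[-1] * (w - g + 1) // g)
    return row

print(binomial_row_prefix(5, 6))  # [1, 5, 10, 10, 5, 1]
```

Only about \(n\) multiplications and divisions are needed, no matter how large \(w\) is.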
Subtasks 4 & 5
Our previous algorithm is now too slow; in particular, that \(\mathcal{O}(n^2 k)\) bit in the running time is now too large. For the rest of the subtasks, I'll just give a couple of hints to guide you towards faster solutions.
+Our previous algorithm is now too slow; in particular, that \(\mathcal{O}(n^2 k)\) bit in the running time is now too large. For the rest of the subtasks, I’ll just give a couple of hints to guide you towards faster solutions.
Hint 1
Do you really have to compute the whole sum \[\mathit{count}_{\le v}(i) = \sum_{g=0}^{i-1} c(g, v)\] every time?
Hint 2
Notice that \[(k - v)^g\cdot v^{w - g} = v^w \cdot \left(\frac{k - v}{v}\right)^g.\] Letting \(x_v := \frac{k - v}{v}\), this is the same as \(v^w x_v^g\).
The first one is simple enough, and you should be able to prove it yourself 🙂. The real surprise is the second, which holds even if \(X_1\) and \(X_2\) are not independent. (For independent variables, this may not be a surprise, since “clearly” the variables have nothing to do with each other,2 so the averages should “just add up.”)
-Let's see an example of this, using our current problem itself, with \(n = 2\), \(w = 3\) and \(k = 2\). In this case, we have \[T = T_1 + T_2\] where \(T_i\) is the value of the \(i\)th largest element. Clearly, \(T_1\) and \(T_2\) are not independent; for example, we know that \(T_1\) is at least \(T_2\), so if \(T_2\) is \(2\), then \(T_1\) must be \(2\) as well.
+Let’s see an example of this, using our current problem itself, with \(n = 2\), \(w = 3\) and \(k = 2\). In this case, we have \[T = T_1 + T_2\] where \(T_i\) is the value of the \(i\)th largest element. Clearly, \(T_1\) and \(T_2\) are not independent; for example, we know that \(T_1\) is at least \(T_2\), so if \(T_2\) is \(2\), then \(T_1\) must be \(2\) as well.
Regardless, we will now illustrate that \[\operatorname{E}[T] = \operatorname{E}[T_1] + \operatorname{E}[T_2]\] by simply enumerating all \(2^3 = 8\) possible sequences:
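The enumeration can be reproduced with a few lines of code. This is a minimal sketch for the stated parameters \(n = 2\), \(w = 3\), \(k = 2\); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

n, w, k = 2, 3, 2
p_s = Fraction(1, k ** w)  # each of the 8 sequences is equally likely

E_T1 = E_T2 = E_T = Fraction(0)
for s in product(range(1, k + 1), repeat=w):
    top = sorted(s, reverse=True)
    t1, t2 = top[0], top[1]      # largest and second-largest values
    E_T1 += p_s * t1
    E_T2 += p_s * t2
    E_T += p_s * (t1 + t2)       # T = T_1 + T_2 for this sequence

print(E_T1, E_T2, E_T)  # 15/8 3/2 27/8
```

The three accumulators add up exactly the same terms, just grouped differently, which is why \(\operatorname{E}[T_1] + \operatorname{E}[T_2]\) and \(\operatorname{E}[T]\) agree.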
But actually, this little calculation illustrates pretty well why expectation is additive; we're simply adding the same things in different ways! To illustrate this further, we can tabulate everything as follows: \[\begin{array}{l|l|lll}
+ But actually, this little calculation illustrates pretty well why expectation is additive; we’re simply adding the same things in different ways! To illustrate this further, we can tabulate everything as follows: \[\begin{array}{l|l|lll}
s & p_s & T_1 & T_2 & T \\
\hline
[1, 1, 1] & \frac{1}{8} & 1 & 1 & 2 \\
@@ -326,8 +326,8 @@ It should now not be too hard to formalize this argument and make it more general. If you're interested, here it is: Computing \(p_sT\) column is still the sum of the \(p_sT_1\) and \(p_sT_2\) columns. Finally, computing \(\operatorname{E}[T]\) amounts to taking the sum of the \(p_sT\) column, while computing \(\operatorname{E}[T_1] + \operatorname{E}[T_2]\) amounts to taking the sum of the \(p_sT_1\) and \(p_sT_2\) columns separately, then adding them. But these are clearly the same! (And this worked even if \(T_1\) and \(T_2\) aren't independent.)
Proof
It should now not be too hard to formalize this argument and make it more general. If you’re interested, here it is: Proof
Suppose the sample space has \(k\) elements \(\{\omega_1, \omega_2, \ldots, \omega_k\}\) with respective probabilities \(p_1, p_2, \ldots, p_k\). Because \(T = T_1 + T_2\), we must always have \(T(\omega_i) = T_1(\omega_i) + T_2(\omega_i),\) for every \(i\).
Thus, by the law of the unconscious statistician, \[\begin{align*}
\operatorname{E}[T]
@@ -346,16 +346,16 @@ Computing
All solutions we described above compute the full answer, i.e., we pretend we were working on \(\mathbb{R}\) (or maybe \(\mathbb{C}\)) where we can add, subtract, multiply, and crucially, divide, numbers. Actually, we could also pretend we are working on \(\mathbb{Q}\), i.e., the rationals, since all intermediate results are clearly rational, and we can also do the same arithmetic operations there.
-Now, in many problems, we can usually convert such full-answer solutions into solutions that compute the answer mod \(m\), say \(m = 998244353\), because we can also add, subtract and multiply numbers mod \(m\). However, division mod \(m\) is more complicated; it sometimes doesn't work at all. To see this, let \(m = 10\), and note that \(12 \equiv 32 \pmod{10}\), but dividing by \(4\) fails: \[\frac{12}{4} = 3 \not\equiv 8 = \frac{32}{4} \pmod{10}.\]
+Now, in many problems, we can usually convert such full-answer solutions into solutions that compute the answer mod \(m\), say \(m = 998244353\), because we can also add, subtract and multiply numbers mod \(m\). However, division mod \(m\) is more complicated; it sometimes doesn’t work at all. To see this, let \(m = 10\), and note that \(12 \equiv 32 \pmod{10}\), but dividing by \(4\) fails: \[\frac{12}{4} = 3 \not\equiv 8 = \frac{32}{4} \pmod{10}.\]
Before we tackle this issue, let's first see if we can compute \(a/b \bmod m\) based solely on the definition given in the problem statement. Suppose you've computed the full answer as \(a/b\), and let's say it's in lowest terms. Then the problem guarantees us that \(a/b \bmod m\) is well-defined, and it is the unique number \(q\) such that “\(a/b - q = \frac{a - qb}{b}\) is divisible by \(m\)”, which by definition means that \(\frac{a - qb}{b}\) can be written as a fraction whose numerator is divisible by \(m\) but whose denominator is not. Now, the fraction \(\frac{a - qb}{b}\) is already in lowest terms (why?), so this means two things:
+Before we tackle this issue, let’s first see if we can compute \(a/b \bmod m\) based solely on the definition given in the problem statement. Suppose you’ve computed the full answer as \(a/b\), and let’s say it’s in lowest terms. Then the problem guarantees us that \(a/b \bmod m\) is well-defined, and it is the unique number \(q\) such that “\(a/b - q = \frac{a - qb}{b}\) is divisible by \(m\)”, which by definition means that \(\frac{a - qb}{b}\) can be written as a fraction whose numerator is divisible by \(m\) but whose denominator is not. Now, the fraction \(\frac{a - qb}{b}\) is already in lowest terms (why?), so this means two things:
All in all, this takes \(\approx m\) steps in the worst case to find \(q\), which is the answer we're looking for. With \(m = 998244353 \approx 10^9\), that isn't so bad, especially if \(a/b\) doesn't have too many digits. So for Subtasks 1 and 2, that's more-or-less okay. But for the larger subtasks the numbers become too large3 which makes it not okay, and we clearly need to do something else.
+All in all, this takes \(\approx m\) steps in the worst case to find \(q\), which is the answer we’re looking for. With \(m = 998244353 \approx 10^9\), that isn’t so bad, especially if \(a/b\) doesn’t have too many digits. So for Subtasks 1 and 2, that’s more-or-less okay. But for the larger subtasks, the numbers become too large3 which makes it not okay, and we clearly need to do something else.
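To make the "try every \(q\)" idea concrete, here is a minimal sketch of that search. The function name `fraction_mod_by_search` is hypothetical, and the check uses `gcd(b, m) == 1`, which for a prime \(m\) is the same as saying \(b\) is not divisible by \(m\).

```python
from math import gcd

def fraction_mod_by_search(a, b, m):
    """Find q in [0, m) such that a - q*b is divisible by m, by trying every q.

    Hypothetical helper; takes about m steps in the worst case.
    """
    if gcd(b, m) != 1:
        raise ValueError("b shares a factor with m, so a/b mod m is not defined")
    for q in range(m):
        if (a - q * b) % m == 0:
            return q
    raise AssertionError("unreachable when gcd(b, m) == 1")

print(fraction_mod_by_search(7, 3, 10))
```

This is only practical for small moduli; with \(m = 998244353\) the loop may run roughly a billion times, and each step gets slower as \(a\) and \(b\) grow.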
You might suspect that the reason that dividing by \(4\) failed modulo \(10\) is that \(4\) and \(10\) share a common factor. And indeed, that's a good hunch. For example, dividing by \(3\) seems to work modulo \(10\), which you can check with lots of small examples, or maybe by using a program to do several checks for you, e.g.: Code (Python)
You might suspect that the reason that dividing by \(4\) failed modulo \(10\) is that \(4\) and \(10\) share a common factor. And indeed, that’s a good hunch. For example, dividing by \(3\) seems to work modulo \(10\), which you can check with lots of small examples, or maybe by using a program to do several checks for you, e.g.: Code (Python)
from math import gcd
def congruent(m, a, b):
@@ -380,7 +380,7 @@ Working “modulo \(m\)assert congruent(m, num1 // den, num2 // den)
print("All OK")
You can replace m = 10
with other numbers and it still seems to work! So clearly, there seems to be some sense in which division “kinda makes sense”, as long as the number you're dividing with is coprime with the modulus \(m\).
You can replace m = 10
with other numbers and it still seems to work! So clearly, there seems to be some sense in which division “kinda makes sense”, as long as the number you’re dividing with is coprime with the modulus \(m\).
And as it turns out, we can prove that fact!
Theorem A: If \(da \equiv db \pmod{m}\) and \(\gcd(m, d) = 1\), then \(a \equiv b \pmod{m}\).
@@ -388,60 +388,65 @@Proof: Fairly straightforward, so we leave it to the reader.
Now that's all well and good, but what we really want is to be able to divide modulo \(m\). For this, we should answer the following question first: what is division, really? Well, dividing is the same as multiplying by the multiplicative inverse, that is, \(a/b\) is the same as \(ab^{-1}\), where \(b^{-1} = 1/b\) is the multiplicative inverse of \(b\). But what is a multiplicative inverse? Well, \(b^{-1}\) is defined as the unique number such that \(bb^{-1} = 1\).
-Now, as it turns out, multiplicative inverses sometimes exist modulo \(m\). In the mod \(m\) world, the multiplicative inverse of \(b\) is still denoted \(b^{-1}\), but this time, it's not a fraction. Nonetheless, it's still defined analogously; \(b^{-1}\) is the “unique” number such that \[bb^{-1} \equiv 1 \pmod{m}.\] Note that I put “unique” in quotes because if \(x\) is a multiplicative inverse, then \(x + m\) is also one, as is \(x + 2m\), \(x - m\), etc. But as it turns out, all these numbers are the same mod \(m\), which is what we mean by “unique“ here.
-We can actually prove that fact, and in fact, something stronger:
+Now Theorem A’s nice and all, but it only allows us to divide by \(d\) if the number was already divisible by \(d\). What we really want is to be able to divide modulo \(m\) anytime we want, that is, make sense of things like \[7/3 \bmod 10.\] In this particular example, even though \(7/3\) is not an integer, note that \(7 \equiv 27 \pmod{10}\) and \(27/3 = 9\) is an integer, so if Theorem A were to extend even to non-integer settings, then we ought to have \(7/3 \equiv 27/3 \pmod{10}\), so \(7/3 \bmod 10\) ought to be \(9\). We’d like to generalize this reasoning.
+For this, we should answer the following question first: what is division, really? Well, dividing is the same as multiplying by the multiplicative inverse, that is, \(a/b\) is the same as \(ab^{-1}\), where \(b^{-1} = 1/b\) is the multiplicative inverse of \(b\). But what is a multiplicative inverse? Well, \(b^{-1}\) is defined as the unique number such that \(bb^{-1} = 1\).
+Now, as it turns out, multiplicative inverses sometimes exist modulo \(m\). In the mod \(m\) world, the multiplicative inverse of \(b\) is still denoted \(b^{-1}\), but this time, it’s not a fraction. Nonetheless, it’s still defined analogously; \(b^{-1}\) is the “unique” number such that \[bb^{-1} \equiv 1 \pmod{m}.\] Note that I put “unique” in quotes because if \(x\) is a multiplicative inverse, then \(x + m\) is also one, as is \(x + 2m\), \(x - m\), etc. But as it turns out, all these numbers are the same mod \(m\), which is what we mean by “unique” here.
+We can actually prove that fact, and in fact, something stronger; we can say precisely when there’s a multiplicative inverse:
Theorem B: \(b\) has a multiplicative inverse if and only if \(b\) and \(m\) are coprime, and it is unique if it exists.
+Theorem B: For any \(b \in \mathbb{Z}\), \(b\) has a multiplicative inverse if and only if \(b\) and \(m\) are coprime, and it is unique (mod \(m\)) if it exists.
Proof
(⇒) Suppose \(b\) has a multiplicative inverse \(b'\), so that \[bb' \equiv 1 \pmod{m}.\] This is equivalent to saying that there's a \(k\) such that \[bb' - mk = 1.\] Now, if \(d\) is a common divisor of \(b\) and \(m\), then \(d\) divides the left-hand side, so it must also divide the right-hand side, which is \(1\). Thus, all common divisors of \(b\) and \(m\) divide \(1\), which means they are coprime.
-(⇐) Suppose \(b\) and \(m\) are coprime, so their gcd is \(1\). By Bézout's, there are integers \(x\) and \(y\) such that \[bx + my = 1.\] Reducing this modulo \(m\) gives \[bx \equiv 1 \pmod{m},\] so \(x\) is a multiplicative inverse of \(b\).
+(⇒) Suppose \(b\) has a multiplicative inverse \(b'\), so that \[bb' \equiv 1 \pmod{m}.\] This is equivalent to saying that there’s a \(k\) such that \[bb' - mk = 1.\] Now, if \(d\) is a common divisor of \(b\) and \(m\), then \(d\) divides the left-hand side, so it must also divide the right-hand side, which is \(1\). Thus, all common divisors of \(b\) and \(m\) divide \(1\), which means they are coprime.
+(⇐) Suppose \(b\) and \(m\) are coprime, so their gcd is \(1\). By Bézout’s, there are integers \(x\) and \(y\) such that \[bx + my = 1.\] Reducing this modulo \(m\) gives \[bx \equiv 1 \pmod{m},\] so \(x\) is a multiplicative inverse of \(b\).
(Uniqueness) Suppose \(b'\) and \(b''\) are both multiplicative inverses of \(b\). Then \[\begin{align*} bb' &\equiv 1 \pmod{m} \\ bb'' &\equiv 1 \pmod{m}, \end{align*}\] so \[bb' \equiv bb'' \pmod{m}.\] But \(m\) and \(b\) are coprime (since a multiplicative inverse exists), so by using Theorem A, \(b' \equiv b'' \pmod{m}\), so any two multiplicative inverses of \(b\) are the same mod \(m\).
Remark: The proof can actually be turned into an algorithm to compute the multiplicative inverse, since the integers \(x\) and \(y\) guaranteed by Bézout's identity can be computed using the extended version of Euclid's gcd algorithm.
+Remark: The proof can actually be turned into an algorithm to compute the multiplicative inverse, since the integers \(x\) and \(y\) guaranteed by Bézout’s identity can be computed using the extended version of Euclid’s gcd algorithm.
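As a rough illustration of that remark, here is a minimal sketch of computing a multiplicative inverse with the extended Euclidean algorithm. The function names `extended_gcd` and `mod_inverse` are made up for this example.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for a, b >= 0."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # Here b*x + (a % b)*y == g, and a % b == a - (a // b) * b,
    # so regrouping gives a*y + b*(x - (a // b)*y) == g.
    return g, y, x - (a // b) * y

def mod_inverse(b, m):
    """Return the inverse of b modulo m, reduced to the range [0, m)."""
    g, x, _ = extended_gcd(b % m, m)
    if g != 1:
        raise ValueError("no inverse: b and m are not coprime")
    # b*x + m*y == 1, so b*x is congruent to 1 modulo m.
    return x % m

print(mod_inverse(3, 10))  # 3 * 7 = 21, which is 1 mod 10
```

The \(y\) from Bézout's identity is discarded, since it vanishes once we reduce modulo \(m\).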
So with this, we're now fairly able to “divide modulo \(m\)”, as long as the divisors are coprime with \(m\). Since we're using the modulus \(m = 998244353\) which is prime, most numbers are coprime! The only ones we can't divide with are those divisible by \(m\) itself, but since such numbers are \(\equiv 0 \pmod{m}\), it makes sense not to be able to divide with them since that's sort of equivalent to dividing by \(0\).
-Now, that's well and good, but we still need to relate this way of dividing modulo \(m\) with the definition given in the statement. As it turns out, everything is okay; we can prove that \(a/b \bmod m\), as defined in the statement, is the same as \(ab^{-1} \bmod m\), using the following theorem:
+So with this, we’re now fairly able to “divide modulo \(m\)”, as long as the divisors are coprime with \(m\). Since we’re using the modulus \(m = 998244353\) which is prime, most numbers are coprime! The only ones we can’t divide with are those divisible by \(m\) itself, but since such numbers are \(\equiv 0 \pmod{m}\), it makes sense not to be able to divide with them since that’s sort of equivalent to dividing by \(0\).
+Now, that’s well and good, but we still need to relate this way of dividing modulo \(m\) with the definition given in the statement. As it turns out, everything is okay; we can prove that \(a/b \bmod m\), as defined in the statement, is the same as \(ab^{-1} \bmod m\), using the following theorem:
Theorem C: For a rational \(r\), \(r \bmod m\) exists if and only if \(r\) can be written as \(a/b\) with \(b\) coprime with \(m\), and if it exists, then we have the equality \[(a/b \bmod m) = (ab^{-1} \bmod m).\]
For this theorem to work, we will amend the definition given in the statement as follows: We say a rational is divisible by \(m\) if it can be written as \(a/b\) with \(a\) divisible by \(m\) and \(b\) coprime with \(m\). This is equivalent to the definition in the statement when \(m\) is prime, but it's friendlier to nonprime moduli.
+For this theorem to work, we will amend the definition given in the statement as follows: We say a rational is divisible by \(m\) if it can be written as \(a/b\) with \(a\) divisible by \(m\) and \(b\) coprime with \(m\). This is equivalent to the definition in the statement when \(m\) is prime, but it’s friendlier to nonprime moduli.
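In code, Theorem C says we can reduce a fraction modulo \(m\) by multiplying the numerator by the inverse of the denominator. Here is a minimal sketch, assuming Python 3.8 or later, where the built-in `pow(b, -1, m)` computes a modular inverse and raises `ValueError` when none exists; the function name `fraction_mod` is hypothetical.

```python
from fractions import Fraction

def fraction_mod(r, m):
    """Return (a * b^{-1}) mod m, where r = a/b in lowest terms.

    Hypothetical helper. pow(b, -1, m) raises ValueError when b and m
    are not coprime, matching the case where r mod m does not exist.
    """
    r = Fraction(r)  # Fraction always stores the value in lowest terms
    a, b = r.numerator, r.denominator
    return a * pow(b, -1, m) % m

print(fraction_mod(Fraction(7, 3), 10))  # the 7/3 mod 10 example
```

This agrees with the earlier hand computation: the inverse of \(3\) modulo \(10\) is \(7\), and \(7 \cdot 7 = 49 \equiv 9 \pmod{10}\).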
Proof
(⇒) Suppose \(r \bmod m\) exists, i.e., there's a unique \(q\) such that \(r - q\) is “divisible by \(m\)” (as defined above). Writing \(r\) in lowest terms as \(a/b\), we note that \(a/b - q = \frac{a - bq}{b}\) is also in lowest terms.
+(⇒) Suppose \(r \bmod m\) exists, i.e., there’s a unique \(q\) such that \(r - q\) is “divisible by \(m\)” (as defined above). Writing \(r\) in lowest terms as \(a/b\), we note that \(a/b - q = \frac{a - bq}{b}\) is also in lowest terms.
By definition of divisibility, \(\frac{a - qb}{b}\) can be written as \(a'/b'\) with \(m\) dividing \(a'\) but coprime with \(b'\). Since \[\frac{a - qb}{b} = \frac{a'}{b'}\] and the former is in lowest terms, it follows that \(a - qb\) is a divisor of \(a'\) and \(b\) is a divisor of \(b'\). But if \(m\) and \(b'\) are coprime and \(b \mid b'\), then \(m\) and \(b\) must be coprime as well.
-(⇐) Suppose \(r = a/b\) with \(b\) is coprime with \(m\). Then I claim that \[q := (ab^{-1} \bmod m)\] satisfies the definition of \(r \bmod m\). Note that \[r - q = \frac{a - qb}{b},\] and we already know \(b\) is coprime with \(m\), so it's sufficient to show that \(a - qb\) is divisible by \(m\), i.e., \(a \equiv qb \pmod{m}\). That's shown as follows: \[\begin{align*}
+ (⇐) Suppose \(r = a/b\) with \(b\) coprime with \(m\). Then I claim that \[q := (ab^{-1} \bmod m)\] satisfies the definition of \(r \bmod m\). Note that \[r - q = \frac{a - qb}{b},\] and we already know \(b\) is coprime with \(m\), so it’s sufficient to show that \(a - qb\) is divisible by \(m\), i.e., \(a \equiv qb \pmod{m}\). That’s shown as follows: \[\begin{align*}
qb
&\equiv (ab^{-1})b \\
&\equiv a(bb^{-1}) \\
&\equiv a\cdot(1) \\
&= a \pmod{m}.
\end{align*}\] So \(r - q\) is indeed divisible by \(m\). All that remains is to show that \(q\) is the unique one satisfying the definition. If \(q'\) also satisfies the definition, then \(\frac{a - q'b}{b}\) is also divisible by \(m\), so we can write it as \[\frac{a - q'b}{b} = \frac{a'}{b'}\] with \(m\) dividing \(a'\) and coprime with \(b'\). Rearranging this gives \[(a - q'b)b' = a'b.\] Because \(m \mid a'\), \(m\) must divide the left-hand side \((a - q'b)b'\) as well, but since \(m\) and \(b'\) are coprime, \(m\) must divide \(a - q'b\), i.e., \[a \equiv q'b \pmod{m}.\] Multiplying both sides by \(b^{-1}\), we get \[q' \equiv ab^{-1} \equiv q \pmod{m}.\] In other words, any other possible value \(q'\) of \((r \bmod m)\) must be equal to \(q = (ab^{-1} \bmod m)\), so it's unique. So \(r - q\) is indeed divisible by \(m\). All that remains is to show that \(q\) is the unique one satisfying the definition. If \(q'\) also satisfies the definition, then \(\frac{a - q'b}{b}\) is also divisible by \(m\), so we can write it as \[\frac{a - q'b}{b} = \frac{a'}{b'}\] with \(m\) dividing \(a'\) and coprime with \(b'\). Rearranging this gives \[(a - q'b)b' = a'b.\] Because \(m \mid a'\), reducing this modulo \(m\) gives \[\begin{align*}
+(a - q'b)b' &\equiv 0 \pmod{m} \\
+a - q'b &\equiv 0 && \text{using Theorem A} \\
+a &\equiv q'b
+\end{align*}\] Multiplying both sides by \(b^{-1}\), we get \[q' \equiv ab^{-1} \equiv q \pmod{m}.\] In other words, any other possible value \(q'\) of \((r \bmod m)\) must be equal to \(q = (ab^{-1} \bmod m)\), so it’s unique. Corollary: Suppose \(r\) cannot be written as \(a/b\) with \(b\) coprime with \(m\). Then there is no integer \(q\) such that \(r - q\) is divisible by \(m\). Note that this doesn't follow immediately from the definition, since if \(r \bmod m\) doesn't exist, then all we can say from the definition is that there isn't exactly one \(q\) such that \(r - q\) is divisible by \(m\). In particular, there may be zero, or there may be more than one. This corollary rules out the latter. Note that this doesn’t follow immediately from the definition, since if \(r \bmod m\) doesn’t exist, then all we can say from the definition is that there isn’t exactly one \(q\) such that \(r - q\) is divisible by \(m\). In particular, there may be zero, or there may be more than one. This corollary rules out the latter. We prove the contrapositive. Suppose there is a \(q\) such that \(r - q\) is divisible by \(m\). Notice that the “(⇒)” portion of the previous proof doesn't really use the fact that \(q\) is unique, so the proof also goes through here just fine, and it proves that \(r\) can be written as \(a/b\) with \(b\) coprime with \(m\). With this, we can now completely work modulo \(m = 998244353\) all throughout! All that we need now is to check that we're only ever dividing with numbers without \(m\) as a prime factor. The possible divisors come from \(k^w\) and the numbers coming from the computation of \(\binom{w}{g}\) with \(g < n\). The number \(k\) is less than \(m\) in all inputs, so \(k^w\) is coprime with \(m\). And in the first few subtasks, \(w\) is also less than \(m\), so all factors in \(\binom{w}{g}\) are all coprime as well. 
Finally, in the subtasks where \(w\) is very large, recall that we're only computing the first \(n\) terms of row \(w\) of the binomial coefficient table, and that we're using the recurrence \[\binom{w}{g} = \binom{w}{g - 1}\cdot \frac{w - g + 1}{g},\] so we only need to divide with numbers \(g < n\). Since \(n < m\) for all inputs, this is ok too. Suppose there is a \(q\) such that \(r - q\) is divisible by \(m\). Notice that the “(⇒)” portion of the previous proof doesn’t really use the fact that \(q\) is unique, so the proof also goes through here just fine, and it proves that \(r\) can be written as \(a/b\) with \(b\) coprime with \(m\). With this, we can now completely work modulo \(m = 998244353\) all throughout! All that we need now is to check that we’re only ever dividing with numbers without \(m\) as a prime factor. The possible divisors come from \(k^w\) and the numbers coming from the computation of \(\binom{w}{g}\) with \(g < n\). The number \(k\) is less than \(m\) in all inputs, so \(k^w\) is coprime with \(m\). And in the first few subtasks, \(w\) is also less than \(m\), so all factors in \(\binom{w}{g}\) are coprime with \(m\) as well. Finally, in the subtasks where \(w\) is very large, recall that we’re only computing the first \(n\) terms of row \(w\) of the binomial coefficient table, and that we’re using the recurrence \[\binom{w}{g} = \binom{w}{g - 1}\cdot \frac{w - g + 1}{g},\] so we only need to divide with numbers \(g < n\). Since \(n < m\) for all inputs, this is ok too. Thus, we can safely divide whenever we need to, and all is well in the world. Remark: In math, when we're doing this idea of “working modulo \(m\)” we usually say we're “working in \(\mathbb{Z}/m\mathbb{Z}\)”. Here, “\(\mathbb{Z}/m\mathbb{Z}\)” is a formalization of the “set of integers modulo \(m\)”. It is just like the integers \(\mathbb{Z}\), but we make two numbers equal iff they are the same mod \(m\). 
In this setting, we can also add, subtract, and multiply, and we can divide by any number coprime with \(m\) (as shown above). If \(m\) is prime, then this means we can divide by any “nonzero number” (where you need to remember that “nonzero” means “not divisible by \(m\)”), which makes \(\mathbb{Z}/m\mathbb{Z}\) a field, just like \(\mathbb{R}\), \(\mathbb{C}\), and \(\mathbb{Q}\). Remark: In math, when we’re doing this idea of “working modulo \(m\)” we usually say we’re “working in \(\mathbb{Z}/m\mathbb{Z}\)”. Here, “\(\mathbb{Z}/m\mathbb{Z}\)” is a formalization of the “set of integers modulo \(m\)”. It is just like the integers \(\mathbb{Z}\), but we make two numbers equal iff they are the same mod \(m\). In this setting, we can also add, subtract, and multiply, and we can divide by any number coprime with \(m\) (as shown above). If \(m\) is prime, then this means we can divide by any “nonzero number” (where you need to remember that “nonzero” means “not divisible by \(m\)”), which makes \(\mathbb{Z}/m\mathbb{Z}\) behave very much like \(\mathbb{R}\), \(\mathbb{C}\), and \(\mathbb{Q}\) where arithmetic operations are all defined except only division by zero; we say it’s a field. technically, we should say “uniformly randomly, and independently of each other” here...↩ technically, independent doesn't really mean has nothing to do with each other; it means more like the probabilities of one are not affected if you know the other↩ and even if it isn't, the overhead of having to compute with large numbers makes things slower, which we probably couldn't afford for later subtasks↩ technically, independent doesn’t really mean has nothing to do with each other; it means more like the probabilities of one are not affected if you know the other↩ and even if it isn’t, the overhead of having to compute with large numbers makes things slower, which we probably couldn’t afford for later subtasks↩Proof
We can check that this is correct by running it on one of the examples, say \(n = 3\) and \(b = 18\).
-Unfortunately, when you try to pass in the actual input \(n = 10\) and \(b = 48\), you’ll find that it doesn’t seem to finish. Indeed, there are \(46\) possible values, which means there are \(46^{10} \approx 4\cdot 10^{16}\) possible sequences. Even if we could process \(10^9\) sequences per second, this program will take more than one year to finish!
+Unfortunately, when you try to pass in the actual input \(n = 10\) and \(b = 48\), you’ll find that it doesn’t seem to finish. Indeed, there are \(46\) possible values, which means there are \(46^{10} \approx 4\cdot 10^{16}\) possible sequences. Even if we could process \(10^9\) sequences per second, this program will take more than one year to finish!
We can improve this slightly with some observations.
First, the numbers must be distinct, so we could just enumerate all sequences without repeated values. This reduces the number of candidates from \(46^{10}\) to \(46\cdot 45\cdot 44 \cdots 37\). However, this number is still large—it’s \(\approx 1.5\cdot 10^{16}\), which isn’t a huge improvement. With \(10^9\) sequences per second, our program would still take several months.
Another insight would be to notice that for every set of \(n\) distinct numbers, there is at most one ordering of them that could potentially work, because we want their largest (or smallest) prime factors to be increasing as well. So for every set of \(n\) distinct numbers, we can simply sort them by their largest prime factor, and check if that ordering works. This reduces the number of candidates further to \(\binom{46}{10} \approx 4\cdot 10^9\), which is much smaller than before, and the program may now be waitable.
First, the numbers must be distinct, so we could just try to enumerate sequences without repeated values. This reduces the number of candidates from \(46^{10}\) to \(46\cdot 45\cdot 44 \cdots 37\). However, this number is still large—it’s \(\approx 1.5\cdot 10^{16}\), which isn’t a huge improvement. With \(10^9\) sequences per second, our program would still take several months.
Another insight would be to notice that for every set of \(n\) distinct numbers, there is at most one ordering of them that could potentially work, because we want their largest (or smallest) prime factors to be increasing as well. So for every set of \(n\) distinct numbers, we can simply sort them by their largest prime factor, and check if that ordering works. This reduces the number of candidates further to \(\binom{46}{10} \approx 4\cdot 10^9\), which is much smaller than before, and the program may now be waitable.
However, we can do even better than this. We could attempt to build the sequence number by number, and stop the construction as soon as one of the conditions already fails.
Specifically, the goal is to construct the sequence \([a_1, a_2, \ldots, a_n]\) number by number. At every point in the construction, we’re attempting to choose the value of some \(a_i\) between \(2\) and \(b-1\). We could just try each of them in turn, but we could do better: We know that \(a_i\)’s smallest and largest prime factors must be larger than those of \(a_{i-1}\)’s, so it’s enough to only try the values with that property.
After successfully choosing \(n\) such numbers this way, we’re guaranteed that the sequence we produced is valid (since we already checked all the necessary conditions), so the running time of this solution is now basically proportional to the number of sequences itself!1 So we simply hope that there aren’t too many of them, so that the program finishes quickly. And sure enough, if you implement and run this with \(n = 10\) and \(b = 48\), we find that it finishes in just a few seconds, even in Python!
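The number-by-number construction can be sketched as follows. This is only an illustration under loud assumptions: it assumes the task is to count sequences, and that the only conditions are that both the smallest and the largest prime factor strictly increase from each \(a_{i-1}\) to \(a_i\). The names `prime_factor_range` and `count_sequences` are made up.

```python
def prime_factor_range(v):
    """Return (smallest, largest) prime factor of v >= 2, by trial division."""
    smallest, largest, d = None, None, 2
    while d * d <= v:
        if v % d == 0:
            smallest = smallest or d
            largest = d
            v //= d
        else:
            d += 1
    if v > 1:  # whatever remains is a prime factor
        smallest = smallest or v
        largest = v
    return smallest, largest

def count_sequences(n, b):
    """Count sequences built one number at a time (assumed conditions only)."""
    factors = [prime_factor_range(v) for v in range(2, b)]

    def extend(length, prev_small, prev_large):
        if length == n:
            return 1  # every fully built sequence is valid by construction
        total = 0
        for small, large in factors:
            # Only try values whose prime factors beat the previous ones.
            if small > prev_small and large > prev_large:
                total += extend(length + 1, small, large)
        return total

    return extend(0, 0, 0)

print(count_sequences(2, 5))
```

Because a branch is abandoned as soon as no value can extend it, the work done is roughly proportional to the number of valid partial sequences rather than to all \(46^{10}\) candidates.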
Using this recurrence, we can now build a table of values of \(S(n', i)\), for all \((n', i)\) such that \(1 \le n' \le n\) and \(1 < i < b\). We can build this table in increasing order of \(n'\), because each entry \(S(n', i)\) only depends on the “previous layer” (because the summands are \(S(n' - 1, j)\)), whose values we’ve already computed. Finally, once we fill in the \(n\)th layer, we could then compute the answer using our summation formula above.
What’s the running time of this solution? Well, there are \(\approx nb\) possible arguments \((n', i)\), and each one is computed with a summation with \(\approx b\) terms, so the amount of work is roughly \(\approx nb\cdot b = nb^2\). (In algorithm parlance, we say that the running time is “\(\mathcal{O}(nb^2)\).”) The amount of steps needed is small enough that this algorithm can be used to solve Subtask 1 by hand (or maybe with a spreadsheet). For Subtask 2, this is already quite waitable, but we can slightly speed it up by noticing that \(S(n, i)\) doesn’t really depend on \(i\), only on \((x_i, y_i)\), so such values are equal for multiple points that happen to coincide. Formally, if \((x_i, y_i) = (x_j, y_j)\), then \(S(n, i) = S(n, j)\). Using this, we only need to compute it once for every distinct point in \(\{(x_i, y_i) \mid 1 < i < b \}\). This speeds up the running time from \(\approx nb\cdot b\) steps to \(\approx np\cdot b\) steps, where \(p\) is the number of distinct points. (For \(b = 4000\), you could check that \(p = 1637\).)
This technique of building a table of results whose elements depend on earlier entries is called dynamic programming, or DP.
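Here is a minimal sketch of such a layer-by-layer table. Since the exact recurrence isn't restated here, this assumes that \(S(n', i)\) counts the sequences of length \(n'\) ending in \(i\), that \((x_i, y_i)\) are the smallest and largest prime factors of \(i\), and that the answer is the sum of the last layer. The names `prime_factor_range` and `count_sequences_dp` are hypothetical.

```python
def prime_factor_range(v):
    """Return (smallest, largest) prime factor of v >= 2, by trial division."""
    smallest, largest, d = None, None, 2
    while d * d <= v:
        if v % d == 0:
            smallest = smallest or d
            largest = d
            v //= d
        else:
            d += 1
    if v > 1:  # whatever remains is a prime factor
        smallest = smallest or v
        largest = v
    return smallest, largest

def count_sequences_dp(n, b):
    """Fill the table one layer at a time; about n * b * b steps in total."""
    values = range(2, b)
    xy = {i: prime_factor_range(i) for i in values}

    # Layer 1: every single value is a sequence of length 1 on its own.
    layer = {i: 1 for i in values}
    for _ in range(n - 1):
        # Each new entry depends only on the previous layer.
        layer = {
            i: sum(
                layer[j]
                for j in values
                if xy[j][0] < xy[i][0] and xy[j][1] < xy[i][1]
            )
            for i in values
        }
    return sum(layer.values())

print(count_sequences_dp(2, 5))
```

Only the previous layer is kept in memory, since earlier layers are never read again.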
+Exercise: There’s another slight tweak that can be done to improve this from \(\approx np\cdot b\) steps to \(\approx np\cdot p\) steps. Explain how to do it.
+Subtasks 3 & 4
For the remaining subtasks, I’ll only give hints. The previous solution is now too slow, so we need something faster. I’ll give you a few hints that you can use to speed up your solution in different ways. A combination of some of them (plus maybe a few other insights) can be used to solve the remaining subtasks.
diff --git a/2023/tama/primes/index.md b/2023/tama/primes/index.md
index bf4623c..84a8468 100644
--- a/2023/tama/primes/index.md
+++ b/2023/tama/primes/index.md
@@ -77,13 +77,13 @@ def solve(n, b):
 ```

 We can check that this is correct by running it on one of the examples, say $n = 3$ and $b = 18$.

-Unfortunately, when you try to pass in the actual input $n = 10$ and $b = 48$, you’ll find that it doesn’t seem to finish. Indeed, there are $46$ possible values, which means there are $46^{10} \approx 4\cdot 10^{16}$ possible sequences. Even if we could process $10^9$ sequences per second, this program will take more than one year to finish!
+Unfortunately, when you try to pass in the actual input $n = 10$ and $b = 48$, you’ll find that it doesn’t seem to finish. Indeed, there are $46$ possible values, which means there are $46^{10} \approx 4\cdot 10^{16}$ possible sequences. Even if we could process $10^9$ sequences per second, this program will take *more than one year* to finish!

 We can improve this slightly with some observations.

-- First, the numbers must be *distinct*, so we could just enumerate all sequences **without repeated values**. This reduces the number of candidates from $46^{10}$ to $46\cdot 45\cdot 44 \cdots 37$. However, this number is still large—it’s $\approx 1.5\cdot 10^{16}$, which isn’t a huge improvement. With $10^9$ sequences per second, our program would still take several months.
+- First, the numbers must be *distinct*, so we could just try to enumerate sequences **without repeated values**. This reduces the number of candidates from $46^{10}$ to $46\cdot 45\cdot 44 \cdots 37$. However, this number is still large—it’s $\approx 1.5\cdot 10^{16}$, which isn’t a huge improvement. With $10^9$ sequences per second, our program would still take several months.

-- Another insight would be to notice that for every set of $n$ distinct numbers, there is at most one ordering of them that could potentially work, because we want their largest (or smallest) prime factors to be increasing as well. So for every *set* of $n$ distinct numbers, we can simply **sort them by their largest prime factor**, and check if that ordering works. This reduces the number of candidates further to $\binom{46}{10} \approx 4\cdot 10^9$, which is much smaller than before, and the program may now be waitable.
+- Another insight would be to notice that for every *set* of $n$ distinct numbers, there is at most one ordering of them that could potentially work, because we want their largest (or smallest) prime factors to be increasing as well. So for every *set* of $n$ distinct numbers, we can simply **sort them by their largest prime factor**, and check if that ordering works. This reduces the number of candidates further to $\binom{46}{10} \approx 4\cdot 10^9$, which is much smaller than before, and the program may now be waitable.

 - However, we can do even better than this. We could attempt to build the sequence number by number, and stop the construction **as soon as one of the conditions already fails**.

@@ -197,6 +197,11 @@ What’s the running time of this solution? Well, there are $\approx nb$ pos

 This technique of building a table of results whose elements depend on earlier entries is called **dynamic programming**, or DP.

+