lower-bounds.Rmd

# Lower bounds on query complexity of quantum algorithms {#chap-lowerbounds}

<div style="text-align: right"> Contributors: Trong Duong </div>
<br>


We have discussed about several important quantum algorithms. A large portion of them is query-based, i.e. the input is given as a quantum black box $O: |\mathbf{x}\rangle |y\rangle \mapsto |\mathbf{x}\rangle |y \oplus f(x)\rangle$. In that setting, query complexity plays a key role in determining the advantage and limitation of an algorithm. This chapter will discuss how to obtain a lower bound on query complexity, i.e. the minimal number of queries an algorithm needs to make for a desired output (probably up to some error bound). In particular, we are going to look into __Polynomial method__ and __Adversary method__, along with some simple applications.


## Polynomial method

The goal of a quantum query algorithm usually involves determining entirely or partially a function $f:\{0,1\}^n \rightarrow \{0,1\}$ given via an oracle $O_f: O_f|\mathbf{x},a\rangle = |\mathbf{x},a\oplus y\rangle$, where $y \in \{0,1\}$ is the evaluation of $f$ at the input $\mathbf{x} \in \{0,1\}^n$. The oracle basically swaps $|\mathbf{x},0\rangle$ and $|\mathbf{x},1\rangle$ when $y = 1$, and does nothing when $y=0$. For a general superposition of $|\psi\rangle = \sum_{i \in [N]}\alpha_{i,0}(\mathbf{x})|i,0\rangle + \alpha_{i,1}(\mathbf{x})|i,1\rangle$, where $\alpha_{i,a}(\mathbf{x})$ is an $N$-variate polynomial in terms of $x_i$'s, the degree of $\alpha_{i,a}(\mathbf{x})$ is increased by $O_f$ by at most $1$.

$$O_f |\psi\rangle = \sum_{i\in [N]} \left[ (1-y_i) \alpha_{i,0}(\mathbf{x})  + y_i \alpha_{\mathbf{x},1}(\mathbf{x}) \right]|i,0\rangle + \left[ y_i \alpha_{i,0}(\mathbf{x}) + (1-y_i) \alpha_{i,1} (\mathbf{x}) \right] |i,1\rangle$$
A quantum algorithm typically calls an interleaved chain of unitary operators and the oracles. Those unitary operators do not depend of $\mathbf{x}$, so $O_f$'s are the only contributors to any increase in the degree of $\alpha_{i,a}(\mathbf{x})$, with an increase of at most $1$ for each call to the oracle.

The following lemmas come as consequences of the preceeding observation and the fact that $y_i^2 = y_i$.

```{lemma, poly-amplitude, name="Degree of amplitude polynomials"}
Suppose a quantum query algorithm makes $T$ call to $O_f$. Then the amplitude of every basis state is a multilinear polynomial in terms of $x_i$'s of degree at most $T$.
```


In a decision problem, one define acceptance rate $p(x)$ as the probability of obtaining the ancilla in state $1$ at the final measurement. In particular. $p(x) = \sum_{i \in [N]} |\alpha_{i,1}|^2$. By the previous lemma, one can bound the degree of the acceptance rate as a polynomial.

```{lemma, acceptance-poly, name="Degree of acceptance polynomials"}
Acceptance rate of the algorithm is a multilinear polynomial in $x_i$'s of degree at most $2T$.
```


For an error-bounded algorithm that computes $f$ with error at most $\varepsilon$, the acceptance polynomial $p(\mathbf{x})$ is called an __approximating polynomial__ of $f$ if

$$\forall \mathbf{x} \in \{0,1\}^n, |p(\mathbf{x}) - f(\mathbf{x})| \leq \varepsilon$$
As a result, if we can show a approximating polynomial of $f$ has degree at least $d$, then every quantum algorithm that computes $f$ with error at most $\varepsilon$ must make at least $d/2$ queries to the oracle. In practice we typically choose $\varepsilon = 1/3$.

To shed light on the use of polynomial method, we shall look into one of its particular applications when $f(\mathbf{x})$ is symmetric, i.e. $f$ only depends on the Hamming weight $|\mathbf{x}|$, which the number of bits $1$ in $\mathbf{x}$. The value of such $f$ is invariant under permutations of $(x_1,x_2,\dots,x_n)$. Suppose we have an approximating polynomial $p$ of $f$, consider the average $\bar{p}$ of $p$ over all input permutations $\pi(\mathbf{x})$.

$$
\bar{p}(\mathbf{x}) = \frac{1}{N!}\sum_{\pi \in S_n} p(x_{\pi(1)},x_{\pi(2)},\dots,x_{\pi(n)})
$$

A nice property of this symmetrized polynomial is that it is a uni-variate polynomial in $|x|$ that approximates $f$. Moreover, the uni-variate polynomial has a degree $d \leq \text{deg}(p)$. To see why this is the case, we observe that

$\bar{p}(\mathbf{x}) = c_0 + c_1Q_1 + c_2Q_2 + \cdots + c_dQ_d$, where $Q_1 = x_1 + x_2 + \cdots + x_n$, $Q_2 = x_1x_2 + \dots + x_1x_n + x_2x_1 + \dots + x_{n}x_{n-1}$, etc. Consider

$$Q_j = \sum_{{\substack{S\subset [N] \\|S| = j}}} \prod_{i \in S} x_i$$
If $j>|x|$, each product within the representation of $Q_j$ is equal to zero, so $Q_j = 0$. If $j \leq |x|$, there are exactly $\binom{|x|}{j}$ nonzero products within $Q_j$. Those nonzero products equal $1$, so $Q_j = \binom{|x|}{j}$. Therefore the symmetrized polynomial is
$$\bar{p}(\mathbf{x}) = c_0 + c_1\binom{|x|}{1} + \dots + c_d\binom{|x|}{d} = r(|x|)$$
Also, since $p$ approximates $f$ that only depends on $|\mathbf{x}|$, the sum over all input permutations $\bar{p}$ also approximates $f$. From all of the above, one has $\text{deg}(\bar{p}) \leq \text{deg}(p) \leq 2T$. The best lower bound for $T$ can be obtained if we find the smallest-degree $\bar{p}$. Sometimes it is not easy to access to the degree of the symmetrized polynomial without having the explicit form of the initial approximating polynomial. Rather, we can relax the lower bound for $T$ relying on the smallest-degree $p$. The following powerful theorem characterizes the degree of the smallest-degree approximating polynomial.


(ref:paturi1992degbound) [@paturi1992degbound]

```{theorem, paturi-theorem, name="Paturi theorem (ref:paturi1992degbound)"}
Suppose $f$ is a non-constant symmetric boolean function on \{0,1\}^n and $p$ is some approximating polynomial of $f$.

Let $f_k \equiv f(\mathbf{x})$ when $|\mathbf{x}| = k$, and $\Gamma(f) \equiv \min\{|2k-n+1| : f_k \neq f_{k+1}$ for $0\leq k \leq n-1\}$.

Then $\min{\text{deg}(p)} \in \Theta(\sqrt{n(n-\Gamma(f))})$
```


A simple example is the black-box PARITY function $f(\mathbf{x}) = x_1 \oplus x_2 \oplus \dots \oplus x_n$. Symmetrizing $f$ induces a zero-error polynomial $r(x)$ such that
$$r(k) = \left\{\begin{matrix}
1, \text{ } k \text{ is even}\\
0, \text{ } k \text{ is odd}
\end{matrix}\right.$$
Note that $r(x)$ changes direction at least $n$ times, so $\text{deg}(r) \geq n$. Note that, for PARITY function, we have $\Gamma(f) = 1$ and $\min{\text{deg}(p)} \in \Theta(n)$. So $r(x)$ offers the optimal lower bound. Thus the computation of $f$ requires at least $n/2$ queries. According to this, Deutsch's algorithm is the optimal algorithm even in no-error case for $n=2$.

We will look at another example of OR function. One way to learn the function from the corresponding oracle is using the Grover's algorithm. If the input has at least one $x_i = 1$, the algorithm can find some index $j$ such that $x_j=1$ with high probability using $\Theta(\sqrt{n})$ queries. As a result, one can compute the OR function with bounded error with $\Theta(\sqrt{n})$ queries. This result agrees with Paturi's theorem with $\Gamma(f) = n-1$ and $\min{\text{deg}(p)} \in \Theta(\sqrt{n})$. We can also prove the minimum number of query calls required independently: Computing the symmetric OR function with error $\leq 1/3$ induces a uni-variate approximating polynomial $r$ such that

$$r(0) \in [0,1/3] \quad \text{and} \quad r(k) \in [2/3,1] \: \forall k\in \{1,2,\dots,n\}$$

A bound for derivatives of a polynomial can be obtained as a function of its degree according to the Markov brothers' inequality, whose statement is given as follows:

(ref:markov1890ineq) [@markov1890ineq]

```{theorem, markov-brothers-ineq, name="Markov brothers' inequality (ref:markov1890ineq)"}
Given a polynomial $p$ of degree $d$ such that $|p(x)| \leq L$ for $x \in [a,b]$. Then within the domain, the first derivative of $p$ is bounded.
$$|p'(x)| \leq \frac{2d^2L}{b-a}$$
```

Applying the inequality to $r$, one have $|r'(x)| \leq \frac{2d^2}{n}$. On the other hand, $r'(x) \geq 1/3$ at some point between $0$ and $1$ by the Mean Value Theorem. Hence, $T \geq d/2 \geq \Omega(\sqrt{n})$ that gives the same lower bound for number of queries required as the previous method.

However, a similar quadratic speedup is not feasible for an exact algorithm for OR function. Assume such an algorithm exists, then it induces $r(0) = 0$ and $r(k) = 1, \: k\in \{1,2,\dots,n\}$. The univariate polynomial $r$ then has to change direction at least $n-1$ times to satisfy the constraint, making its degree at least $n$. Thus $T \geq n/2$, so the exact algorithm cannot achieve a quadratic speedup.


## Quantum adversary method

Consider a decision problem given by a function $f: \{0,1\}^n \rightarrow \{0,1\}$. An input $\mathbf{w} = (w_1,\dots,w_n) \in \{0,1\}^n$ to the function can be accessed via a phase oracle $O_{\mathbf{w}}: |i\rangle \mapsto (-1)^{w_i}|i\rangle$. This type of oracle can be obtained from the general oracle $O'_{\mathbf{w}}: |i\rangle |a\rangle \mapsto |i\rangle |a \oplus g(i)\rangle$ by setting the ancilla state $|a\rangle = |-\rangle$ and $g(i) = w_i$. This trick is often referred as _phase kickback_.

\begin{align}
\begin{matrix}
|i\rangle \left(\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right) &\mapsto& |i\rangle \left(\frac{|w_i\rangle - |1\oplus w_i\rangle}{\sqrt{2}}\right) \\
 &=& \left\{\begin{matrix}
|i\rangle \left(\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right), \quad  w_i=0\\
|i\rangle \left(\frac{|1\rangle - |0\rangle}{\sqrt{2}}\right), \quad w_i=1
\end{matrix}\right. \\
 &=& (-1)^{w_i} |i\rangle \left(\frac{|0\rangle - |1\rangle}{\sqrt{2}}\right)
\end{matrix}
\end{align}


Our goal is to determine $f(\mathbf{x})$ with high probability, say at least $2/3$, for a given $\mathbf{x}$ using as few $O_{\mathbf{x}}$ queries as possible. An error-bounded algorithm must accept any $\mathbf{x} \in f^{-1}(0)$ with probability $\leq 1/3$ and accept any $\mathbf{y} \in f^{-1}(1)$ with probability $\geq 2/3$.

For an arbitrary input $\mathbf{w}$, a general algorithm can be formulated as a sequence
$$U_T O_{\mathbf{w}} U_{T-1} O_{\mathbf{w}} \dots O_{\mathbf{w}} U_1 O_{\mathbf{w}} U_0$$
Denote $|\psi^t_{\mathbf{w}}\rangle$ be the state of the system after the $t$-th query to the oracle. Let $\mathbf{x} \in X \equiv \{\mathbf{x} | f(\mathbf{x}) =0 \}=f^{-1}(0)$ and $\mathbf{y} \in Y = f^{-1}(1)$. Define the _progress function_ of $\mathbf{x}$ and $\mathbf{y}$ as the inner product of their corresponding quantum state though the time period $\langle\psi^t_{\mathbf{x}} | \psi^t_{\mathbf{y}} \rangle$. Define the _progress measure_ over a subset $R$ of $X \times Y$  as

$$S(t) = \sum_{(x,y)\in R} |\langle\psi^t_{\mathbf{x}} | \psi^t_{\mathbf{y}} \rangle|$$

Observe that $\langle\psi^0_{\mathbf{x}} | \psi^0_{\mathbf{y}} \rangle = 1$ as the initial states for all inputs are necessarily the same. Also, non-oracle unitary operators $U_i$ do not alter $\langle\psi^t_{\mathbf{x}} | \psi^t_{\mathbf{y}} \rangle$. One can show $|\langle\psi^T_{\mathbf{x}} | \psi^T_{\mathbf{y}} \rangle| \leq \frac{17}{18}$ for an error-bounded algorithm. The proof is based on an inequality between total variation distance and $L_2$ norm distance.


```{definition, total-variation-distance, name="Total variation distance"}
The total variation between two probability distributions $P$ and $Q$ over a countable sample space $\Omega$ is given by

$$\begin{matrix}
d(P,Q) & \equiv & \sup_{A \subseteq \Omega} |P(A) - Q(A)| \\
 &=& \frac{1}{2} \sum_{\omega \in \Omega} |P(\omega) - Q(\omega)|
\end{matrix}$$
```


In the context of quantum measurement, the decision problem involves a $2$-outcome measurement at time $t=T$. The probability of obtaining the outcome $m \in \{0,1\}$ by a measurement of quantum states $\phi$ and $\tau$ are $|\phi_m|^2$ and $|\tau_m|^2$ respectively.

$$\begin{matrix}
\frac{1}{2} \sum_{m} ||\phi_m|^2 - |\tau_m|^2| &=& \frac{1}{2} \sum_{m} ||\phi_m| - |\tau_m|| \cdot ||\phi_m| + |\tau_m|| \\
 &\leq& \frac{1}{2} \sum_{m} |\phi_m - \tau_m| \cdot ||\phi_m| + |\tau_m|| \\
 &\leq& \frac{1}{2} \sqrt{\sum_{m} |\phi_m - \tau_m|^2} \sqrt{\sum_{m} \left(|\phi_m| + |\tau_m|\right)^2} \\
 &\leq& \sqrt{\sum_{m} |\phi_m - \tau_m|^2} = \|\phi - \tau\|_2
\end{matrix}$$

The first and second inequality come as a result of triangle inequality and Cauchy-Schwarz inequality respective. The third inequality comes from $(a+b)^2 \leq 2(a^2+b^2)$ and the fact that $\sum_{m} |\phi_m|^2 = \sum_{m} |\tau_m|^2 = 1$. Consider an error-bounded algorithm with tolerance of $1/3$, i.e. $| \langle 1| \psi^T_\mathbf{x}\rangle |^2 < 1/3$ and $| \langle 1| \psi^T_\mathbf{y}\rangle |^2 \geq 2/3$. The total variation distance, and therefore $\|\psi_\mathbf{x}^T - \psi_\mathbf{y}^T\|_2$, is at least $1/3$. Notice that the $L_2$-norm distance between two states can be written in terms of their inner product

\begin{align}
\|\psi_\mathbf{x}^T - \psi_\mathbf{y}^T\|_2^2 = \langle \psi_\mathbf{x}^T - \psi_\mathbf{y}^T | \psi_\mathbf{x}^T - \psi_\mathbf{y}^T \rangle =  2 - 2\Re{\langle \psi_\mathbf{x}^T | \psi_\mathbf{y}^T \rangle}.
\end{align}

We can assume that $\langle \psi_x^T | \psi_y^T \rangle$ is real, otherwise multiply $| \psi_y^T \rangle$ by some scalar of norm $1$ to make the inner product real. Then it follows that $|\langle \psi_x^T | \psi_y^T \rangle| \leq 17/18$. Intuitively, as the inner product is bounded above, the measurement statistics can distinguish $\mathbf{x} \in f^{-1}(0)$ and $\mathbf{y} \in f^{-1}(1)$.

However we should rather look at more than a fixed pair $\mathbf{x}, \mathbf{y}$ to obtain a meaningful result. This suggests us to look at some particular subset $R \subseteq X \times Y$.
Observe that the progress measure before and after the algorithm are $S(0) = |R|$ and $S(T) \leq \frac{17}{18}|R|$. So, if one can come up with an upper bound $\Delta$ for $|S(t) - S(t-1)|$, then the number of queries needed is $T \geq |S(T) - S(0)| / \Delta = |R|/18\Delta$. We prove the following theorem based on that idea.

(ref:ambainis2002adversary) [@ambainis2002adversary]

```{theorem, basic-adversary, name="Basic Adversary Method (ref:ambainis2002adversary)"}
Let $f$ be a decision problem, $X \subseteq f^{-1}(0), Y \subseteq f^{-1}(1)$, and a _binary relation_ $R \subseteq X \times Y$. Suppose that

   1. $\forall \mathbf{x} \in X$, there are at least $m_0$ distinct $\mathbf{y} \in Y$ such that $(\mathbf{x},\mathbf{y}) \in R$

   2. $\forall \mathbf{y} \in Y$, there are at least $m_1$ distinct $\mathbf{x} \in X$ such that $(\mathbf{x},\mathbf{y}) \in R$

   3. $\forall \mathbf{x} \in X$ and $\forall i \in \{0,1,\dots,n\}$, there are at most $l_0$ distinct $\mathbf{y} \in Y$ such that $x_i \neq y_i$ and $(\mathbf{x},\mathbf{y}) \in R$

   4. $\forall \mathbf{y} \in Y$ and $\forall i \in \{0,1,\dots,n\}$, there are at most $l_1$ distinct $\mathbf{x} \in X$ such that $x_i \neq y_i$ and $(\mathbf{x},\mathbf{y}) \in R$.

Then the quantum query complexity of $f$ is $\Omega{\sqrt{\frac{m_0m_1}{l_0l_1}}}$
```
The following proof is modified from a proof for the case $l_0 = l_1 = 1$ given in the lecture note [@adversary-simple].

For each $(\mathbf{x},\mathbf{y}) \in R$, let $J_{\mathbf{x}\mathbf{y}} = \{(j_1,j_2,\dots,j_k) | x_{j_1} \neq y_{j_1}, x_{j_2} \neq y_{j_2}, \dots, x_{j_k} \neq y_{j_k}\}$. From the first assumption, $|R| \geq m_0 |X|$. The quantum states corresponding to the initial inputs right before they pass through the $t$-th oracle can be represented by

\begin{align}
|\psi_\mathbf{x}^{t-1}\rangle = \sum_{i} a_i|i\rangle \otimes |\phi_i\rangle  \\
|\psi_\mathbf{y}^{t-1}\rangle = \sum_{i} b_i|i\rangle \otimes |\chi_i\rangle \\
\Rightarrow \langle \psi_\mathbf{x}^{t-1} | \psi_\mathbf{y}^{t-1} \rangle = \sum_i a_i^* b_i \langle \phi_i | \chi_i \rangle
\end{align}

When the states pass through the $t$-th oracle, it flips the sign on each $a_i, b_i$ whenever $x_i=1$ or $y_i=1$ respectively. The overall effect on the inner product is flipping the sign of the coefficient corresponding to $\langle \phi_i | \chi_i \rangle$ when $x_i \neq y_i$. We can express the inner product at time $t$ as follows

$$\langle \psi_\mathbf{x}^{t} | \psi_\mathbf{y}^{t} \rangle = \sum_{i \notin J_{\mathbf{x}\mathbf{y}}} a_i^* b_i \langle \phi_i | \chi_i \rangle - \sum_{j \in J_{\mathbf{x}\mathbf{y}}} a_j^* b_j \langle \phi_i | \chi_i \rangle$$

The change in the inner product at two successive time is

$$\langle \psi_\mathbf{x}^{t} | \psi_\mathbf{y}^{t} \rangle - \langle \psi_\mathbf{x}^{t-1} | \psi_\mathbf{y}^{t-1} \rangle = 2  \sum_{j \in J_{\mathbf{x}\mathbf{y}}} a_i^* b_i \langle \phi_i | \chi_i \rangle$$

$$\begin{matrix}
|S(t) - S(t-1)| &=& \left| \sum_{(\mathbf{x},\mathbf{y})\in R} |\langle\psi^{t}_{\mathbf{x}} | \psi^{t}_{\mathbf{y}} \rangle| - \sum_{(\mathbf{x},\mathbf{y})\in R} |\langle\psi^{t-1}_{\mathbf{x}} | \psi^{t-1}_{\mathbf{y}} \rangle|\right| \\
 &\leq&  \sum_{(\mathbf{x},\mathbf{y})\in R} \left| \langle \psi_\mathbf{x}^{t} | \psi_\mathbf{y}^{t} \rangle - \langle \psi_\mathbf{x}^{t-1} | \psi_\mathbf{y}^{t-1} \rangle \right| \\
 &=&  \sum_{(\mathbf{x},\mathbf{y})\in R} \left| 2  \sum_{j \in J_{\mathbf{x}\mathbf{y}}} a_{j(\mathbf{x},\mathbf{y})}^* b_{j(\mathbf{x},\mathbf{y})} \langle \phi_i | \chi_i \rangle \right| \\
 &\leq& \sum_{(\mathbf{x},\mathbf{y})\in R} \sum_{j \in J_{\mathbf{x}\mathbf{y}}} 2 \left| a_{j(\mathbf{x},\mathbf{y})}^* b_{j(\mathbf{x},\mathbf{y})} \right| \\
 &\leq& \sum_{(\mathbf{x},\mathbf{y})\in R} \sum_{j \in J_{\mathbf{x}\mathbf{y}}} \sqrt{\frac{m_0 l_1}{m_1 l_0}} \left| a_{j(\mathbf{x},\mathbf{y})} \right|^2 + \sqrt{\frac{m_1 l_0}{m_0 l_1}} \left| b_{j(\mathbf{x},\mathbf{y})} \right|^2
\end{matrix}$$

The last line comes from the simple inequality $|a|^2 + |b|^2 \geq 2|ab|$. Consider the first summand in the expression above.

$$\begin{matrix}
\sum_{(\mathbf{x},\mathbf{y})\in R} \sum_{j \in J_{\mathbf{x}\mathbf{y}}} \sqrt{\frac{m_0 l_1}{m_1 l_0}} \left| a_{j(\mathbf{x},\mathbf{y})} \right|^2 &=& \sqrt{\frac{m_0 l_1}{m_1 l_0}} \sum_{\mathbf{x} \in X} \sum_{i \in [N]}  \sum_{\substack{\mathbf{y} \in Y  \\  y_i \neq x_i}} |a_{i(\mathbf{x},\mathbf{y})}|^2 \\
 &\leq& \sqrt{\frac{m_0 l_1}{m_1 l_0}} \sum_{\mathbf{x} \in X} l_0 \\
 &\leq&  \sqrt{\frac{m_0 l_1}{m_1 l_0}} \frac{|R|}{m_0} l_0 =  \sqrt{\frac{l_0 l_1}{m_0 m_1}}|R|
\end{matrix}$$

The second line comes from the third assumption and the fact that $\sum_{i\in [N]} |a_{i(\mathbf{x},\mathbf{y})}|^2 = 1$ for every $(\mathbf{x}, \mathbf{y}) \in R$. We can also derive the same bound for the second summand. The upper bound for $|S(t) - S(t-1)|$ is  $\Delta = 2 \sqrt{\frac{l_0 l_1}{m_0 m_1}}|R|$. Hence $T \geq \frac{|R|}{18\Delta} \in \Omega \left(\sqrt{\frac{m_0 m_1}{l_0 l_1}}\right)$, which is the conclusion of the theorem.

We revisit two examples of PARITY and OR functions and show the adversary method and the polynomial method give the same lower bound. Recall from the Polynomial method that one needs at least $n/2$ queries for PARITY and $\Theta(\sqrt{n})$ for OR. Let's see if the same result can be obtained with the Adversary method. The PARITY function $f$ maps binary strings with odd Hamming weight to $1$ and even Hamming weight to $0$. Let $X = f^{-1}(0), Y = f^{-1}(1),$ and $R = \{(\mathbf{x},\mathbf{y}) | d(\mathbf{x},\mathbf{y}) = 1\}$. It not difficult to see that $m_0 = m_1 = n, l_0 = l_1 = 1$ so that the function requires $\Omega(n)$ queries. The OR function $g$ maps only the all-zero sequence to $0$ and other inputs to $1$. Let $X = \{00\dots 00\} = g^{-1}(0)$ and $Y = \{\mathbf{y} | \mathbf{y} \text{ contains exactly one bit } 1\} \subset g^{-1}(1)$. Then, $m_0 = n, m_1=1, l_0 = l_1 = 1$, so we have a lower bound of $\Omega(\sqrt{n})$.