Fourier-analytic proof of Sperner
Contents
Fourier analysis
We identify [math][2]^n[/math] with the power set [math]2^{[n]}[/math] of [n]. Given any function [math]f: 2^{[n]} \to {\Bbb R}[/math], we define the Fourier transform [math]\hat f: 2^{[n]} \to {\Bbb R}[/math] by the formula
- [math]\hat f(A) := {\Bbb E}_{X \in 2^{[n]}} (-1)^{|A \cap X|} f(X)[/math];
we have the Plancherel formula
- [math]\sum_{A \in 2^{[n]}} |\hat f(A)|^2 = {\Bbb E}_{X \in 2^{[n]}} |f(X)|^2[/math] (1)
and the inversion formula
- [math]f(X) = \sum_{A \in 2^{[n]}} (-1)^{|A \cap X|} \hat f(A)[/math]. (2)
For any scale [math]1 \lt r \lt n[/math], we consider the bilinear form
- [math] B_r( f, g ) := {\Bbb E} f(X) g(Y)[/math] (3)
where X is drawn randomly from [math]2^{[n]}[/math], and Y is formed from X by setting [math]j \in Y[/math] whenever [math]j \in X[/math], and when [math]j \not \in X[/math], setting [math]j \in Y[/math] with probability [math]r/n[/math]. To avoid significant distortion in the measure, we will be working in the regime [math]1 \leq r \ll \sqrt{n}[/math]. Note that if [math]B_r(1_A, 1_A)[/math] is large, and r is not too small, then A will contain many combinatorial lines (since Y will be strictly larger than X with high probability); thus it is of interest to get lower bounds on [math]B_r[/math].
We also define the influence Inf(f) as
- [math] Inf(f) := {\Bbb E} |f(X) - f(X')|^2[/math] (4)
where X' is formed from X by randomly flipping one bit (from 0 to 1, or vice versa). In Fourier space, we have
- [math] Inf(f) = 4\sum_{A \in 2^{[n]}} \frac{|A|}{n} |\hat f(A)|^2[/math]. (5)
Dichotomy between structure and randomness
For any [math]f: 2^{[n]} \to {\Bbb R}[/math], define the uniformity norm [math]\|f\|_{U(r)}[/math] at scale r by the formula
- [math]\|f\|_{U(r)} := \sup_g \max( |B_r(f,g)|, |B_r(g,f)| )[/math] (6)
where g ranges over all functions [math]g: 2^{[n]} \to [-1,1][/math]. Functions which are small in the uniform norm are thus "negligible" for the purposes of computing [math]B_r[/math] and can be discarded. (This is analogous to the theory of counting arithmetic progressions of length three, in which functions with small Fourier coefficients (or small Gowers [math]U^2[/math] norm) can be discarded.)
- Lemma 1 Let [math]f: 2^{[n]} \to [-1,1][/math] be such that [math]\|f\|_{U(r)} \geq \eta[/math], where [math]1 \leq r \ll \sqrt{n}[/math] and [math]\eta \gt 0[/math]. Then there exists a function [math]F: 2^{[n]} \to [-1,1][/math] of influence [math]Inf(F) \ll_\eta \frac{1}{r}[/math] such that
- [math] \langle f, F \rangle := {\Bbb E}_{X \in 2^{[n]}} f(X) F(X) \gg_\eta 1.[/math]
It seems that this is more or less proven in Ryan's notes. (May be worth redoing it here on the wiki. Note that the scale r here basically corresponds to [math]\epsilon n[/math] in Ryan's notes.)
Structure theorem
Now we use the general "energy increment" argument (see e.g. Terry's paper on this) to obtain a structural theorem.
- Corollary 2 (Koopman-von Neumann type theorem) Let [math]f: 2^{[n]} \to [0,1][/math] and [math]0 \lt \eta \lt 1[/math]. Then there exists [math]k = k(\eta)[/math] such that whenever [math]\sqrt{n} \gg r_k \gt \ldots \gt r_1 \gt 1[/math] be a sequence of scales. Then there exists [math]1 \leq i \leq k[/math] and a decomposition
- [math]f = f_{U^\perp} + f_U [/math] (7)
- where [math]f_{U^\perp}, f_U[/math] take values in [-1,1],
- [math]Inf( f_{U^\perp} ) \ll_{\eta} 1/r_i [/math] (8)
- and
- [math]\| f_U \|_{U(r_{i+1})} \leq \eta.[/math] (9)
- Furthermore, [math]f_{U^\perp}[/math] is non-negative and
- [math] {\Bbb E} f = {\Bbb E} f_{U^\perp}.[/math] (10)
Proof (sketch) It's likely that there is now a slick proof of this type of thing using the duality arguments of Gowers or of Reingold-Trevisan-Tulsiani-Vadhan. But here is the "old school" approach from my long AP paper with Ben, running the energy increment algorithm:
- Initialise i=1, and let [math]{\mathcal B}[/math] be the trivial partition of [math][2]^n[/math].
- Let [math]f_{U^\perp} := {\Bbb E}(f|{\mathcal B})[/math] be the conditional expectation of f with respect to the partition; initially, this is just the constant function [math]{\Bbb E} f[/math], but becomes more complicated as the partition gets finer. Set [math]f_U := f - f_{U^\perp}[/math]
- If [math]\| f_U \|_{U(r_{i+1})} \leq \eta[/math] then STOP.
- Otherwise, we apply Lemma 1 to find a function F of influence [math]O_\eta(1/r_{i+1})[/math] that [math]f_U[/math] correlates with. Partition F into level sets (discretising in multiples of [math]\eta[/math] or so, randomly shifting the cut-points to avoid edge effects) and use this to augment the partition [math]{\mathcal B}[/math]. In doing so, the energy [math]{\Bbb E} |f_{U^\perp}|^2[/math] increases by some non-negligible amount [math]c(\eta) \gt 0[/math], thanks to Pythagoras' theorem: this is why this process will stop before k steps.
- Now return to Step 1.
A certain tedious amount of verification is needed to show that all the hypotheses are satisfied. One technical step is that one needs to use the Weierstrass approximation theorem to approximate [math]f_{U^\perp}[/math] by a polynomial combination of low-influence functions, and one needs to show that polynomial combinations of low-influence functions are low-influence. The complexity of the polynomial is controlled by some function of [math]\eta[/math] and this explains the losses in that parameter in (8). [math]\Box[/math]
DHJ(2)
Now we prove DHJ(2). Let [math]f = 1_A[/math] have density [math]\delta[/math]. We let [math]\eta[/math] be a small number depending on [math]\delta[/math]. Now we pick some scales [math]1 \leq r_1 \lt \ldots \lt r_k[/math] between 1 and [math]\sqrt{n}[/math] (the exact choices are not so important, so long as the scales are widely separated), and apply the above corollary, to decompose [math]f = f_{U^\perp} + f_U[/math], where [math]f_U[/math] is uniform at some scale [math]r_i[/math] and [math]f_{U^\perp}[/math] is low-influence at a coarser scale [math]r_{i+1}[/math].
We then look at [math]B_{r_i}(f, f)[/math]. Decomposing into four pieces and using the uniformity of [math]f_U[/math], this expression is equal to [math]B_{r_i}(f_{U^\perp}, f_{U^\perp}) + O(\eta)[/math]. But the low influence of [math]f_{U^\perp}[/math] means that when one randomly flips a bit from a 0 to a 1, [math]f_{U^\perp}[/math] changes by [math]O_\eta(1/r_{i+1})[/math] on the average. Iterating this [math]r_i[/math] times, we see that [math]B_{r_i}(f_{U^\perp}, f_{U^\perp}) = {\Bbb E}( f_{U^\perp}^2 ) + O_\eta( r_i / r_{i+1} )[/math]. But [math]f_{U^\perp}[/math] has density [math]\delta[/math], and so the main term is at least [math]\delta^2[/math]. Choosing [math]\eta[/math] small enough and the [math]r_i[/math] widely separated enough we obtain a nontrivial lower bound on [math]B_{r_i}(f_{U^\perp}, f_{U^\perp})[/math], which gives DHJ(2).