Fourier-analytic proof of Sperner

From Polymath Wiki
Jump to: navigation, search

Fourier analysis

We identify [math][2]^n[/math] with the power set [math]2^{[n]}[/math] of [n]. Given any function [math]f: 2^{[n]} \to {\Bbb R}[/math], we define the Fourier transform [math]\hat f: 2^{[n]} \to {\Bbb R}[/math] by the formula

[math]\hat f(A) := {\Bbb E}_{X \in 2^{[n]}} (-1)^{|A \cap X|} f(X)[/math];

we have the Plancherel formula

[math]\sum_{A \in 2^{[n]}} |\hat f(A)|^2 = {\Bbb E}_{X \in 2^{[n]}} |f(X)|^2[/math] (1)

and the inversion formula

[math]f(X) = \sum_{A \in 2^{[n]}} (-1)^{|A \cap X|} \hat f(A)[/math]. (2)

For any scale [math]1 \lt r \lt n[/math], we consider the bilinear form

[math] B_r( f, g ) := {\Bbb E} f(X) g(Y)[/math] (3)

where X is drawn randomly from [math]2^{[n]}[/math], and Y is formed from X by setting [math]j \in Y[/math] whenever [math]j \in X[/math], and when [math]j \not \in X[/math], setting [math]j \in Y[/math] with probability [math]r/n[/math]. To avoid significant distortion in the measure, we will be working in the regime [math]1 \leq r \ll \sqrt{n}[/math]. Note that if [math]B_r(1_A, 1_A)[/math] is large, and r is not too small, then A will contain many combinatorial lines (since Y will be strictly larger than X with high probability); thus it is of interest to get lower bounds on [math]B_r[/math].

We also define the influence Inf(f) as

[math] Inf(f) := {\Bbb E} |f(X) - f(X')|^2[/math] (4)

where X' is formed from X by randomly flipping one bit (from 0 to 1, or vice versa). In Fourier space, we have

[math] Inf(f) = 4\sum_{A \in 2^{[n]}} \frac{|A|}{n} |\hat f(A)|^2[/math]. (5)

Dichotomy between structure and randomness

For any [math]f: 2^{[n]} \to {\Bbb R}[/math], define the uniformity norm [math]\|f\|_{U(r)}[/math] at scale r by the formula

[math]\|f\|_{U(r)} := \sup_g \max( |B_r(f,g)|, |B_r(g,f)| )[/math] (6)

where g ranges over all functions [math]g: 2^{[n]} \to [-1,1][/math]. Functions which are small in the uniform norm are thus "negligible" for the purposes of computing [math]B_r[/math] and can be discarded. (This is analogous to the theory of counting arithmetic progressions of length three, in which functions with small Fourier coefficients (or small Gowers [math]U^2[/math] norm) can be discarded.)

Lemma 1 Let [math]f: 2^{[n]} \to [-1,1][/math] be such that [math]\|f\|_{U(r)} \geq \eta[/math], where [math]1 \leq r \ll \sqrt{n}[/math] and [math]\eta \gt 0[/math]. Then there exists a function [math]F: 2^{[n]} \to [-1,1][/math] of influence [math]Inf(F) \ll_\eta \frac{1}{r}[/math] such that
[math] \langle f, F \rangle := {\Bbb E}_{X \in 2^{[n]}} f(X) F(X) \gg_\eta 1.[/math]

It seems that this is more or less proven in Ryan's notes. (May be worth redoing it here on the wiki. Note that the scale r here basically corresponds to [math]\epsilon n[/math] in Ryan's notes.)

Structure theorem

Now we use the general "energy increment" argument (see e.g. Terry's paper on this) to obtain a structural theorem.

Corollary 2 (Koopman-von Neumann type theorem) Let [math]f: 2^{[n]} \to [0,1][/math] and [math]0 \lt \eta \lt 1[/math]. Then there exists [math]k = k(\eta)[/math] such that whenever [math]\sqrt{n} \gg r_k \gt \ldots \gt r_1 \gt 1[/math] be a sequence of scales. Then there exists [math]1 \leq i \leq k[/math] and a decomposition
[math]f = f_{U^\perp} + f_U [/math] (7)
where [math]f_{U^\perp}, f_U[/math] take values in [-1,1],
[math]Inf( f_{U^\perp} ) \ll_{\eta} 1/r_i [/math] (8)
[math]\| f_U \|_{U(r_{i+1})} \leq \eta.[/math] (9)
Furthermore, [math]f_{U^\perp}[/math] is non-negative and
[math] {\Bbb E} f = {\Bbb E} f_{U^\perp}.[/math] (10)

Proof (sketch) It's likely that there is now a slick proof of this type of thing using the duality arguments of Gowers or of Reingold-Trevisan-Tulsiani-Vadhan. But here is the "old school" approach from my long AP paper with Ben, running the energy increment algorithm:

  1. Initialise i=1, and let [math]{\mathcal B}[/math] be the trivial partition of [math][2]^n[/math].
  2. Let [math]f_{U^\perp} := {\Bbb E}(f|{\mathcal B})[/math] be the conditional expectation of f with respect to the partition; initially, this is just the constant function [math]{\Bbb E} f[/math], but becomes more complicated as the partition gets finer. Set [math]f_U := f - f_{U^\perp}[/math]
  3. If [math]\| f_U \|_{U(r_{i+1})} \leq \eta[/math] then STOP.
  4. Otherwise, we apply Lemma 1 to find a function F of influence [math]O_\eta(1/r_{i+1})[/math] that [math]f_U[/math] correlates with. Partition F into level sets (discretising in multiples of [math]\eta[/math] or so, randomly shifting the cut-points to avoid edge effects) and use this to augment the partition [math]{\mathcal B}[/math]. In doing so, the energy [math]{\Bbb E} |f_{U^\perp}|^2[/math] increases by some non-negligible amount [math]c(\eta) \gt 0[/math], thanks to Pythagoras' theorem: this is why this process will stop before k steps.
  5. Now return to Step 1.

A certain tedious amount of verification is needed to show that all the hypotheses are satisfied. One technical step is that one needs to use the Weierstrass approximation theorem to approximate [math]f_{U^\perp}[/math] by a polynomial combination of low-influence functions, and one needs to show that polynomial combinations of low-influence functions are low-influence. The complexity of the polynomial is controlled by some function of [math]\eta[/math] and this explains the losses in that parameter in (8). [math]\Box[/math]


Now we prove DHJ(2). Let [math]f = 1_A[/math] have density [math]\delta[/math]. We let [math]\eta[/math] be a small number depending on [math]\delta[/math]. Now we pick some scales [math]1 \leq r_1 \lt \ldots \lt r_k[/math] between 1 and [math]\sqrt{n}[/math] (the exact choices are not so important, so long as the scales are widely separated), and apply the above corollary, to decompose [math]f = f_{U^\perp} + f_U[/math], where [math]f_U[/math] is uniform at some scale [math]r_i[/math] and [math]f_{U^\perp}[/math] is low-influence at a coarser scale [math]r_{i+1}[/math].

We then look at [math]B_{r_i}(f, f)[/math]. Decomposing into four pieces and using the uniformity of [math]f_U[/math], this expression is equal to [math]B_{r_i}(f_{U^\perp}, f_{U^\perp}) + O(\eta)[/math]. But the low influence of [math]f_{U^\perp}[/math] means that when one randomly flips a bit from a 0 to a 1, [math]f_{U^\perp}[/math] changes by [math]O_\eta(1/r_{i+1})[/math] on the average. Iterating this [math]r_i[/math] times, we see that [math]B_{r_i}(f_{U^\perp}, f_{U^\perp}) = {\Bbb E}( f_{U^\perp}^2 ) + O_\eta( r_i / r_{i+1} )[/math]. But [math]f_{U^\perp}[/math] has density [math]\delta[/math], and so the main term is at least [math]\delta^2[/math]. Choosing [math]\eta[/math] small enough and the [math]r_i[/math] widely separated enough we obtain a nontrivial lower bound on [math]B_{r_i}(f_{U^\perp}, f_{U^\perp})[/math], which gives DHJ(2).