Austin's proof II

From Polymath Wiki
Jump to: navigation, search

This is a variant of Austin's proof that eschews Ramsey theory (except for pigeonhole principle) and/or density incrementation.

DHJ(3) If [math]\delta \gt 0[/math] and n is sufficiently large depending on [math]\delta[/math], then any subset A of [math][3]^n[/math] of density at least [math]\delta[/math] contains a combinatorial line.

Step 0: selection of parameters

We can assume that [math]\delta[/math] is smaller than some small absolute constant.

We will need a number [math]\varepsilon \gt 0[/math] that is sufficiently small depending on [math]\delta[/math]. This parameter measures the amount of regularity we will extract from the abstract regularity lemma.

We need a function [math]F: {\Bbb R}^+ \to {\Bbb R}^+[/math] that is sufficiently rapidly growing depending on [math]\varepsilon,\delta[/math]. This measures the quality of the fine approximation from the abstract regularity lemma.

We need a number L which is sufficiently large depending on [math]\varepsilon,\delta,F[/math]. This measures the number of scales between the partitions in the abstract regularity lemma, and also the separation between adjacent scales.

We need a number J which is sufficiently large depending on [math]\varepsilon,\delta,F, L[/math]. This measures the total number of scales in play.

We assume n is sufficiently large depending on [math]\varepsilon, \delta, F, L, J[/math].

We introduce the scales [math]n = n_0 \gt n_1 \gt \ldots \gt n_J[/math] by setting [math]n_j := \lfloor n/L^j \rfloor[/math]. Note that [math]n_j[/math] are all larger than 1.

Step 1: Setting up the probability space and basic partitions

We set up a probability space X by creating the following random variables:

  1. First, select a string x at random from [math][3]^n[/math] using equal-slices measure. (NB: one could also use uniform measure here, but this is inferior as one then has to make each [math]n_{j+1}[/math] something like [math]n_j^{1/2}/L[/math] rather than [math]n_j/L[/math].)
  2. Set [math]I_0 := [n][/math], and then for [math]j=1,2,3,\ldots,J[/math] in turn, set [math]I_j[/math] to be a random subset of [math]I_{j-1}[/math] of cardinality [math]n_j[/math], chosen uniformly at random for any fixed choice of [math]x,I_0,\ldots,I_{j-1}[/math].

Let X be the probability space generated by the random variables [math]x, I_0,\ldots,I_J[/math]. This is a finite probability space, with every point occurring with positive probability (we can of course throw out all configurations that occur with zero probability). Clearly any function [math]f: [3]^n \to {\Bbb R}[/math] can be equated with the random variable f(x) on X.

Next, for [math]0 \leq j \leq J[/math] and [math]i=0,1,2[/math], we form the following [math]\sigma[/math]-algebras (partitions):

  1. [math]{\mathcal B}^j_{012}[/math] is the space of all events that depend on the sets [math]I_0,\ldots,I_j[/math], and on the string x.
  2. [math]{\mathcal B}^j_{\emptyset}[/math] is the space of all events that depend on the sets [math]I_0,\ldots,I_j[/math], and on the values of the string x outside of [math]I_j[/math].
  3. [math]{\mathcal B}^j_i[/math] is the space of all events that depend on the sets [math]I_0,\ldots,I_j[/math], on the values of the string x outside of [math]I_j[/math], and on the i-set of x inside [math]I_j[/math] (i.e. the set of positions inside [math]I_j[/math] where x takes the value i).

We observe the inclusions

[math]{\mathcal B}^j_\emptyset \leq {\mathcal B}^j_i \leq {\mathcal B}^j_{012}[/math]

for i=0,1,2, and

[math]{\mathcal B}^j_S \leq {\mathcal B}^{j+1}_S[/math]

for [math]S=\emptyset,0,1,2,012[/math].

Step 2. Regularisation

We apply the abstract regularity lemma with the functions [math]f_0=f_1=f_2:=1_A(x)[/math], the parameters [math]\varepsilon, F[/math], and the [math]\sigma[/math]-algebras [math]{\mathcal B}^{mL}_i[/math] for [math]m=0,1,2,\ldots[/math]. This gives us positive integers [math]M \leq M' \leq O_{F,\varepsilon}(1)[/math] and factors [math]{\mathcal A}_i^{coarse} \leq {\mathcal A}_i^{fine}[/math] for i=0,1,2 with the following properties for {i,j,k}={0,1,2}:

  1. (Coarse partition is low complexity) [math]{\mathcal A}_i^{coarse} \leq {\mathcal B}_i^{ML}[/math], and [math]{\mathcal A}_i^{coarse}[/math] has complexity M.
  2. (Fine partition is medium complexity) [math]{\mathcal A}_i^{fine} \leq {\mathcal B}_i^{M'L}[/math], and [math]{\mathcal A}_i^{fine}[/math] has complexity M'.
  3. (Coarse and fine approximations are close) [math]\| {\Bbb E}(1_A | {\mathcal A}_j^{fine} \vee {\mathcal A}_k^{fine} ) - {\Bbb E}(1_A | {\mathcal A}_j^{coarse} \vee {\mathcal A}_k^{coarse} ) \|_{L^2(X)} \leq \varepsilon[/math].
  4. (Fine approximation is extremely high quality) For any [math] E_j \in {\mathcal B}_j^{M'L+L}[/math], E_k \in {\mathcal B}_k^{M'L+L}</math>, we have [math]| {\Bbb E} (1_A - {\Bbb E}(1_A| {\mathcal A}_j^{fine} \vee {\mathcal A}_k^{fine} )) 1_{E_j} 1_{E_k} | \leq 1/F(M)[/math].

For each {i,j,k}={0,1,2} we decompose

[math]1_A = f_{i,U^\perp} + f_{i,err} + f_{i,U}[/math]


[math]f_{i,U^\perp} = {\Bbb E}(1_A | {\mathcal A}_j^{coarse} \vee {\mathcal A}_k^{coarse} )[/math]
[math]f_{i,err} = {\Bbb E}(1_A | {\mathcal A}_j^{fine} \vee {\mathcal A}_k^{fine} ) - {\Bbb E}(1_A | {\mathcal A}_j^{coarse} \vee {\mathcal A}_k^{coarse} )[/math]
[math]f_{i,U} = 1_A - {\Bbb E}(1_A | {\mathcal A}_j^{fine} \vee {\mathcal A}_k^{fine} )[/math].

We record some salient features of these functions:

  1. (Structure) [math]f_{i,U^\perp}[/math] has density [math]\delta[/math], takes values between 0 and 1, and is constant on intersections between cells of [math]{\mathcal A}_j^{coarse}[/math] and cells of [math]{\mathcal A}_k^{coarse}[/math] (there are at most [math]2^{2M}[/math] such intersections in all).
  2. (Smallness) [math]f_{i,err}[/math] has an [math]L^2[/math] norm of at most [math]\varepsilon[/math].
  3. (Uniformity) [math]|{\Bbb E} f_{i,U} 1_{E_j} 1_{E_k}| \leq 1/F(M)[/math] whenever [math]E_j \in {\mathcal B}_j^{M'L+L}[/math] and [math]E_j \in {\mathcal B}_k^{M'L+L}[/math].
  4. (Measurability) The values of [math]f_{i,U^\perp}, f_{i,err}, f_{i,U}[/math] depend on the string x and on the choices of [math]I_0, I_1, \ldots, I_{M'L}[/math], but are independent of subsequent choices of the [math]A[/math]'s. In other words, these random variables are [math]{\mathcal B}^{M'L}_{012}[/math]-measurable.

We also observe

  1. (Large intersection) We have [math] {\Bbb E} \prod_{i=0,1,2} f_{i,U^\perp} \gg \delta^4[/math].

Indeed, as [math]f_{i,U^\perp}[/math] is a conditional expectation of [math]1_A[/math], we see that A has measure at most [math]\delta/10[/math] on the set [math]\{ f_{i,U^\perp} \leq \delta/10 \}[/math]. From the union bound, we conclude that on at least half of A, [math]f_{i,U^\perp} \geq \delta/10[/math] for all i=0,1,2, and the claim follows.

Step 3. Secondary regularisation

This is a step which is needed in the DHJ setting but not in the triangle-removal setting, due to the multiplicity of scales.

For [math]0 \leq l \leq L[/math], define the energy

[math]E_l := \sum_{\{i,j,k\}=\{0,1,2\}} \sum_{1 \leq a,b \leq 2^M} \| {\Bbb E}(1_{E_i^a}|{\mathcal B}^{M'L+l}_\emptyset) \|_{L^2}^2 + \| {\Bbb E}(1_{E_i^a \cap E_j^b}|{\mathcal B}^{M'L+l}_\emptyset) \|_{L^2}^2 + \| {\Bbb E}(f_{k,err} 1_{E_i^a \cap E_j^b} |{\mathcal B}^{M'L+l}_\emptyset) \|_{L^2}^2[/math].

This energy is increasing in l (by Pythagoras) and bounded above by [math]O_M(1)[/math]. Thus, by the pigeonhole principle we can find [math]0 \leq l \lt L[/math] such that

[math]E_{l+1}-E_l \ll_{M,\varepsilon} 1/L[/math].

Fix this l. By Pythagoras, we conclude that

[math] \| osc(1_{E_i^a}) \|_{L^2} \ll_{M,\varepsilon} L^{-1/2}[/math]
[math] \| osc(1_{E_i^a \cap E_j^b}) \|_{L^2} \ll_{M,\varepsilon} L^{-1/2}[/math]
[math] \| f_{k,err} osc(1_{E_i^a \cap E_j^b}) \|_{L^2} \ll_{M,\varepsilon} L^{-1/2}[/math]

for all [math]\{i,j,k\}=\{0,1,2\}[/math] and [math]1 \leq a,b \leq 2^M[/math], where we define

[math]osc(f) := {\Bbb E}( |{\Bbb E}(f|{\mathcal B}^{M'L+l+1}_\emptyset) - {\Bbb E}(f|{\mathcal B}^{M'L+l}_\emptyset)|^2 | {\mathcal B}^{M'L+l}_\emptyset )[/math];

note that this is a [math]{\mathcal B}^{M'L+l}_\emptyset[/math]-measurable quantity.

Step 4. Counting

Now for the "counting lemma" stage. Let f,g,h be real-valued [math]{\mathcal B}^{M'L+l}_{012}[/math]-measurable random variables; we view f,g,h as functions [math]f,g,h: [3]^n \to {\Bbb R}[/math] that also depend on the choices of sets [math]I_0,\ldots,I_{M'L+l}[/math]. Given such functions, we define the trilinear forms

[math]\Lambda_0(f,g,h) := {\Bbb E} f(x) g( \rho^{M'L+l+1}_{0 \to 1} x ) h( \rho^{M'L+l+1}_{0 \to 2} x )[/math]
[math]\Lambda_1(f,g,h) := {\Bbb E} f(\rho^{M'L+l+1}_{1 \to 0} x) g( x ) h( \rho^{M'L+l+1}_{1 \to 2} x )[/math]
[math]\Lambda_2(f,g,h) := {\Bbb E} f(\rho^{M'L+l+1}_{2 \to 0} x) g( \rho^{M'L+1}_{2 \to 1} x ) h( x )[/math]

These trilinear forms are essentially equivalent:

Lemma 2 If f,g,h are [math]{\mathcal B}^{M'L+l}_{012}[/math]-measurable and take values in [-1,1], then [math]\Lambda_0(f,g,h), \Lambda_1(f,g,h), \Lambda_2(f,g,h)[/math] differ by [math]O(1/L)[/math].

Proof: Suppose we condition on (i.e. fix) the choices of [math]I_0,\ldots,I_{M'L+l}[/math], as well as the coefficients of x outside of [math]I_{M'L+l}[/math]. Then x is distributed in an [math]n_{M'L+l}[/math]-dimensional combinatorial subspace Vindexed by [math]I_{M'L+l}[/math], and essentially has the equal-slices distribution (it may even have exactly the equal-slices distribution - did not check this). The random lines [math]\ell_i: [3] \to V[/math] defined by [math]\ell_i(j) := \rho^{M'L+l+1}_{i \to j}(x)[/math] are then random lines in V; one can check by some computation (to be done later) that the three probability distributions for these random lines differ in total variation by [math]O(1/L)[/math], basically because [math]n_{M'L+l+1} = O(n_{M'L+l} / L)[/math]. [math]\Box[/math]

We now seek a lower bound for

[math]\Lambda_0(1_A, 1_A, 1_A)[/math] (1)

(note from Lemma 1 that we may replace [math]\Lambda_0[/math] here by [math]\Lambda_1[/math] or [math]\Lambda_2[/math] and only accept an error of [math]O(1/L)[/math] in the process). We have the partitions [math]{\mathcal A}_i^{coarse}[/math] for i=0,1,2, which (as they have complexity at most M) partition X into disjoint events [math]E_i^1,\ldots,E_i^{2^M} \in {\mathcal B}^{M'L+l}_i[/math]. We can then partition (1) as

[math]\sum_{1 \leq a,b,c \leq 2^M} \Lambda_0( 1_A 1_{E_1^b}, 1_A 1_{E_2^c}, 1_A 1_{E_0^a} )[/math]

which by various invariance properties can be rearranged as

[math]\sum_{1 \leq a,b,c \leq 2^M} \Lambda_0( 1_A 1_{E_1^b \cap E_2^c}, 1_A 1_{E_0^a \cap E_2^c}, 1_A 1_{E_0^a \cap E_1^b} )[/math].

This is analogous to how the number of triangles in a tripartite graph can be split up into a sum over triples of cells in each of the three pieces.

Now we throw away some "bad" triples (a,b,c), but as opposed to the triangle removal case, in which there is a globally deterministic choice of which triples are bad, here we must work "relative" to the factor [math]{\mathcal B}^{M'L+l}_\emptyset[/math], and so the badness of any given triple (a,b,c) is a [math]{\mathcal B}^{M'L+l}_\emptyset[/math]-measurable random variable, i.e. it depends on [math]I_0,\ldots,I_{M'L+l}[/math] and on the values of x outside of [math]I_{M'L+l}[/math]. If we fix (condition) those random variables, we say that a pair (a,b) "01-bad" if at least one of the following is true:

  1. (0-cell too small) The expected value [math]{\Bbb E}(1_{E_0^a}|{\mathcal B}^{M'L+l}_\emptyset)[/math] of [math]E_0^a[/math] is less than [math]\varepsilon^{0.1}/2^M[/math].
  2. (1-cell too small) The expected value [math]{\Bbb E}(1_{E_1^b}|{\mathcal B}^{M'L+l}_\emptyset)[/math] of [math]E_1^b[/math] is less than [math]\varepsilon^{0.1}/2^M[/math].
  3. (0-cell too unstable) [math]osc(1_{E_0^a})[/math] is larger than [math]L^{-0.1}[/math].
  4. (1-cell too unstable) [math]osc(1_{E_1^b})[/math] is larger than [math]L^{-0.1}[/math].
  5. (01-cell too unstable) [math]osc(1_{E_0^a \cap E_1^b})[/math] is larger than [math]L^{-0.1}[/math].
  6. ([math]f_{2,err}[/math] too unstable) [math]osc(f_{2,err} 1_{E_0^a \cap E_1^b})[/math] is larger than [math]L^{-0.1}[/math].
  7. (A too sparse) [math]f_{0,U^\perp}[/math] (which is constant on [math]E_0^a \cap E_1^b[/math] by construction) is less than [math]\varepsilon^{0.1}[/math] on [math]E_0^a \cap E_1^b[/math].
  8. (A too irregular) The quantity [math]{\Bbb E}(|f_{2,err}| 1_{E_0^a \cap E_1^b} |{\mathcal B}^{M'L+l}_\emptyset)[/math] is larger than [math]\varepsilon^{0.8} {\Bbb E}(1_{E_0^a \cap E_1^b} |{\mathcal B}^{M'L+l}_\emptyset)[/math].

We similarly define the notion of a pair (b,c) being "12-bad", or a pair (c,a) being "20-bad". We say that a triple (a,b,c) is bad if either (a,b) is 01-bad, (b,c) is 12-bad, or (c,a) is 20-bad. Call a triple (a,b,c) good if it is not bad. We can clearly bound (1) from below by

[math]\sum_{(a,b,c)} X_{a,b,c}[/math]


[math] X_{a,b,c} = {\Bbb E} 1_{(a,b,c) good} \Lambda_0( 1_A 1_{E_1^b \cap E_2^c}, 1_A 1_{E_0^a \cap E_2^c}, 1_A 1_{E_0^a \cap E_1^b} )[/math] (2)

and the expectation used to define [math]\Lambda_0[/math] is conditioned relative to [math]{\mathcal B}^{M'L+l}_\emptyset[/math]. (This is clunky - I will rewrite the notation later.) The outer expectation is in the [math]\sigma[/math]-algebra [math]{\mathcal B}^{M'L+l}_\emptyset[/math].

Now we split each of the factors [math]1_A[/math] here into [math]1_A = f_{i,U^\perp} + f_{i,err} + f_{i,U}[/math] for i=0,1,2. We now dispose of the contribution of the uniform parts. Take for instance the contribution of [math]f_{0,U}[/math] to [math]X_{a,b,c}[/math]:

[math]{\Bbb E} 1_{(a,b,c) good} \Lambda_0( f_{0,U} 1_{E_1^b \cap E_2^c}, 1_A 1_{E_0^a \cap E_2^c}, 1_A 1_{E_0^a \cap E_1^b} )[/math].

We expand out [math]\Lambda_0[/math] and write this as

[math]{\Bbb E} 1_{(a,b,c) good} {\Bbb E}( f_{0,U} 1_{E_1^b \cap E_2^c}(x) 1_A 1_{E_0^a \cap E_2^c}( \rho^{M'L+l+1}_{0 \to 1} x ) 1_A 1_{E_0^a \cap E_2^c}( \rho^{M'L+l+1}_{0 \to 2} x ) | {\mathcal B}^{M'L+l}_\emptyset ).[/math]

After a lot of rearranging, this is the expectation of [math]f_{0,U}[/math], times a bounded [math]{\mathcal B}^{M'L+l+1}_1[/math]-measurable function, times a bounded [math]{\mathcal B}^{M'L+l+1}_2[/math]-measurable function (the point being that every factor, other than the one coming from [math]f_{0,U}[/math], is either 02-insensitive or 01-insensitive on [math]I_{M'L+l+1}[/math]. Applying the uniformity property of [math]f_{0,U}[/math], we conclude that this term is [math]O( 1 / F(M) )[/math]. We can thus replace the first [math]1_A[/math] term in (2) by [math]1_A - f_{0,U} = f_{0,U^\perp} + f_{0,err}[/math]. Similarly for the other two terms in [math]X_{a,b,c}[/math] (using Lemma 2 as necessary, losing an additional factor of [math]O(1/L)[/math]). We are thus left with

[math]X_{a,b,c} = {\Bbb E} 1_{(a,b,c) good} \Lambda_0( (f_{0,U^\perp}+f_{0,err}) 1_{E_1^b \cap E_2^c}, (f_{1,U^\perp}+f_{1,err}) 1_{E_0^a \cap E_2^c}, (f_{2,U^\perp}+f_{2,err}) 1_{E_0^a \cap E_1^b} ) + O( 1/F(M) ) + O(1/L).[/math] (3)

Now let us compare this quantity with the quantity

[math]Y_{a,b,c} := {\Bbb E} 1_{(a,b,c) good} \Lambda_0( 1_{E_1^b \cap E_2^c}, 1_{E_0^a \cap E_2^c}, 1_{E_0^a \cap E_1^b} ).[/math]

First, let's look at the main term of (3), given by [math]f_{0,U^\perp}, f_{1,U^\perp}, f_{2,U^\perp}[/math]. As (a,b,c) is good, [math]f_{0,U^\perp}[/math] is constant on [math]E_1^b \cap E_2^c[/math] and is at least [math]\varepsilon^{0.1}[/math] on this cell, and similarly for the other two terms. Thus this contibution to [math]X_{a,b,c}[/math] is at least [math]\varepsilon^{0.3} Y_{a,b,c}[/math].

Now let's look at one of the other terms, say a term involving [math]f_{0,err}[/math]. Taking absolute values and bounding the other f factors by O(1), this type of term is bounded in magnitude by

[math]{\Bbb E} 1_{(a,b,c) good} \Lambda_0( |f_{0,err}| 1_{E_1^b \cap E_2^c}, 1_{E_0^a \cap E_2^c}, 1_{E_0^a \cap E_1^b} ).[/math]

We expand this out and collect some terms to obtain

[math]{\Bbb E} 1_{(a,b,c) good} |f_{0,err}| 1_{E_1^b \cap E_2^c}(x) 1_{E_0^a}( \rho^{M'L+l+1}_{012 \to 2}(x) );[/math]

since [math]1_{E_0^a}( \rho^{M'L+l+1}_{012 \to 2}(x) )[/math] is [math]{\mathcal B}^{M'L+l+1}_\emptyset[/math]-measurable, we can rewrite this as

[math]{\Bbb E} 1_{(a,b,c) good} {\Bbb E}( |f_{0,err}| 1_{E_1^b \cap E_2^c} | {\mathcal B}^{M'L+l+1}_\emptyset ) 1_{E_0^a}(x).[/math]

As (a,b,c) is good, we can replace the [math]{\mathcal B}^{M'L+l+1}_\emptyset[/math] conditional expectation here by [math]{\mathcal B}^{M'L+l}_\emptyset[/math], accepting an error of [math]O(L^{-0.1})[/math] in the process. We can similarly estimate [math]Y_{a,b,c}[/math] by

[math]{\Bbb E} 1_{(a,b,c) good} {\Bbb E}( 1_{E_1^b \cap E_2^c} | {\mathcal B}^{M'L+l}_\emptyset ) 1_{E_0^a}(x) + O(L^{-0.1}).[/math]

On the other hand, if we again use the fact that (a,b,c) is good, one sees that [math]{\Bbb E}( |f_{0,err}| 1_{E_1^b \cap E_2^c} | {\mathcal B}^{M'L+l}_\emptyset )[/math] is less than [math]\varepsilon^{0.8}[/math] times [math]{\Bbb E}( 1_{E_1^b \cap E_2^c} | {\mathcal B}^{M'L+l}_\emptyset )[/math]. Putting all this together, we see that the contribution of this term to [math]X_{a,b,c}[/math] is [math]O( \varepsilon^{0.8} Y_{a,b,c} ) + O( L^{-0.1} )[/math]. Summing up all the contributions to [math]X_{a,b,c}[/math], we have thus concluded that

[math]X_{a,b,c} \gg \varepsilon^{0.3} Y_{a,b,c} - O(1/F(M))[/math]

(note that the [math]O(L^{-0.1}), O(1/L)[/math] terms are small compared to [math]O(1/F(M))[/math]. Summing over a,b,c, we obtain

[math]\Lambda_0(1_A, 1_A, 1_A) \gg \varepsilon^{0.3} \sum_{a,b,c} Y_{a,b,c} - O_M( 1/F(M)).[/math]

Now, as A has no combinatorial lines, we see that [math]\Lambda_0(1_A,1_A,1_A) = O(1/L) = O(1/F(M))[/math] (say) if n is large enough. We conclude that

[math] \sum_{a,b,c} {\Bbb E} 1_{(a,b,c) good} Y_{a,b,c} \ll_{M,\varepsilon} 1/F(M)[/math].

Now, as we have already done before, we can expand [math]Y_{a,b,c}[/math] as

[math]{\Bbb E} 1_{(a,b,c) good} {\Bbb E}( 1_{E_1^b \cap E_2^c} | {\mathcal B}^{M'L+l}_\emptyset ) 1_{E_0^a} + O(L^{-0.1})[/math]

which we can rearrange slightly as

[math]{\Bbb E} 1_{(a,b,c) good} 1_{E_1^b \cap E_2^c} {\Bbb E}( 1_{E_0}^a | {\mathcal B}^{M'L+l}_\emptyset ) + O(L^{-0.1})[/math]

and rearrange this further as

[math]{\Bbb E} 1_{(a,b,c) good} \Lambda_0( 1_{E_1^b}, 1_{E_2^c}, 1 ) {\Bbb E}( 1_{E_0}^a | {\mathcal B}^{M'L+l}_\emptyset ) + O(L^{-0.1}).[/math]

On the other hand, observe that

[math]\Lambda_2( 1_{E_1^b}, 1_{E_2^c}, 1 ) = {\Bbb E}( {\Bbb E}(1_{E_1^b}|{\mathcal B}^{M'L+l+1}_\emptyset ) {\Bbb E}(1_{E_2^c}|{\mathcal B}^{M'L+l+1}_\emptyset ) )[/math]

and so by using the fact that (a,b,c) is good, we can express [math]Y_{a,b,c}[/math] as

[math]{\Bbb E} 1_{(a,b,c) good} {\Bbb E}( 1_{E_0}^a | {\mathcal B}^{M'L+l}_\emptyset ) {\Bbb E}( 1_{E_1}^b | {\mathcal B}^{M'L+l}_\emptyset ) {\Bbb E}( 1_{E_2}^c | {\mathcal B}^{M'L+l}_\emptyset ) + O(L^{-0.1});[/math]

but as (a,b,c) is good, the three expectation terms here are [math]\gg \varepsilon^{0.1}/2^M[/math], and so

[math] Y_{a,b,c} \gg_{\varepsilon,M} {\Bbb E} 1_{(a,b,c) good} + O(L^{-0.1}) [/math]

and thus for each (a,b,c) we have

[math]{\Bbb E} 1_{(a,b,c) good} \ll_{M,\varepsilon} 1/F(M).[/math] (4)

Thus, the (a,b,c) are almost always bad. (This is the analogue of how a graph with few triangles cannot have three large cells joined by dense regular graphs.)

Now we use this to upper bound the expression

[math]{\Bbb E} \prod_{i=0,1,2} f_{i,U^\perp}[/math] (5)

with the intent of contradicting the large intersection property. We decompose this as

[math]\sum_{(a,b,c)} {\Bbb E} \prod_{i=0,1,2} f_{i,U^\perp} 1_{E_0^a \cap E_1^b \cap E_2^c}[/math].

From (4) we can bound this by

[math]\sum_{(a,b,c)} {\Bbb E} 1_{(a,b,c) bad} \prod_{i=0,1,2} f_{i,U^\perp} 1_{E_0^a \cap E_1^b \cap E_2^c} + O_{M,\varepsilon}(1/F(M))[/math].

Now let's look at the bad (a,b,c). By symmetry it suffices to look at the pairs (a,b) that are 01-bad:

[math]\sum_{(a,b,c)} {\Bbb E} 1_{(a,b) 01-bad} \prod_{i=0,1,2} f_{i,U^\perp} 1_{E_0^a \cap E_1^b \cap E_2^c}[/math].

We sum up in c to get

[math]\sum_{(a,b)} {\Bbb E} 1_{(a,b) 01-bad} \prod_{i=0,1,2} f_{i,U^\perp} 1_{E_0^a \cap E_1^b}[/math].

Now we count the various ways in which (a,b) can be 01-bad.

Suppose first that the 0-cell is too small. We sum in b, bound [math]f_{i,U^\perp}[/math] crudely by 1 and bound this case by

[math]\sum_a {\Bbb E} 1_{{\Bbb E}(1_{E_0^a}|{\mathcal B}^{M'L+l}_\emptyset) \lt \varepsilon^{0.1}/2^M} 1_{E_0^a}.[/math]

The summand is at most [math]\varepsilon^{0.1}/2^M[/math], and the total contribution here is thus [math]O( \varepsilon^{0.1} )[/math].

Similarly for the cases when the 1-cell is too small. Now we look at the case where the 0-cell is too unstable. There are [math]O_M(1)[/math] choices for a,b,c here, and the net contribution of each choice is at most

[math] {\Bbb E} 1_{osc(1_{E_0^a}) \geq L^{-0.1}}.[/math]

But we know that [math]osc(1_{E_0^a})[/math] has an [math]L^2[/math] norm of [math]O_{M,\varepsilon}(L^{-1/2})[/math], thus the total contribution here is at most [math] O_{M,\varepsilon}(L^{-0.1}) = O(\varepsilon^{0.1})[/math]. Similarly for the cases where the 1-cell, 01-cell, or [math]f_{2,err}[/math] is too unstable.

Now we look at the contribution where A is too sparse. We discard [math]f_{1,U^\perp}, f_{2,U^\perp}[/math] and bound [math]f_{0,U^\perp}[/math] by [math]\varepsilon^{0.1}[/math]. Summing in a,b,c, we get a total contribution of [math]O(\varepsilon^{0.1})[/math].

Finally, we look at the contribution where A is too irregular. We know that [math]|f_{2,err}|[/math] has an [math]L^2[/math] (and hence [math]L^1[/math]) norm of [math]O(\varepsilon)[/math]. By Chebyshev, the set where

[math]{\Bbb E}(|f_{2,err}| | {\mathcal B}^{M'L+l}_\emptyset) \gt \varepsilon^{0.9}[/math]

has measure [math]O(\varepsilon^{0.1})[/math]. On any cell of [math]{\mathcal B}^{M'L+l}_\emptyset[/math] outside of this set, an application of the pigeonhole principle shows that the union of all the [math]E_0^a \cap E_1^b[/math] for which

[math]{\Bbb E}(|f_{2,err}| 1_{E_0^a \cap E_1^b} |{\mathcal B}^{M'L+l}_\emptyset) \gt \varepsilon^{0.9} {\Bbb E}(1_{E_0^a \cap E_1^b} |{\mathcal B}^{M'L+l}_\emptyset)[/math]

has a net density of [math]O(\varepsilon^{0.1})[/math] inside this cell. The total contribution to (5) from these cases is thus [math]O(\varepsilon^{0.1})[/math] (here we discard all the [math]f_{i,U^\perp}[/math]).

Putting all this together we obtain

[math]{\Bbb E} \prod_{i=0,1,2} f_{i,U^\perp} \ll \varepsilon^{0.1} + O_{M,\varepsilon}(1/F(M))[/math]

and this contradicts the large intersection property by choosing [math]\varepsilon[/math] sufficiently small depending on [math]\delta[/math], and also choosing F sufficiently rapidly growing depending on [math]\varepsilon[/math].