# Proof of DHJ(3) via density-increment

## Introduction

This article is intended to contain a proof outline of DHJ($3$) via the density-increment method. The proofs of each step in the outline will appear in separate articles. The goal is to rigorously codify the "second outline" proof. A secondary goal is to write the proof in such a way that it becomes "easy" to see the generalization to DHJ($k$). Each step of the outline has a "Proofread By" section to which people can add their names.

## Notation and definitions

We'll call the main probability distribution we work with the Pólya Urn distribution. As a distribution on $[k]^n$, we'll write it as $\nu_k^n$, or as $\nu_k$ or $\nu$ if it's clear from context. (Note: even if one only wants to prove DHJ($3$), one needs to define $\nu_k^n$ for general $k$.) One way to define it is as follows: Order the coordinates $[n]$ in a list according to a random permutation. Now choose $k-1$ distinct "partition points" uniformly at random from the $\binom{n+1}{k-1}$ possibilities. This partitions the list of coordinates into $k$ sublists; take the $1$st sublist of coordinates to be $1$'s, the $2$nd sublist to be $2$'s, etc. Note that any string drawn from the Pólya Urn distribution is nondegenerate, meaning it contains at least one of each character in $$.

A closely related distribution is called the Equal Slices distribution. We will denote this distribution by $\overline{\nu}_k^n$. It has the same definition as the Pólya Urn definition just given, except that the $k-1$ partition points are "allowed to be equal"; more precisely, we choose the $1$st partition point uniformly from the $n+1$ possibilities, then we choose the $2$nd partition point uniformly from the $n+2$ possibilities, etc. It should be clear from these definitions that the total variation distance between $\nu_k^n$ and $\overline{\nu}_k^n$ is small. Here is a proof that it is $O(k^2/n)$.

We identify combinatorial lines in $^n$ with strings in $^n$, where the $4$ character is treated as identifying the "wildcard" set. Similarly, we identify $2$-dimensional combinatorial subspaces in $^n$ with strings in $^n$, where the characters $4$ and $5$ identify the $2$ wildcard sets. Etc. Note that a combinatorial line in $^n$ drawn from $\nu_4^n$ is automatically nondegenerate. Henceforth, whenever we say "combinatorial line" or "combinatorial subspace" without qualification, the object is assumed to be nondegenerate.

Although we will prove DHJ($3$) for sets that are dense under $\nu_3$, this also proves it for sets that are dense under the usual uniform distribution; this should be explicated in the article on passing between measures.

## Proof Outline

Here is the sequence of statements we prove:

1. Dense sets in $^n$ contain lines (i.e., DHJ($2$)).

2. Dense sets in $^n$ contain large subspaces.

3. Dense ij-insensitive sets in $^n$ contain large subspaces.

4. Dense $ij$-insensitive sets in $^n$ are large-subspace rich.

5. Dense $ij$-insensitive sets in $^n$ are large-subspace partitionable.

6. Dense intersections of $ij$-insensitive sets in $^n$ are large-subspace partitionable.

7. Line-free sets in $^n$ have density increments on dense intersections of $ij$-insensitive sets.

8. Line-free sets in $^n$ have density increments on large subspaces.

9. Dense sets in $^n$ contain lines (i.e., DHJ($3$)).

### 1. Sets in $^n$ have lines

This is simply Sperner's Theorem. Our version, with the Pólya Urn distribution, is:

Theorem 1: Let $A \subseteq ^n$ have $\nu_2$-density at least $\delta$. Assume that $n \geq n_1(\delta)$, where $n_1(\delta) = O(1/\delta^2)$. Then $A$ contains a combinatorial ($2$-)line.
Proof of Sperner's Theorem. For now, the proof more or less appears in the article on Sperner's Theorem; we might want to use the writeup ideas in the current article on Equal Slices. The easiest proof involves first passing from Pólya to Equal Slices. I think you only need $n_1(\delta) \gt 1/\delta^2 + 1$ or something.

### 2. Sets in $^n$ have subspaces

The next step is to extend DHJ($2$) from lines to subspaces of arbitrary dimension.

Lemma 2: Let $A \subseteq ^n$ have $\nu_2$-density at least $\delta$, and let $d \geq 1$. Assume that $n \geq n_2(\delta,d)$. Then $A$ contains a $d$-dimensional combinatorial subspace. Here $n_2(\delta,d)$ probably only needs to be $O(1/\delta)^{2^{d+1}}$

This sort of result has been shown for the uniform distribution case by [], from which one can easily conclude the Pólya distribution case by passing between measures. Alternatively, one can prove the result with a (much) worse bound on $n_2$ using an argument which generalizes to $[k]^n$. A few technical bits need to be filled into this argument; namely, showing that if one does Pólya on some small random subset $S \subseteq [n]$, and then combines this with an independent Pólya draw on $[n] \setminus S$, the overall draw is very close in total variation distance to a normal Pólya draw. This is proved rather explicitly in the passing between measures article. (Presumably there is also literature from the Pólya Urn studiers on this, too.)

### 3. $ij$-insensitive sets in $^n$ have subspaces

Let $i,j \in [k]$ be distinct characters. We define $A \subseteq [k]^n$ to be an "$ij$-insensitive set" if for all strings $x$, changing $i$'s to $j$'s in $x$ or vice versa does not change whether or not $x \in A$.

An $23$-insensitive in $^n$ is elsewhere on the blog called a "$1$-set", since it depends only on the set of coordinates where the value 1 is taken. It is a special case of a special set of complexity $1$. See also this article discussing the complexity of sets.

The statement we prove here is:

Lemma 3: Let $A \subseteq ^n$ be an $ij$-insensitive set with $\nu_3$-density at least $\delta$, and let $d \geq 1$. Assume that $n \geq n_3(\delta,d) = \sqrt{2/\delta} \cdot n_2(\delta/2, d)$. Then $A$ contains a $d$-dimensional combinatorial subspace.

We sketch the proof here, for now.

Proof: (Sketch.) Assume without loss of generality that $i = 2$, $j = 3$. We can think of a draw $x \sim \nu_3^n$ by first conditioning on the set of coordinates $S$ where $x$ has its $3$'s, and then drawing the remainder of the string as $y \sim \nu_2^{|\overline{S}|}$, where $\overline{S} = [n] \setminus S$. (This requires a quick justification.) Inventing some notation, we have $\mathbf{E}_S[\nu_2(A_S)] = \delta$. Further, it should be easy to check that $\mathbf{Pr}[|\overline{S}| \lt\gamma n] \leq \gamma^2$ (or perhaps only a tiny bit higher). Hence $\mathbf{E}_S[\nu_2(A_S) \mid |\overline{S}| \geq \gamma n] \geq \delta - \gamma^2.$ We set $\gamma = \sqrt{\delta/2}$ and conclude that there exists a particular set of $3$-coordinates $S_0$ such that $\nu_2(A_{S_0}) \geq \delta/2$ and $|\overline{S_0}| \geq \sqrt{\delta/2} n$. By choice of $n_2$, this lets us apply Lemma 2 to conclude that $A_{S_0} \subseteq ^{S_0}$ contains a $d$-dimensional combinatorial subspace; call it $\sigma$.
Pulling $\sigma$ back into $^n$ by putting $3$'s into the $S$ coordinates, we get a $d$-dimensional "$2$-subspace" in $A \subseteq ^n$. But in fact, all of the strings necessary to fill this out into a complete $d$-dimensional subspace in $^n$ must also be present in $A$, because $A$ is $23$-insensitive. $\Box$ (somewhat poorly explained)

### 4. $ij$-insensitive sets are subspace rich

Lemma 4: Let $A \subseteq ^n$ be an $ij$-insensitive set with $\nu$-density at least $\delta$, and let $d \geq 1$. Assume that $n \geq n_4(\delta,d) = n_3(\delta/2,d)$. Then a random $d$-dimensional subspace drawn from $\nu^n_{3+d}$ is in $A$ with probability at least $\eta_4(\delta, d) = \delta/(O(d))^{n_4(\delta,d)}$.

This is sketched originally here. Here is a more complete argument:

Proof: (Sketch.) For any $3 \leq m \leq n$ one can draw a string $h \sim \nu_3^n$ as follows: First, draw $f \sim \nu_m^n$; then independently draw $g \sim \nu_3^m$; finally, set $h = g \circ f$. This notation means that $h_i = g_{f_i}$. The fact that this indeed gives the distribution $\nu_3^n$ is justified in [document], but should be simplified and ported to the wiki.
By assumption, $\mathbf{Pr}[h \in A] \geq \delta$; hence with probability at least $\delta/2$ over the choice of $f$ we get that $A_f \subseteq ^m$ has $\nu_3$-density at least $\delta/2$. Here $A_f$ of course denotes $\{g \in ^m : g \circ f \in A$. Crucially, $A_f$ is an $ij$-insensitive set because $A$. (This takes a slight bit of reflection but is, I think, easily confirmed.) Call an $f$ "good" if indeed $\nu_3(A_f) \geq \delta/2$.
We therefore select $m = n_3(\delta/2,d)$ (which is allowable as $n \geq n_3(\delta/2,d)$) and use Lemma 3 to conclude that for each good $f$, the set $A_f$ contains some $d$-dimensional subspace. In these cases, a randomly chosen subspace $\sigma \sim [3+d]^m$ is in $A_f$ with some positive probability. At this point, we should calculate the least probability of any outcome under $\nu_k^n$; I believe it is $\frac{\Theta(1/k)^{k/2}}{n^{(k-1)/2} \cdot k^n}$. Lower-bounding this crudely, we have that $\sigma \sim [3+d]^m$ is in $A_f$ with probability at least $(O(d))^{-m}$.
So overall we have that for $f \sim \nu_3^m$ and $\sigma \sim [3+d]^m$, the probability of $\sigma \circ f$ being entirely in $A$ is at least $\delta/(O(d))^m = \delta/(O(d))^{n_3(\delta/2,d)}$. But $\sigma \circ f$ is distributed as $\nu_{3+d}^n$, completing the proof. $\Box$

### 5. $ij$-insensitive sets are subspace partitionable

Definition: A set $A \subseteq ^n$ is "$(\gamma,d)$-subspace partitionable" if it can be written as a disjoint union of $d$-dimensional combinatorial subspaces, along with a single "error" set $E$ satisfying $\nu(E)/\nu(A) \leq \gamma$.
Lemma 5: Let $A \subseteq ^n$ be an $ij$-insensitive set with $\nu$-density at least $\delta$, let $d \geq 1$, and let $\gamma \in (0,1)$ be a parameter. Assume that $n \geq n_5(\delta,\gamma,d) = XXX$. Then $A$ is $(\gamma,d)$-subspace partitionable.

This is currently written fairly thoroughly here, although the parameters need to be worked out to fill in the "XXX" in the statement.

### 6. Intersections of $ij$-insensitive sets are subspace partitionable

Currently written here.

### 7. Line-free sets increment on intersections of $ij$-insensitive sets

Written mostly completely here, although some of the "passing between measures" tricks that were done for the uniform distribution should be done for the Pólya distribution instead.