Passing between measures

From Polymath Wiki
Jump to: navigation, search

In this article we show how to pass between dense sets under the uniform measure on [math][3]^n[/math] and dense sets under the equal-slices measure on [math][3]^n[/math], at the expense of localising to a combinatorial subspace.

Passing between product distributions

Let's first show how to pass between two product distributions. Given [math]p_1 + p_2 + p_3 = 1[/math], let [math]\mu^n_{p_1,p_2,p_3}[/math] denote the product distributions on [math][3]^n[/math] where each coordinate is independently chosen to be [math]j[/math] with probability [math]p_j[/math], [math]j=1,2,3[/math].

Let [math]A \subseteq [3]^n[/math], which we can think of as a set or as an event. We are interested in [math]A[/math]'s probability under different product distributions; say, [math]\mu = \mu^n_{p_1,p_2,p_3}[/math] and [math]\mu' = \mu^n_{p'_1,p'_2,p'_3}[/math].

By definition, [math]|\mu(A) - \mu'(A)| \leq d_{TV}(\mu, \mu')[/math], where [math]d_{TV}[/math] denotes total variation distance. The following basic fact about total variation between product distributions appears, e.g., as Exercise 21 here:

Fact: If we write [math]p'_j = (1+\Delta_j) p_j[/math], then [math]d_{TV}(\mu, \mu')^2 \leq (1+(\sum_{j=1}^3 p_j \Delta_j^2))^n - 1[/math], and hence [math]d_{TV}(\mu,\mu') \leq \sqrt{\sum_{j=1}^3 p_j \Delta_j^2}\sqrt{n}[/math].

A product distribution slightly corrupted

Suppose [math]\mu = \mu_{p_1,p_2,p_3}[/math] is our basic product measure, and [math]\mu' = \mu_{p_1',p_2',p_3'}[/math] is defined by [math](p_1',p_2',p_3') = (1-\epsilon)(p_1,p_2,p_3) + \epsilon(q_1,q_2,q_3)[/math]. Then one can calculate that the [math]\Delta_i[/math] above equals [math]-\epsilon(1-q_i/p_i)[/math], and hence one gets:

Proposition: [math]d_{TV}(\mu, \mu') \leq a(p,q) \cdot \epsilon\sqrt{n}[/math], where
[math]a(p,q) = \sqrt{-1 + \sum_{i=1}^3 q_i^2/p_i}[/math].

One can bound [math]a(p,q)[/math] crudely by

[math]a(p,q) = \sqrt{-1 + \sum_{i=1}^3 q_i^2/p_i} \leq \sqrt{\sum_{i=1}^3 1/p_i} \leq \sum_{i=1}^3 1/\sqrt{p_i}[/math].

Recall that the Equal Slices distribution [math]\overline{\nu}[/math] is given by first choosing [math](p_1, p_2, p_3)[/math] from the simplex, and then drawing from the product distribution [math]\mu_{p_1,p_2,p_3}[/math]. Since

[math]\int_{p_1+p_2+p_3 = 1} (\sum_{i=1}^3 1/\sqrt{p_i}) = O(1)[/math]

(it's [math]48/5[/math], I believe), we conclude the following:

Proposition: Let [math]\gamma[/math] be the distribution on [math][3]^n[/math] given by the following procedure: (a) pick a random subset [math]S \subseteq [n][/math] by including each coordinate with probability [math]\epsilon[/math]; (b) draw from the product distribution [math]\mu_{q_1, q_2, q_3}[/math] on the [math]S[/math] coordinates; (c) draw from [math]\mu_{p_1, p_2, p_3}[/math] on the [math][n] \setminus S[/math] coordinates. Then [math]d_{TV}(\overline{\nu}, \gamma) \leq O(\epsilon \sqrt{n})[/math].
Corollary: The same would be true if in (b) we drew from Equal Slices on [math][3]^S[/math]; this is because Equal Slices is a mixture of product distributions.

In fact, I'm quite certain it would be true if we put any distribution whatsoever on the [math]S[/math] coordinates, since [math]a(p,q)[/math] could be upper-bounded independently of the [math]q_i[/math]'s. If we need this later, we can fill in the details.

Passing from uniform to equal-slices

We will be interested in the case when [math](p_1,p_2,p_3) = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})[/math] generates the uniform distribution and [math](p'_1, p'_2, p'_3)[/math] is a slight mixture with another distribution. Specifically, say

Definition: [math](p'_1, p'_2, p'_3) = (1-\epsilon)(\frac{1}{3}, \frac{1}{3}, \frac{1}{3}) + \epsilon(q_1, q_2, q_3)[/math].

Then we can calculate:

[math]\sum_{j=1}^3 p_j \Delta_j^2 = 3(q_1^2 + q_2^2 + q_3^2 - 1/3) \epsilon^2 \leq 3\epsilon^2.[/math]


Proposition: In this scenario, [math]d_{TV}(\mu, \mu') \leq \sqrt{3} \epsilon\sqrt{n}[/math] independently of [math](q_1, q_2, q_3)[/math].

Recall that equal-slices measure is gotten by first drawing [math](q_1, q_2, q_3)[/math] from the simplex and then drawing from the product distribution generated by [math]\mu^1_{q_1,q_2,q_3}[/math]. Since the above proposition holds for every choice of [math](q_1, q_2, q_3)[/math], it holds when we average these choices over the simplex. It would also hold if we averaged over [math](q_1, q_2, 0)[/math] drawn from the 2-simplex [math]q_1 + q_2 = 1[/math]. We conclude:

Theorem: Suppose [math]\lambda[/math] is the distribution on [math][3]^n[/math] generated as follows: First we draw a random restriction in which each coordinate is free with probability [math]\epsilon[/math] and fixed uniformly with probability [math]1-\epsilon[/math]. Next, we draw the free coordinates from the equal-slices measure. Then [math]d_{TV}(\mathrm{uniform}, \lambda) \leq \sqrt{3}\epsilon \sqrt{n}[/math]. Furthermore, this is also true for [math]\lambda'[/math], defined in the same way except that the free coordinates are drawn form the equal-slices measure on [math][2]^n[/math].

Corollary: Suppose we change [math]\lambda[/math] slightly by also conditioning on at least [math]\epsilon n/2[/math] coordinates being left free initially and hence drawn from an equal-slices distribution. Then [math]d_{TV}(\mathrm{uniform}, \lambda) \leq \sqrt{3}\epsilon \sqrt{n} + \exp(-\Omega(\epsilon n))[/math]. The analogous statement is true for [math]\lambda'[/math].

Proof: This follows because the probability of getting too few free coordinates is at most [math]\exp(-\Omega(\epsilon n))[/math], by a standard large-deviation bound.

Passing from equal-slices to uniform

This is a bit easier, which is convenient as we'll want to prove something stronger.

Suppose [math]A \subseteq [3]^n[/math] has [math]\Pr_{\nu}[A] = \delta[/math], where again [math]\nu[/math] denotes equal-slices measure. Recall that we can think of [math]\nu[/math] as first choosing [math](p_1,p_2,p_3)[/math] from the simplex and then drawing from the product distribution [math]\mu^n_{p_1,p_2,p_3}[/math]. The following is easy to check:

Fact: Any distribution [math]\mu^1_{p_1,p_2,p_3}[/math] on [math][3][/math] can be written as a mixture distribution [math]\alpha \mu^1_{1/3,1/3,1/3} + (1-\alpha) \lambda[/math], where [math]\alpha = 3\min\{p_1,p_2,p_3\}[/math] and [math]\lambda[/math] is some other distribution.

Corollary: We can express a draw from [math]\mu^n_{p_1,p_2,p_3}[/math] as: (i) forming a set [math]S \subseteq [n][/math] by including each coordinate with probability [math]1-\alpha[/math], independently; (ii) fixing the coordinates in [math]S[/math] to some string [math]x[/math], according to the product distribution [math]\lambda[/math]; (iii) drawing [math]y[/math] uniformly at random from the combinatorial subspace formed by [math](x,S)[/math].

Hence, using the notation [math]A_x[/math] for the restriction of [math]A[/math] to the combinatorial subspace defined by fixing [math]x[/math] into the coordinates [math]S[/math], we have

[math]\delta = \Pr_{\nu}[A] = \mathbf{E}_{p_1+p_2+p_3 = 1} \mathbf{E}_{S,x} \Pr_{y \text{ uniform}}[y \in A_{x}][/math].

Hence for any [math]\eta \gt 0[/math] we have that with probability at least [math]\eta[/math] over the formation of the combinatorial subspace determined by [math]x[/math], the uniform measure of [math]A_x[/math] is at least [math]\delta - \eta[/math].

The only thing that could go somewhat wrong here is that [math]S[/math] might be all of [math][n][/math], or nearly so. But this is quite unlikely. Specifically, one easily calculates that the probability that [math]\min\{p_1,p_2,p_3\} \leq \eta/8[/math] is at most [math]\eta/4[/math]. Assuming this doesn't happen we have [math]\alpha \geq (3/8)\eta[/math]; thus we expect the number of free coordinates in our combinatorial subspace to be [math](3/8)\eta n[/math], and it will be at least [math](\eta/4) n[/math] except with probability [math]\exp(-\Omega(\eta n))[/math]. Overall, the probability that everything goes right is at least [math]\eta - \eta/4 - \exp(-\Omega(\eta n))[/math]. This is positive if [math] \eta \gg (\log n)/n[/math].

We conclude:

Theorem: Suppose the set [math]A[/math] has equal-slices density [math]\delta[/math]. Assume that [math]\eta \geq O(\log n / n)[/math]. Then there exists a combinatorial subspace with at least [math](\eta/4) n[/math] free coordinates on which [math]A[/math] has uniform density at least [math]\delta - \eta[/math].

Relative density version

It will useful to have also a "relative density" version of this:

Theorem: Let [math]\nu[/math] denote equal-slices measure and let [math]\mu[/math] denote uniform measure. Suppose that [math]A \subseteq B \subseteq [3]^n[/math], with [math]\nu(B) = \gamma[/math], [math]\nu(A) \geq \delta \nu(B)[/math]. Assume [math]\eta[/math] is such that [math]\eta \gamma \geq O(\log n / n)[/math]. Then there exists a combinatorial subspace with at least [math](\eta \gamma / 4) n[/math] free coordinates on which [math]\mu(B) \geq \eta \gamma/2[/math] and [math]\mu(A) \geq (\delta - \eta) \mu(B)[/math].

Proof: As in the previous section, think of [math]\nu[/math] as first drawing the combinatorial subspace defined by [math]x[/math] and then drawing from [math]\mu[/math] on this subspace. With regard to the draw of [math]x[/math], let [math]H[/math] event that either [math]\mu(B_x) \leq \eta \gamma/2[/math] or [math]x[/math] leaves fewer than [math](\eta \gamma/4) n[/math] coordinates free. We have

[math]\delta \gamma \leq \mathbf{E}_x[\mu(A_x)] = \Pr[H] \cdot \mathbf{E}_x[\mu(A_x) \mid H] + \Pr[\overline{H}] \cdot \mathbf{E}_x[\mu(A_x) \mid \overline{H}].[/math]

The first summand on the right contributes at most [math]\eta \gamma/2[/math] (from when [math]\mu(B_x) \leq \eta \gamma/2[/math]) plus [math]\eta\gamma/4 + \exp(-\Omega(\eta \gamma n)) \leq \eta \gamma/2[/math] (from when [math]x[/math] leaves too few coordinates free, by the same argument as in the previous section). Subtracting [math]\eta \gamma[/math] from both sides we get

[math](\delta - \eta) \gamma \leq \Pr[\overline{H}] \cdot \mathbf{E}_x[\mu(A_x) \mid \overline{H}].[/math]

But since [math]\gamma = \mathbf{E}_x[\mu(B_x)] \geq \Pr[\overline{H}] \cdot \mathbf{E}_x[\mu(B_x) \mid \overline{H}][/math], we have

[math]\Pr[\overline{H}] \cdot \mathbf{E}_x[(\delta - \eta) \mu(B_x) \mid \overline{H}] \leq \Pr[\overline{H}] \cdot \mathbf{E}_x[\mu(A_x) \mid \overline{H}].[/math]

Clearly then there exists some [math]x[/math] for which both [math]H[/math] does not occur and

[math]\mu(A_x) \geq (\delta - \eta) \mu(B_x)[/math].

This completes the proof.

An older version of passing from equal-slices to uniform

Suppose we define [math]\nu[/math] to be the distribution on [math][3]^n[/math] gotten by first drawing [math](p_1, p_2, p_3)[/math] from the simplex and then drawing from [math]\mu^n_{p_1,p_2,p_3}[/math].

Fact: The distribution [math]\nu[/math] is the equal-slices measure on [math][3]^n[/math].

This fact is a consequence of the "type 1 Dirichlet integral".

Next, define [math]\nu'[/math] as follows: first draw [math](p_1,p_2,p_3)[/math] from the simplex, then draw from [math]\mu^n_{p'_1,p'_2,p'_3}[/math], where [math](p'_1, p'_2, p'_3) = (1 - \epsilon)(p_1, p_2, p_3) + \epsilon(\frac{1}{3}, \frac{1}{3}, \frac{1}{3})[/math].

Using the Fact in the first section, we get

Proposition: [math]d_{TV}(\mu,\mu') \leq \sqrt{c(p_1,p_2,p_3)} \cdot \epsilon \sqrt{n}[/math], where

[math]c(p_1,p_2,p_3) = \frac{1}{9p_1} + \frac{1}{9p_2} + \frac{1}{9p_3} - 1[/math].

Clearly then,

[math]d_{TV}(\nu, \nu') \leq \left(\int_{p_1+p_2+p_3 = 1} \sqrt{c(p_1,p_2,p_3)}\right) \cdot \epsilon \sqrt{n}[/math]

It is elementary to check that the integral is finite; in fact, it should be [math]O(\sqrt{k})[/math] when we have the analogous situation over [math][k]^n[/math]. So:

Proposition 2: [math]d_{TV}(\nu, \nu') \leq O(\epsilon \sqrt{n})[/math].

We can view a draw from [math]\nu'[/math] as follows: First, we choose a random set of wildcards [math]S[/math] by including each coordinate independently with probability [math]\epsilon[/math]. Next, choose [math](p_1,p_2,p_3)[/math] at random from the simplex. Next, choose a string [math]x \in [3]^{[n] \setminus S}[/math] from [math]\mu_{p_1,p_2,p_3}[/math]. Finally, choose [math]y \in [3]^n[/math] uniformly, and output [math](x,y)[/math].

By Proposition 2,

[math]\Pr_\nu[A] - O(\epsilon \sqrt{n}) \leq \Pr_{\nu'}[A] = \mathbf{E}_{S} \mathbf{E}_{p,x} \Pr_y [A].[/math]

By a large-deviation bound, the probability that [math]|S| \lt \epsilon n / 2[/math] is at most [math]\exp(-\Omega(\epsilon n))[/math]. Renaming [math]\epsilon[/math] as [math]2\epsilon[/math] for simplicity, we conclude that

[math]\Pr_\nu[A] - O(\epsilon \sqrt{n}) - \exp(-\Omega(\epsilon n)) \leq \Pr_{\nu'}[A] = \mathbf{E}_{S}[\mathbf{E}_{p,x} \Pr_y [A] \mid |S| \geq \epsilon n].[/math]


Theorem: (Weaker than in the previous section.) Suppose [math]A[/math] has equal-slices density [math]\delta[/math]. Then for any [math]\epsilon[/math], there exists a combinatorial subspace [math](x,S)[/math] with [math]|S| \geq \epsilon n[/math] on which [math]A[/math] has uniform-distribution density at least [math]\delta - O(\epsilon \sqrt{n}) - \exp(-\Omega(\epsilon n))[/math].

By the way, f this is done over [math][k]^n[/math], we should only have to change [math]O(\epsilon \sqrt{n})[/math] to [math]O(\epsilon \sqrt{kn})[/math].