# Higher-dimensional DHJ numbers

For any n, k let [math]c_{n,k}[/math] denote the cardinality of the largest subset of [math][k]^n[/math] that does not contain a combinatorial line. When k=3, the quantity [math]c_{n,k} = c_n[/math] is studied for instance in this page. The density Hales-Jewett theorem asserts that for any fixed k, [math]\lim_{n \to \infty} c_n / k^n = 0[/math].

A computer search has found the following [math]c_n[/math] values for different values of dimension n and edgelength k. Several of these values reach the upper bound of [math](k-1)k^{n-1}[/math].

n\k | 2 | 3 | 4 | 5 | 6 | 7 |

2 | 2 | 6 | 12 | 20 | 30 | 42 |

3 | 3 | 18 | 48 | 100 | 180 | 294 |

4 | 6 | 52 | 183 | 500 | 1051-1079 | 2058 |

5 | 10 | 150 | 712-732 | 2500 |

We trivially have

- [math]c_{n,1} = 0[/math] for n > 0 (and [math]c_{0,0}=1[/math])

and Sperner's theorem tells us that

- [math]c_{n,2} = \binom{n}{\lfloor n/2\rfloor}[/math].

Now we look at the opposite regime, in which n is small and k is large. We easily have

- [math]c_{0,k} = 1[/math]

and

- [math]c_{1,k} = k-1[/math];

together with the trivial bound

- [math]c_{n+1,k} \leq k c_{n,k}[/math]

this implies that

- [math]c_{n,k} \leq (k-1) k^{n-1}[/math]

for any [math]n \geq 1[/math]. Let us call a pair (n,k) with n > 0 *saturated* if [math]c_{n,k} = (k-1) k^{n-1}[/math], thus there exists a line-free set with exactly one point omitted from every row and column.

**Question**: Which pairs (n,k) are saturated?

From the above discussion we see that (1,k) is saturated for all k >= 1, and (n,1) is (rather trivially) saturated for all n. Sperner's theorem tells us that (n,2) is saturated only for n= 1, 2. Note that if (n,k) is unsaturated then (n',k) will be unsaturated for all n' > n.

## Contents

## (2,k) is saturated when k is at least 1

It is simple to show when restricting to dimension two the maximal set size has to be k(k-1). This can be done by removing the diagonal values 11, 22, 33, …, kk. Since they are in disjoint lines this removal is minimal.

The k missing points are one per line and one per column. So their y-coordinates are a shuffle of their x-coordinates. There are k! rearrangements of the numbers 1 to k. The k points include a point on the diagonal, so this shuffle is not a derangement. There are k!/e derangements of the numbers 1 to k, so k!(1-1/e) optimal solutions

The number of optimal solutions is this sequence.

## (3,k) is saturated when k is at least 3

Let S be a latin square of side k on the symbols 1…k, with colour i in position (i,i) ( This is not possible for k=2 )

Let axis one in S correspond to coordinate 1 in [k]^3, axis two to coordinate 2 and interpret the colour in position (i,j) as the third coordinate. Delete the points so defined.

The line with three wild cards has now been removed. A line with two wildcards will be missing the point corresponding to the diagonal in S. A line with a single wildcard will be missing a point corresponding to an off diagonal point in S.

Something similar should work in higher dimensions if one can find latin cubes etc with the right diagonal properties.

## (n,k) is saturated when all prime divisors of k are at least n

First consider the case when k is prime and at least n: Delete those points whose coordinates add up to a multiple of k. Every combinatorial line has one point deleted, except for the major diagonal of n=k, which has all points deleted.

Now consider for instance the case (n,k) = (4,35). Select one value modulo 35 and eliminate it. Combinatorial lines with one, two, three or four moving coordinates will realize all values modulo 35 as one, two, three, or four are units modulo 35, thus (4,35) is saturated.

The same argument tells us that (n,k) is saturated when all prime divisors of k are at least n.

On the other hand, computer data shows that (4,4) and (4,6) are not saturated.

## (4,k)

There are five types of points: xyzw, xxyz, xxyy, xxxy and xxxx. Let p(xyzw) be the number of points removed whose coordinates are all different, and so on.

There are seven types of line: *xyz, *xxy, *xxx, **xy, **xx, ***x, ****. Enough points must be removed to remove all lines. That leads to the following inequalities

- [math] 4p(xyzw)+2p(xxyz) \ge 4k(k-1)(k-2)[/math]
- [math] 2p(xxyz)+4p(xxyy)+3p(xxxy) \ge 12k(k-1)[/math]
- [math] p(xxxy)+4p(xxxx) \ge 4k [/math]
- [math] p(xxyz)+3p(xxxy) \ge 6k(k-1) [/math]
- [math] 2p(xxyy)+6p(xxxx) \ge 6k [/math]
- [math] p(xxxx) \ge 1[/math]

If (4,k) is saturated, then for some h between 0 and k-1 inclusive the k^3 missing points fall into the following types

- (k-1)(k-2)(k-3) - 6h of type xyzw
- 6(k-1)(k-2) + 12h of type xxyz
- 3(k-1) - 3h of type xxyy
- 4(k-1) - 4h of type xxxy
- 1 + h of type xxxx

## General lower bounds

There are k^{n-1} disjoint lines *abcd..m, so the density of removed points must be at least 1/k, and retained points at most (k-1)/k. If n ≤ p ≤ k then one can get a density of (p-1)/p by deleting points whose coordinates sum to a multiple of p. The lower bound (p-1)/p approaches (k-1)/k as [math]k\rightarrow \infty[/math].

If k is prime and [math]k \ge n[/math], then one can remove all combinatorial lines by deleting all points whose coordinates sum to a multiple of k. So the density of deleted points in the optimal configuration is 1/k when k is prime.

Let p be the smallest prime greater than or equal to both k and n. One can remove all combinatorial lines by deleting all points whose coordinates sum to [math]0\le x\le p-k[/math] (mod p), So the density of deleted points is at most (p-k+1)/p. This approaches zero as [math]k\rightarrow\infty[/math]. For example, the following paper shows there is a prime between x-x^0.525 and x.

Baker, R. C.(1-BYU); Harman, G.(4-LNDHB); Pintz, J.(H-AOS) The difference between consecutive primes. II. Proc. London Math. Soc. (3) 83 (2001), no. 3, 532–562.

I think these results can be used to get lower bounds on lines free sets for large n for all values of k. For any k and any n we can find a prime prelatively close to k^n then we remove the first k+1 values mod p then we pick a value then we remove the must k+1 so we only have k+2, 2k+3, etch the idea is to prevent any two values on a line because two points on a combinatorial line increase by at most k. This has density 1/k so we have a line free density of 1/(k+1).

I think the above bound could possibly be improved. First by getting most of the set concentrated around the point with equal numbers of ones twos and threes or the point with values closes to equality the standard deviation should be something like the square root of n. Then we could apply the near prime with sets c(k^.5 + 1) and get a density of roughly c/k^.5 which I think will be better than the Behrend-Elkin construction as e^-x will eventually be less than 1/x as x increases without limit and the square root of k will increase without limit.