Growth function

The growth function (also called: shatter coefficient or shattering number) measures the richness of a set family. It is especially used in the context of statistical learning theory, where it measures the complexity of a hypothesis class. The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, where they also proved many of its properties.^[1]

Definitions

Set-family definition

Let $H$ be a set family (a set of sets) and $C$ a set. Their intersection is defined as the following set-family:

H\cap C:=\{h\cap C\mid h\in H\}

The intersection-size (also called the index) of $H$ with respect to $C$ is $|H\cap C|$ . Obviously, if a set $C_{m}$ has $m$ elements then the index is at most $2^{m}$ .

The growth function measures the size of $H\cap C$ as a function of $|C|$ . Formally:

\operatorname {Growth} (H,m):=\max _{C:|C|=m}|H\cap C|

Hypothesis-class definition

Equivalently, let $H$ be a hypothesis-class (a set of binary functions) and $C$ a set with $m$ elements. The restriction of $H$ to $C$ is the set of binary functions on $C$ that can be derived from $H$ :^[2]^: 45

H_{C}:=\{(h(x_{1}),\ldots ,h(x_{m}))\mid h\in H,x_{i}\in C\}

The growth function measures the size of $H_{C}$ as a function of $|C|$ :^[2]^: 49

\operatorname {Growth} (H,m):=\max _{C:|C|=m}|H\cap C|(H,m):=\max _{C:|C|=m}|H_{C}|

Examples

1. The domain is the real line $\mathbb {R}$ . The set-family $H$ contains all the half-lines (rays) from a given number to positive infinity, i.e., all sets of the form $\{x>x_{0}\mid x\in \mathbb {R} \}$ for some $x_{0}\in \mathbb {R}$ . For any set $C$ of $m$ real numbers, the intersection $H\cap C$ contains $m+1$ sets: the empty set, the set containing the largest element of $C$ , the set containing the two largest elements of $C$ , and so on. Therefore: $\operatorname {Growth} (H,m)=m+1$ .^[1]^: Ex.1 The same is true whether $H$ contains open half-lines, closed half-lines, or both.

2. The domain is the segment $[0,1]$ . The set-family $H$ contains all the open sets. For any set $C$ of $m$ real numbers, the intersection $H\cap C$ contains all possible subsets of $C$ . There are $2^{m}$ such subsets, so $\operatorname {Growth} (H,m)=2^{m}$ . ^[1]^: Ex.2

3. The domain is the Euclidean space $\mathbb {R} ^{n}$ . The set-family $H$ contains all the half-spaces of the form: $x\cdot \phi \geq 1$ , where $\phi$ is a fixed vector. Then $\operatorname {Growth} (H,m)=\operatorname {Comp} (n,m)$ , where Comp is the number of number of components in a partitioning of an n-dimensional space by m hyperplanes.^[1]^: Ex.3

4. The domain is the real line $\mathbb {R}$ . The set-family $H$ contains all the real intervals, i.e., all sets of the form $\{x\in [x_{0},x_{1}]|x\in \mathbb {R} \}$ for some $x_{0},x_{1}\in \mathbb {R}$ . For any set $C$ of $m$ real numbers, the intersection $H\cap C$ contains all runs of between 0 and $m$ consecutive elements of $C$ . The number of such runs is ${m+1 \choose 2}+1$ , so $\operatorname {Growth} (H,m)={m+1 \choose 2}+1$ .

Properties

Trivial upper bound

For any finite $H$ :

\operatorname {Growth} (H,m)\leq |H|

since for every $C$ , the number of elements in $H\cap C$ is at most $|H|$ . Therefore, the growth function is mainly interesting when $H$ is infinite.

Exponential upper bound

2. For any nonempty $H$ :

\operatorname {Growth} (H,m)\leq 2^{m}

I.e, the growth function has an exponential upper-bound.

We say that a set-family $H$ shatters a set $C$ if their intersection contains all possible subsets of $C$ , i.e. $H\cap C=2^{C}$ . If $H$ shatters $C$ of size $m$ , then $\operatorname {Growth} (H,C)=2^{m}$ , which is the upper bound.

Polynomial or exponential

The following is a property of the intersection-size:^[1]^: Lem.1

If, for some set $C_{m}$ of size $m$ , and for some number $n\leq m$ , $|H\cap C_{m}|\geq \operatorname {Comp} (n,m)$ -
then, there exists a subset $C_{n}\subseteq C_{m}$ of size $n$ such that $|H\cap C_{n}|=2^{n}$ .

This implies the following property of the Growth function.^[1]^: Th.1 For every family $H$ there are two cases:

The exponential case: $\operatorname {Growth} (H,m)=2^{m}$ identically.
The polynomial case: $\operatorname {Growth} (H,m)$ is majorized by $\operatorname {Comp} (n,m)\leq m^{n}+1$ , where $n$ is the smallest integer for which $\operatorname {Growth} (H,n)<2^{n}$ .

VC dimension

The VC dimension of $H$ is defined according to these two cases:

In the polynomial case, $\operatorname {VCDim} (H)=n-1$ = the largest integer $d$ for which $\operatorname {Growth} (H,d)=2^{d}$ .
In the exponential case $\operatorname {VCDim} (H)=\infty$ .

So $VCDim(H)\geq d$ if-and-only-if $Growth(H,d)=2^{d}$ .

The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether $\operatorname {Growth} (H,d)$ is equal to or smaller than $2^{d}$ , while the growth function tells us exactly how $Growth(H,m)$ changes as a function of $m$ .

Another connection between the growth function and the VC dimension is given by the Sauer–Shelah lemma:^[2]^: 49

If

VCDim(H)=d

, then:

for all

m

:

\operatorname {Growth} (H,m)\leq \sum _{i=0}^{d}{m \choose i}

In particular,

for all

m>d+1

:

\operatorname {Growth} (H,m)\leq (em/d)^{d}=O(m^{d})

so when the VC dimension is finite, the growth function grows polynomially with

m

.

Entropy

While the growth-function is related to the maximum intersection-size, the entropy is related to the average intersection size:^[1]^{: 272–273}

\operatorname {Entropy} (H,m)=E_{|C_{m}|=m}{\big [}\log _{2}(|H\cap C_{m}|){\big ]}

The intersection-size has the following property. For every set-family $H$ :

|H\cap (C_{1}\cup C_{2})|\leq |H\cap C_{1}|\cdot |H\cap C_{2}|

Hence:

\operatorname {Entropy} (H,m_{1}+m_{2})\leq \operatorname {Entropy} (H,m_{1})+\operatorname {Entropy} (H,m_{2})

Moreover, the sequence $\operatorname {Entropy} (H,m)/m$ converges to a constant $c\in [0,1]$ when $m\to \infty$ .

Moreover, the random-variable $\log _{2}{|H\cap C_{m}|/m}$ is concentrated near $c$ .

Applications in probability theory

Let $\Omega$ be a set on which a probability measure $Pr$ is defined. Let $H$ be family of subsets of $\Omega$ (= a family of events).

Suppose we choose a set $C_{m}$ that contains $m$ elements of $\Omega$ , where each element is chosen at random according to the probability measure $P$ , independently of the others (i.e, with replacements). For each event $h\in H$ , we compare the following two quantities:

Its relative frequency in $C_{m}$ , i.e, $|h\cap C_{m}|/m$ ;
Its probability $Pr[h]$ .

We are interested in the difference, $\delta (h,C_{m}):={\big |}|h\cap C_{m}|/m-Pr[h]{\big |}$ . This difference satisfies the following upper bound:^[1]^: Th.2

Pr{\big [}\forall h\in H:\delta (h,C_{m})\leq \epsilon {\big ]}~~~~>~~~~1-4\cdot \operatorname {Growth} (H,2m)\cdot \exp(-\epsilon ^{2}\cdot m/8)

In words: the probability that for all events in $H$ , the relative-frequency is near the probability, is lower-bounded by an expression that depends on the growth-function of $H$ .

A corollary of this is that, iff the growth function is polynomial in $m$ (i.e, there exists some $n$ such that $\operatorname {Growth} (H,m)\leq m^{n}+1$ ), then the above probability approaches 1 as $m\to \infty$ . I.e, the family $H$ enjoys uniform convergence to probability.

References

^ ^a ^b ^c ^d ^e ^f ^g ^h Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025. This is an English translation, by B. Seckler, of the Russian paper: "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Dokl. Akad. Nauk. 181 (4): 781. 1968. The translation was reproduced as: Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.
^ ^a ^b ^c Shalev-Shwartz, Shai; Ben-David, Shai (2014). Understanding Machine Learning – from Theory to Algorithms. Cambridge University Press. ISBN 9781107057135.

[vc-1] ^ ^a ^b ^c ^d ^e ^f ^g ^h Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025. This is an English translation, by B. Seckler, of the Russian paper: "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Dokl. Akad. Nauk. 181 (4): 781. 1968. The translation was reproduced as: Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.

[book14-2] Shalev-Shwartz, Shai; Ben-David, Shai (2014). Understanding Machine Learning – from Theory to Algorithms. Cambridge University Press. ISBN 9781107057135.

[1]

[2]