Growth function

The growth function, also called the shatter coefficient or the shattering number, measures the richness of a set family. It is especially used in the context of statistical learning theory, where it measures the complexity of a hypothesis class. The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, where they also proved many of its properties.^[1] It is a basic concept in machine learning.^[2] ^[3]

Definitions

Set-family definition

Let $H$ be a set family (a set of sets) and $C$ a set. Their intersection is defined as the following set-family:

H\cap C:=\{h\cap C\mid h\in H\}

The intersection-size (also called the index) of $H$ with respect to $C$ is $|H\cap C|$ . If a set $C_{m}$ has $m$ elements then the index is at most $2^{m}$ . If the index is exactly 2^m then the set $C$ is said to be shattered by $H$ , because $H\cap C$ contains all the subsets of $C$ , i.e.:

|H\cap C|=2^{|C|},

The growth function measures the size of $H\cap C$ as a function of $|C|$ . Formally:

\operatorname {Growth} (H,m):=\max _{C:|C|=m}|H\cap C|

Hypothesis-class definition

Equivalently, let $H$ be a hypothesis-class (a set of binary functions) and $C$ a set with $m$ elements. The restriction of $H$ to $C$ is the set of binary functions on $C$ that can be derived from $H$ :^[3]^: 45

H_{C}:=\{(h(x_{1}),\ldots ,h(x_{m}))\mid h\in H,x_{i}\in C\}

The growth function measures the size of $H_{C}$ as a function of $|C|$ :^[3]^: 49

\operatorname {Growth} (H,m):=\max _{C:|C|=m}|H\cap C|(H,m):=\max _{C:|C|=m}|H_{C}|

Examples

1. The domain is the real line $\mathbb {R}$ . The set-family $H$ contains all the half-lines (rays) from a given number to positive infinity, i.e., all sets of the form $\{x>x_{0}\mid x\in \mathbb {R} \}$ for some $x_{0}\in \mathbb {R}$ . For any set $C$ of $m$ real numbers, the intersection $H\cap C$ contains $m+1$ sets: the empty set, the set containing the largest element of $C$ , the set containing the two largest elements of $C$ , and so on. Therefore: $\operatorname {Growth} (H,m)=m+1$ .^[1]^: Ex.1 The same is true whether $H$ contains open half-lines, closed half-lines, or both.

2. The domain is the segment $[0,1]$ . The set-family $H$ contains all the open sets. For any finite set $C$ of $m$ real numbers, the intersection $H\cap C$ contains all possible subsets of $C$ . There are $2^{m}$ such subsets, so $\operatorname {Growth} (H,m)=2^{m}$ . ^[1]^: Ex.2

3. The domain is the Euclidean space $\mathbb {R} ^{n}$ . The set-family $H$ contains all the half-spaces of the form: $x\cdot \phi \geq 1$ , where $\phi$ is a fixed vector. Then $\operatorname {Growth} (H,m)=\operatorname {Comp} (n,m)$ , where Comp is the number of number of components in a partitioning of an n-dimensional space by m hyperplanes.^[1]^: Ex.3

4. The domain is the real line $\mathbb {R}$ . The set-family $H$ contains all the real intervals, i.e., all sets of the form $\{x\in [x_{0},x_{1}]|x\in \mathbb {R} \}$ for some $x_{0},x_{1}\in \mathbb {R}$ . For any set $C$ of $m$ real numbers, the intersection $H\cap C$ contains all runs of between 0 and $m$ consecutive elements of $C$ . The number of such runs is ${m+1 \choose 2}+1$ , so $\operatorname {Growth} (H,m)={m+1 \choose 2}+1$ .

Polynomial or exponential

The main property that makes the growth function interesting is that it can be either polynomial or exponential - nothing in-between.

The following is a property of the intersection-size:^[1]^: Lem.1

If, for some set $C_{m}$ of size $m$ , and for some number $n\leq m$ , $|H\cap C_{m}|\geq \operatorname {Comp} (n,m)$ -
then, there exists a subset $C_{n}\subseteq C_{m}$ of size $n$ such that $|H\cap C_{n}|=2^{n}$ .

This implies the following property of the Growth function.^[1]^: Th.1 For every family $H$ there are two cases:

The exponential case: $\operatorname {Growth} (H,m)=2^{m}$ identically.
The polynomial case: $\operatorname {Growth} (H,m)$ is majorized by $\operatorname {Comp} (n,m)\leq m^{n}+1$ , where $n$ is the smallest integer for which $\operatorname {Growth} (H,n)<2^{n}$ .

Other properties

Trivial upper bound

For any finite $H$ :

\operatorname {Growth} (H,m)\leq |H|

since for every $C$ , the number of elements in $H\cap C$ is at most $|H|$ . Therefore, the growth function is mainly interesting when $H$ is infinite.

Exponential upper bound

For any nonempty $H$ :

\operatorname {Growth} (H,m)\leq 2^{m}

I.e, the growth function has an exponential upper-bound.

We say that a set-family $H$ shatters a set $C$ if their intersection contains all possible subsets of $C$ , i.e. $H\cap C=2^{C}$ . If $H$ shatters $C$ of size $m$ , then $\operatorname {Growth} (H,C)=2^{m}$ , which is the upper bound.

Cartesian intersection

Define the Cartesian intersection of two set-families as:

H_{1}\bigotimes H_{2}:=\{h_{1}\cap h_{2}\mid h_{1}\in H_{1},h_{2}\in H_{2}\}

.

Then:^[2]^: 57

\operatorname {Growth} (H_{1}\bigotimes H_{2},m)\leq \operatorname {Growth} (H_{1},m)\cdot \operatorname {Growth} (H_{2},m)

Union

For every two set-families:^[2]^: 58

\operatorname {Growth} (H_{1}\cup H_{2},m)\leq \operatorname {Growth} (H_{1},m)+\operatorname {Growth} (H_{2},m)

VC dimension

The VC dimension of $H$ is defined according to these two cases:

In the polynomial case, $\operatorname {VCDim} (H)=n-1$ = the largest integer $d$ for which $\operatorname {Growth} (H,d)=2^{d}$ .
In the exponential case $\operatorname {VCDim} (H)=\infty$ .

So $\operatorname {VCDim} (H)\geq d$ if-and-only-if $\operatorname {Growth} (H,d)=2^{d}$ .

The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether $\operatorname {Growth} (H,d)$ is equal to or smaller than $2^{d}$ , while the growth function tells us exactly how $\operatorname {Growth} (H,m)$ changes as a function of $m$ .

Another connection between the growth function and the VC dimension is given by the Sauer–Shelah lemma:^[3]^: 49

If

\operatorname {VCDim} (H)=d

, then:

for all

m

:

\operatorname {Growth} (H,m)\leq \sum _{i=0}^{d}{m \choose i}

In particular,

for all

m>d+1

:

\operatorname {Growth} (H,m)\leq (em/d)^{d}=O(m^{d})

so when the VC dimension is finite, the growth function grows polynomially with

m

.

This upper bound is tight, i.e, for all $m>d$ there exists $H$ with VC dimension $d$ such that:^[2]^: 56

\operatorname {Growth} (H,m)=\sum _{i=0}^{d}{m \choose i}

Entropy

While the growth-function is related to the maximum intersection-size, the entropy is related to the average intersection size:^[1]^{: 272–273}

\operatorname {Entropy} (H,m)=E_{|C_{m}|=m}{\big [}\log _{2}(|H\cap C_{m}|){\big ]}

The intersection-size has the following property. For every set-family $H$ :

|H\cap (C_{1}\cup C_{2})|\leq |H\cap C_{1}|\cdot |H\cap C_{2}|

Hence:

\operatorname {Entropy} (H,m_{1}+m_{2})\leq \operatorname {Entropy} (H,m_{1})+\operatorname {Entropy} (H,m_{2})

Moreover, the sequence $\operatorname {Entropy} (H,m)/m$ converges to a constant $c\in [0,1]$ when $m\to \infty$ .

Moreover, the random-variable $\log _{2}{|H\cap C_{m}|/m}$ is concentrated near $c$ .

Applications in probability theory

Let $\Omega$ be a set on which a probability measure $\Pr$ is defined. Let $H$ be family of subsets of $\Omega$ (= a family of events).

Suppose we choose a set $C_{m}$ that contains $m$ elements of $\Omega$ , where each element is chosen at random according to the probability measure $P$ , independently of the others (i.e, with replacements). For each event $h\in H$ , we compare the following two quantities:

Its relative frequency in $C_{m}$ , i.e, $|h\cap C_{m}|/m$ ;
Its probability $\Pr[h]$ .

We are interested in the difference, $D(h,C_{m}):={\big |}|h\cap C_{m}|/m-\Pr[h]{\big |}$ . This difference satisfies the following upper bound:

\Pr \left[\forall h\in H:D(h,C_{m})\leq {\sqrt {8(\ln \operatorname {Growth} (H,2m)+\ln(4/\delta )) \over m}}\right]~~~~>~~~~1-\delta

which is equivalent to:^[1]^: Th.2

\Pr {\big [}\forall h\in H:D(h,C_{m})\leq \varepsilon {\big ]}~~~~>~~~~1-4\cdot \operatorname {Growth} (H,2m)\cdot \exp(-\varepsilon ^{2}\cdot m/8)

In words: the probability that for all events in $H$ , the relative-frequency is near the probability, is lower-bounded by an expression that depends on the growth-function of $H$ .

A corollary of this is that, iff the growth function is polynomial in $m$ (i.e, there exists some $n$ such that $\operatorname {Growth} (H,m)\leq m^{n}+1$ ), then the above probability approaches 1 as $m\to \infty$ . I.e, the family $H$ enjoys uniform convergence in probability.

References

^ ^a ^b ^c ^d ^e ^f ^g ^h Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025. This is an English translation, by B. Seckler, of the Russian paper: "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Dokl. Akad. Nauk. 181 (4): 781. 1968. The translation was reproduced as: Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.
^ ^a ^b ^c ^d Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2012). Foundations of Machine Learning. US, Massachusetts: MIT Press. ISBN 9780262018258., especially Section 3.2
^ ^a ^b ^c ^d Shalev-Shwartz, Shai; Ben-David, Shai (2014). Understanding Machine Learning – from Theory to Algorithms. Cambridge University Press. ISBN 9781107057135.

[vc-1] ^ ^a ^b ^c ^d ^e ^f ^g ^h Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025. This is an English translation, by B. Seckler, of the Russian paper: "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Dokl. Akad. Nauk. 181 (4): 781. 1968. The translation was reproduced as: Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.

[book12-2] Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2012). Foundations of Machine Learning. US, Massachusetts: MIT Press. ISBN 9780262018258., especially Section 3.2

[book14-3] Shalev-Shwartz, Shai; Ben-David, Shai (2014). Understanding Machine Learning – from Theory to Algorithms. Cambridge University Press. ISBN 9781107057135.

[1]

[2]

[3]