Growth function

The growth function, also called the shatter coefficient or the shattering number, measures the richness of a set family. It is especially used in the context of statistical learning theory, where it measures the complexity of a hypothesis class. The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, where they also proved many of its properties.[1] It is a basic concept in machine learning.[2][3]

Definitions

Set-family definition

Let $H$ be a set family (a set of sets) and $C$ a set. Their intersection is defined as the following set-family:

$H \cap C := \{ h \cap C \mid h \in H \}$

The intersection-size (also called the index) of $H$ with respect to $C$ is $|H \cap C|$. If a set $C_m$ has $m$ elements then the index is at most $2^m$. If the index is exactly $2^m$ then the set $C_m$ is said to be shattered by $H$, because $H \cap C_m$ contains all the subsets of $C_m$, i.e.:

$|H \cap C_m| = 2^m$

The growth function measures the size of $H \cap C$ as a function of $|C|$. Formally:

$\operatorname{Growth}(H, m) := \max_{C : |C| = m} |H \cap C|$

Hypothesis-class definition

Equivalently, let $H$ be a hypothesis-class (a set of binary functions) and $C = \{c_1, \ldots, c_m\}$ a set with $m$ elements. The restriction of $H$ to $C$ is the set of binary functions on $C$ that can be derived from $H$:[3]: 45 

$H_C := \{ (h(c_1), \ldots, h(c_m)) \mid h \in H \}$

The growth function measures the size of $H_C$ as a function of $|C|$:[3]: 49 

$\operatorname{Growth}(H, m) := \max_{C : |C| = m} |H_C|$
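
The following is a minimal illustrative sketch (not part of the cited sources) of how the set-family definition can be evaluated by brute force on a finite domain; the function name `growth` and the small grid of thresholds are assumptions made only for this example.

```python
# Illustrative sketch: brute-force Growth(H, m) for a finite family over a
# finite domain, following the set-family definition above.
from itertools import combinations

def growth(family, domain, m):
    """Maximum of |{h ∩ C : h in family}| over all m-element subsets C of domain."""
    return max(
        len({frozenset(h & frozenset(C)) for h in family})
        for C in combinations(domain, m)
    )

# Example: half-lines (threshold sets) restricted to the grid {0, ..., 5}.
domain = range(6)
half_lines = [frozenset(x for x in domain if x > t) for t in range(-1, 6)]

for m in range(1, 5):
    print(m, growth(half_lines, domain, m))  # prints m+1 for each m
```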

Examples

1. The domain is the real line $\mathbb{R}$. The set-family $H$ contains all the half-lines (rays) from a given number to positive infinity, i.e., all sets of the form $\{ x \in \mathbb{R} \mid x > x_0 \}$ for some $x_0 \in \mathbb{R}$. For any set $C$ of $m$ real numbers, the intersection $H \cap C$ contains $m + 1$ sets: the empty set, the set containing the largest element of $C$, the set containing the two largest elements of $C$, and so on. Therefore: $\operatorname{Growth}(H, m) = m + 1$.[1]: Ex.1  The same is true whether $H$ contains open half-lines, closed half-lines, or both.

2. The domain is the segment $[0, 1]$. The set-family $H$ contains all the open sets. For any finite set $C$ of $m$ real numbers, the intersection $H \cap C$ contains all possible subsets of $C$. There are $2^m$ such subsets, so $\operatorname{Growth}(H, m) = 2^m$.[1]: Ex.2 

3. The domain is the Euclidean space $\mathbb{R}^n$. The set-family $H$ contains all the half-spaces of the form $\{ x \mid x \cdot \phi \geq 1 \}$, where $\phi$ is a fixed vector. Then $\operatorname{Growth}(H, m) = \operatorname{Comp}(n, m)$, where $\operatorname{Comp}(n, m)$ is the number of components in a partitioning of an $n$-dimensional space by $m$ hyperplanes.[1]: Ex.3 

4. The domain is the real line $\mathbb{R}$. The set-family $H$ contains all the real intervals, i.e., all sets of the form $\{ x \in \mathbb{R} \mid x_0 < x < x_1 \}$ for some $x_0, x_1 \in \mathbb{R}$. For any set $C$ of $m$ real numbers, the intersection $H \cap C$ contains all runs of between 0 and $m$ consecutive elements of $C$. The number of such runs is $\binom{m+1}{2} + 1$, so $\operatorname{Growth}(H, m) = \binom{m+1}{2} + 1$.
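
The formulas in Examples 1 and 4 can be checked numerically; the finite grid standing in for the real line, and the helper below, are assumptions of this sketch, not part of the cited sources.

```python
# Illustrative sketch: verify Growth = m+1 for half-lines (Example 1) and
# Growth = C(m+1, 2) + 1 for intervals (Example 4) on a finite grid.
from itertools import combinations
from math import comb

def growth(family, domain, m):
    return max(
        len({frozenset(h & frozenset(C)) for h in family})
        for C in combinations(domain, m)
    )

domain = range(8)
half_lines = [frozenset(x for x in domain if x > t) for t in range(-1, 8)]
intervals = [frozenset(range(a, b)) for a in range(9) for b in range(a, 9)]

for m in range(1, 6):
    assert growth(half_lines, domain, m) == m + 1
    assert growth(intervals, domain, m) == comb(m + 1, 2) + 1
```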

Polynomial or exponential

The main property that makes the growth function interesting is that it is either polynomial or exponential - nothing in between.

The following is a property of the intersection-size:[1]: Lem.1 

  • If, for some set $C_m$ of size $m$ and for some number $n \leq m$, $|H \cap C_m| > \sum_{i=0}^{n-1} \binom{m}{i}$,
  • then there exists a subset $C_n \subseteq C_m$ of size $n$ such that $|H \cap C_n| = 2^n$.

This implies the following property of the growth function.[1]: Th.1  For every family $H$ there are two cases:

  • The exponential case: $\operatorname{Growth}(H, m) = 2^m$ identically.
  • The polynomial case: $\operatorname{Growth}(H, m)$ is majorized by $m^{n_0 - 1} + 1$, where $n_0$ is the smallest integer for which $\operatorname{Growth}(H, n_0) < 2^{n_0}$.
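
As a worked illustration (not taken from the sources), consider the half-line family $H$ of Example 1: $\operatorname{Growth}(H, 1) = 2 = 2^1$, but $\operatorname{Growth}(H, 2) = 3 < 2^2$, so the smallest integer $n_0$ with $\operatorname{Growth}(H, n_0) < 2^{n_0}$ is $n_0 = 2$, and the polynomial bound $m^{n_0 - 1} + 1 = m + 1$ is attained exactly.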

Other properties

Trivial upper bound

For any finite $H$:

$\operatorname{Growth}(H, m) \leq |H|$

since for every $C$, the number of elements in $H \cap C$ is at most $|H|$. Therefore, the growth function is mainly interesting when $H$ is infinite.

Exponential upper bound

For any nonempty $H$:

$\operatorname{Growth}(H, m) \leq 2^m$

I.e., the growth function has an exponential upper bound.

We say that a set-family $H$ shatters a set $C$ if their intersection contains all possible subsets of $C$, i.e. $H \cap C = 2^C$. If $H$ shatters a set $C$ of size $m$, then $\operatorname{Growth}(H, m) = 2^m$, which is the upper bound.
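
A short sketch of the corresponding shattering test (the function name and set representation are assumptions made for illustration):

```python
# Illustrative sketch: H shatters C iff the intersections h ∩ C realize all
# 2^|C| subsets of C; in that case Growth(H, |C|) meets the 2^m upper bound.
def shatters(family, C):
    C = frozenset(C)
    return len({frozenset(h & C) for h in family}) == 2 ** len(C)
```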

Cartesian intersection

Define the Cartesian intersection of two set-families as:

$H_1 \boxtimes H_2 := \{ h_1 \cap h_2 \mid h_1 \in H_1,\ h_2 \in H_2 \}$.

Then:[2]: 57 

$\operatorname{Growth}(H_1 \boxtimes H_2, m) \leq \operatorname{Growth}(H_1, m) \cdot \operatorname{Growth}(H_2, m)$

Union

For every two set-families $H_1, H_2$:[2]: 58 

$\operatorname{Growth}(H_1 \cup H_2, m) \leq \operatorname{Growth}(H_1, m) + \operatorname{Growth}(H_2, m)$
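
Both bounds can be sanity-checked numerically on small finite families; the families and the helper below are illustrative assumptions, not taken from the cited texts.

```python
# Illustrative sketch: check the Cartesian-intersection and union bounds on
# two small families over the grid {0, ..., 5}.
from itertools import combinations

def growth(family, domain, m):
    return max(
        len({frozenset(h & frozenset(C)) for h in family})
        for C in combinations(domain, m)
    )

domain = range(6)
H1 = [frozenset(x for x in domain if x > t) for t in range(-1, 6)]   # half-lines
H2 = [frozenset(x for x in domain if x <= t) for t in range(-1, 6)]  # their complements

cartesian = [h1 & h2 for h1 in H1 for h2 in H2]  # H1 ⊠ H2
union = H1 + H2                                  # H1 ∪ H2

for m in range(1, 5):
    g1, g2 = growth(H1, domain, m), growth(H2, domain, m)
    assert growth(cartesian, domain, m) <= g1 * g2
    assert growth(union, domain, m) <= g1 + g2
```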

VC dimension

The VC dimension of $H$ is defined according to these two cases:

  • In the polynomial case, $\operatorname{VCDim}(H) = n_0 - 1$, i.e., the largest integer $d$ for which $\operatorname{Growth}(H, d) = 2^d$.
  • In the exponential case, $\operatorname{VCDim}(H) = \infty$.

So $\operatorname{VCDim}(H) \geq d$ if-and-only-if $\operatorname{Growth}(H, d) = 2^d$.

The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether $\operatorname{Growth}(H, m)$ is equal to or smaller than $2^m$, while the growth function tells us exactly how $\operatorname{Growth}(H, m)$ changes as a function of $m$.
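
Under the set-family definition, the VC dimension can be found by brute force on a finite domain; this is an illustrative sketch with assumed names, not an algorithm from the sources.

```python
# Illustrative sketch: the VC dimension as the largest d for which some
# d-element set is shattered (i.e., Growth(H, d) = 2^d).
from itertools import combinations

def vc_dimension(family, domain):
    d = 0
    for m in range(1, len(domain) + 1):
        if any(
            len({frozenset(h & frozenset(C)) for h in family}) == 2 ** m
            for C in combinations(domain, m)
        ):
            d = m          # some m-element set is shattered
        else:
            break          # if no m-set is shattered, no larger set is either
    return d

domain = range(6)
half_lines = [frozenset(x for x in domain if x > t) for t in range(-1, 6)]
print(vc_dimension(half_lines, domain))  # 1: half-lines shatter singletons only
```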

Another connection between the growth function and the VC dimension is given by the Sauer–Shelah lemma:[3]: 49 

If $\operatorname{VCDim}(H) = d$, then for all $m$:

$\operatorname{Growth}(H, m) \leq \sum_{i=0}^{d} \binom{m}{i}$

In particular, for all $m > d$:

$\operatorname{Growth}(H, m) \leq \left(\frac{em}{d}\right)^d = O(m^d)$

so when the VC dimension is finite, the growth function grows polynomially with $m$.

This upper bound is tight, i.e., for every $d$ there exists $H$ with VC dimension $d$ such that, for all $m$:[2]: 56 

$\operatorname{Growth}(H, m) = \sum_{i=0}^{d} \binom{m}{i}$
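
As a rough numerical illustration of the polynomial behaviour (the values of $d$ and $m$ below are arbitrary choices for this sketch):

```python
# Illustrative sketch: the Sauer–Shelah bound sum_{i<=d} C(m, i), the looser
# (e*m/d)^d bound, and the trivial 2^m bound, for VC dimension d = 3.
from math import comb, e

d = 3
for m in (5, 10, 20, 50):
    sauer = sum(comb(m, i) for i in range(d + 1))
    loose = (e * m / d) ** d
    print(m, sauer, round(loose), 2 ** m)
```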

Entropy

While the growth function is related to the maximum intersection-size, the entropy is related to the average intersection size:[1]: 272–273 

$\operatorname{Entropy}(H, m) := \mathbb{E}\left[ \log_2 |H \cap C_m| \right]$

where the expectation is taken over sets $C_m$ of $m$ elements drawn at random according to a probability measure on the domain.

The intersection-size has the following property. For every set-family $H$ and every two sets $C_1, C_2$:

$|H \cap (C_1 \cup C_2)| \leq |H \cap C_1| \cdot |H \cap C_2|$

Hence:

$\operatorname{Entropy}(H, m_1 + m_2) \leq \operatorname{Entropy}(H, m_1) + \operatorname{Entropy}(H, m_2)$

Moreover, the sequence $\operatorname{Entropy}(H, m) / m$ converges to a constant $c \in [0, 1]$ when $m \to \infty$.

Moreover, the random variable $\log_2 |H \cap C_m| / m$ is concentrated near $c$.
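
A Monte Carlo sketch of the entropy-per-element ratio for the half-line family under a uniform measure (the grid, the sampling scheme, and the function names are assumptions of this example):

```python
# Illustrative sketch: estimate Entropy(H, m) / m for half-lines under the
# uniform measure on a grid; the ratio should approach a constant (here 0,
# since the growth function of half-lines is only polynomial).
import random
from math import log2

domain = list(range(100))
half_lines = [frozenset(x for x in domain if x > t) for t in range(-1, 100)]

def entropy_rate(m, trials=200):
    total = 0.0
    for _ in range(trials):
        C = frozenset(random.choices(domain, k=m))  # m i.i.d. draws, with replacement
        total += log2(len({h & C for h in half_lines}))
    return total / (trials * m)

for m in (5, 20, 80):
    print(m, round(entropy_rate(m), 3))  # decreasing toward 0
```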

Applications in probability theory

Let $\Omega$ be a set on which a probability measure $\Pr$ is defined. Let $H$ be a family of subsets of $\Omega$ (= a family of events).

Suppose we choose a set $C_m$ that contains $m$ elements of $\Omega$, where each element is chosen at random according to the probability measure $\Pr$, independently of the others (i.e., with replacement). For each event $h \in H$, we compare the following two quantities:

  • Its relative frequency in $C_m$, i.e., $|h \cap C_m| / m$;
  • Its probability $\Pr[h]$.

We are interested in the difference $\left| \frac{|h \cap C_m|}{m} - \Pr[h] \right|$. This difference satisfies the following upper bound:

$\Pr\left[ \sup_{h \in H} \left| \frac{|h \cap C_m|}{m} - \Pr[h] \right| > \varepsilon \right] \leq 4 \cdot \operatorname{Growth}(H, 2m) \cdot e^{-\varepsilon^2 m / 8}$

which is equivalent to:[1]: Th.2 

$\Pr\left[ \forall h \in H: \left| \frac{|h \cap C_m|}{m} - \Pr[h] \right| \leq \varepsilon \right] \geq 1 - 4 \cdot \operatorname{Growth}(H, 2m) \cdot e^{-\varepsilon^2 m / 8}$

In words: the probability that for all events in $H$, the relative frequency is near the probability, is lower-bounded by an expression that depends on the growth function of $H$.

A corollary of this is that, if the growth function is polynomial in $m$ (i.e., there exists some $n_0$ such that $\operatorname{Growth}(H, m) \leq m^{n_0 - 1} + 1$), then the above probability approaches 1 as $m \to \infty$. I.e., the family $H$ enjoys uniform convergence in probability.
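
A minimal simulation of this uniform-convergence statement, under assumed ingredients (the uniform measure on $[0, 1]$ and the half-line events of Example 1); it only illustrates that the supremum deviation shrinks with $m$, not the exact constants of the bound.

```python
# Illustrative sketch: for the events h_t = {x : x > t} under the uniform
# measure on [0, 1], the largest gap between relative frequency and probability
# over all events shrinks as the sample size m grows.
import random

def sup_deviation(m):
    sample = [random.random() for _ in range(m)]
    # For half-lines the supremum over all t is attained at (limits of) the
    # sample points, so it suffices to check t at each sample point and at 0.
    deviations = []
    for t in [0.0] + sample:
        freq_open = sum(x > t for x in sample) / m     # frequency of {x > t}
        freq_closed = sum(x >= t for x in sample) / m  # left limit at t
        deviations += [abs(freq_open - (1 - t)), abs(freq_closed - (1 - t))]
    return max(deviations)

for m in (100, 1000, 10000):
    print(m, round(sup_deviation(m), 4))  # shrinks roughly like 1/sqrt(m)
```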

References

  1. ^ a b c d e f g h Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025. This is an English translation, by B. Seckler, of the Russian paper: "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Dokl. Akad. Nauk. 181 (4): 781. 1968. The translation was reproduced as: Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.
  2. ^ a b c d Mohri, Mehryar; Rostamizadeh, Afshin; Talwalkar, Ameet (2012). Foundations of Machine Learning. US, Massachusetts: MIT Press. ISBN 9780262018258., especially Section 3.2
  3. ^ a b c d Shalev-Shwartz, Shai; Ben-David, Shai (2014). Understanding Machine Learning – from Theory to Algorithms. Cambridge University Press. ISBN 9781107057135.