Growth function
The growth function (also called: shatter coefficient or shattering number) measures the richness of a set family. It is especially used in the context of statistical learning theory, where it measures the complexity of a hypothesis class. The term 'growth function' was coined by Vapnik and Chervonenkis in their 1968 paper, where they also proved many of its properties.[1]
Definitions
Set-family definition
Let be a set family (a set of sets) and a set. Their intersection is defined as the following set-family:
The index of with respect to is the size of this intersection:
Obviously, if a set has elements then the index is at most .
The growth function measures the size of as a function of . Formally:
Hypothesis-class definition
Equivalently, let be a hypothesis-class (a set of binary functions) and a set with elements. The restriction of to is the set of binary functions on that can be derived from :[2]: 45
The growth function measures the size of as a function of :[2]: 49
Examples
1. The domain is the real line . The set-family contains all the half-lines (rays) from a given number to positive infinity, i.e., all sets of the form for some . For any set of real numbers, the intersection contains sets: the empty set, the set containing the largest element of , the set containing the two largest elements of , and so on. Therefore: .[1]: Ex.1 The same is true whether contains open half-lines, closed half-lines, or both.
2. The domain is the segment . The set-family contains all the open sets. For any set of real numbers, the intersection contains all possible subsets of . There are such subsets, so . [1]: Ex.2
3. The domain is the Euclidean space . The set-family contains all the half-spaces of the form: , where is a fixed vector. Then , where Comp is the number of number of components in a partitioning of an n-dimensional space by m hyperplanes.[1]: Ex.3
4. The domain is the real line . The set-family contains all the real intervals, i.e., all sets of the form for some . For any set of real numbers, the intersection contains all runs of between 0 and consecutive elements of . The number of such runs is , so .
Properties
The growth function has two trivial bounds.
1. For any finite :
since for every , the number of elements in is at most . Therefore, the growth function is mainly interesting when is infinite.
2. For any nonempty :
I.e, the growth function has an exponential upper-bound.
We say that a set-family shatters a set if their intersection contains all possible subsets of , i.e. . If shatters of size , then , which is the upper bound.
3. The following is a property of the Index function:
- If, for some set of size , and for some number , -
- then, there exists a subset of size such that = .
This implies the following property of the Growth function.[1]: Th.1 For every family there are two cases:
- The exponential case: identically.
- The polynomial case: is majorized by , where is the smallest integer for which .
The VC dimension of is defined according to these two cases:
- In the polynomial case, = the largest integer for which .
- In the exponential case .
So if-and-only-if .
The growth function can be regarded as a refinement of the concept of VC dimension. The VC dimension only tells us whether is equal to or smaller than , while the growth function tells us exactly how changes as a function of .
4. Another connection between the growth function and the VC dimension is given by the Sauer–Shelah lemma:[2]: 49
- If , then:
- for all :
In particular,
- for all :
- so when the VC dimension is finite, the growth function grows polynomially with .
Applications in probability theory
Let be a set on which a probability measure is defined. Let be family of subsets of (= a family of events).
Suppose we choose a set that contains elements of , where each element is chosen at random according to the probability measure , independently of the others (i.e, with replacements). For each event , we compare the following two quantities:
- Its relative frequency in , i.e, ;
- Its probability .
We are interested in the difference, . This difference satisfies the following upper bound:[1]: Th.2
In words: the probability that for all events in , the relative-frequency is near the probability, is lower-bounded by an expression that depends on the growth-function of .
A corollary of this is that, iff the growth function is polynomial in (i.e, there exists some such that ), then the above probability approaches 1 as . I.e, the family enjoys uniform convergence to probability.
References
- ^ a b c d e f Vapnik, V. N.; Chervonenkis, A. Ya. (1971). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Theory of Probability & Its Applications. 16 (2): 264. doi:10.1137/1116025. This is an English translation, by B. Seckler, of the Russian paper: "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Dokl. Akad. Nauk. 181 (4): 781. 1968. The translation was reproduced as: Vapnik, V. N.; Chervonenkis, A. Ya. (2015). "On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities". Measures of Complexity. p. 11. doi:10.1007/978-3-319-21852-6_3. ISBN 978-3-319-21851-9.
- ^ a b c Shalev-Shwartz, Shai; Ben-David, Shai (2014). Understanding Machine Learning – from Theory to Algorithms. Cambridge University Press. ISBN 9781107057135.