
Truncated mean

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 131.169.224.214 at 09:36, 7 May 2015 (Examples). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A truncated mean or trimmed mean is a statistical measure of central tendency, much like the mean and median. It involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end, and typically discarding an equal amount of both. The number of points to be discarded is usually given as a percentage of the total number of points, but may also be given as a fixed number of points.

For most statistical applications, 5 to 25 percent of the points at each end are discarded; the 25% trimmed mean (when the lowest 25% and the highest 25% are discarded) is known as the interquartile mean. For example, given a set of 8 points, trimming by 12.5% would discard the minimum and the maximum value in the sample and compute the mean of the remaining 6 points.
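The definition can be sketched in Python as follows. This is a minimal illustration, assuming the number of points cut from each end is the trimming proportion times the sample size, rounded down (no interpolation); the helper name `trimmed_mean` and the eight sample values are hypothetical.

```python
import math


def trimmed_mean(values, proportion):
    """Mean after discarding floor(proportion * n) points from each end.

    Hypothetical helper: rounds the per-end cut count down instead of
    interpolating between neighbouring whole-number cuts.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    n = len(data)
    k = math.floor(proportion * n)  # points dropped at each end
    kept = data[k:n - k]
    return sum(kept) / len(kept)


# Illustrative 8-point sample: trimming 12.5% drops one point at each end.
sample = [3, 7, 8, 9, 10, 11, 12, 40]
print(trimmed_mean(sample, 0.125))  # mean of 7, 8, 9, 10, 11, 12 = 9.5
```

Here the single large value 40 lifts the ordinary mean of the sample to 12.5, whereas the 12.5% trimmed mean ignores it and returns 9.5.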

The median can be regarded as a fully truncated mean and is the most robust. As with other trimmed estimators, the main advantages of the trimmed mean are robustness and higher efficiency for mixed distributions and heavy-tailed distributions (such as the Cauchy distribution), at the cost of lower efficiency for some less heavy-tailed distributions (such as the normal distribution). For intermediate distributions the differences between the efficiencies of the mean and the median are not very large; for example, for Student's t-distribution with about 5 degrees of freedom the asymptotic variances of the sample mean and the sample median are nearly equal, whereas with 2 degrees of freedom the sample mean already has infinite variance.
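The mean-versus-median comparison can be checked with the standard large-sample formulas: n times the variance of the sample mean tends to the population variance, ν/(ν − 2) for Student's t with ν > 2 degrees of freedom (infinite for ν ≤ 2), and n times the variance of the sample median tends to 1/(4f(0)²), where f is the density at the median. The sketch below assumes SciPy is available and uses `scipy.stats.t.pdf`; the helper name `asymptotic_variances` is hypothetical.

```python
import math

from scipy.stats import t


def asymptotic_variances(df):
    """Return (mean, median) asymptotic variances, i.e. n * Var, for Student's t.

    Mean: the population variance df / (df - 2), infinite when df <= 2.
    Median: 1 / (4 f(0)^2), where f is the t density at its median 0.
    """
    var_mean = df / (df - 2) if df > 2 else math.inf
    var_median = 1.0 / (4.0 * t.pdf(0.0, df) ** 2)
    return var_mean, var_median


for df in (1, 2, 3, 5, 10, 30):
    m, md = asymptotic_variances(df)
    print(f"df={df:>2}: mean {m:.3f}, median {md:.3f}")
```

With these formulas, 5 degrees of freedom gives about 1.67 for the mean against about 1.73 for the median, while for ν = 1 (the Cauchy distribution) the median's value is π²/4 and the mean's is infinite.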

Terminology

In some regions of Central Europe it is also known as a Windsor mean, but this name should not be confused with the Winsorized mean: in the latter, the observations that the trimmed mean would discard are instead replaced by the largest/smallest of the remaining values.
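The difference between trimming and Winsorizing can be made concrete with a small sketch. It assumes the same number k of points is handled at each end; the function name and the six sample values are hypothetical.

```python
def trimmed_and_winsorized_means(values, k):
    """Return (trimmed mean, Winsorized mean) with k points handled at each end.

    Trimming drops the k smallest and k largest values; Winsorizing instead
    replaces each of them with the nearest value that is kept.
    """
    data = sorted(values)
    n = len(data)
    if k < 0 or 2 * k >= n:
        raise ValueError("need 0 <= k and 2 * k < len(values)")
    kept = data[k:n - k]
    trimmed = sum(kept) / len(kept)
    # Replace the k lowest by the smallest kept value, the k highest by the largest.
    winsorized_data = [kept[0]] * k + kept + [kept[-1]] * k
    winsorized = sum(winsorized_data) / n
    return trimmed, winsorized


# Illustrative sample with one extreme value at the top.
print(trimmed_and_winsorized_means([1, 2, 4, 7, 11, 100], 1))  # trimmed 6.0, Winsorized 37/6 ≈ 6.17
```

Both estimators ignore how extreme the value 100 is, but the Winsorized mean still counts that position, replaced by 11, so the two results differ.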

Discarding only the maximum and minimum is known as the modified mean, particularly in management statistics.[1]

Interpolation

When the percentage of points to discard does not yield a whole number, the trimmed mean may be defined by interpolation, generally linear interpolation, between the nearest whole numbers. For example, to calculate the 15% trimmed mean of a sample containing 10 entries, strictly one would discard 1 point from each end (equivalent to the 10% trimmed mean). If interpolating, one would instead compute the 10% trimmed mean (discarding 1 point from each end) and the 20% trimmed mean (discarding 2 points from each end), and then interpolate between them, which in this case means averaging the two values. Similarly, to interpolate the 12% trimmed mean, one would take the weighted average: weight the 10% trimmed mean by 0.8 and the 20% trimmed mean by 0.2.
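The interpolation rule can be sketched as follows. This is an illustration under the assumption that g = proportion × n points are to be cut from each end and that the weights are linear in the fractional part of g, as in the 15% and 12% examples; the function name and the ten sample values are hypothetical.

```python
import math


def interpolated_trimmed_mean(values, proportion):
    """Trimmed mean with linear interpolation between whole-point cuts.

    With g = proportion * n points to cut from each end, combine the
    floor(g)-trimmed and (floor(g) + 1)-trimmed means, weighted by the
    fractional part of g.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    n = len(data)

    def cut_mean(k):
        # Mean after dropping k points from each end.
        kept = data[k:n - k]
        return sum(kept) / len(kept)

    g = proportion * n
    k = math.floor(g)
    frac = g - k
    if frac == 0:
        return cut_mean(k)
    if 2 * (k + 1) >= n:
        raise ValueError("proportion too large to interpolate for this sample")
    return (1 - frac) * cut_mean(k) + frac * cut_mean(k + 1)


data = [1, 2, 4, 7, 11, 16, 22, 29, 37, 100]
print(interpolated_trimmed_mean(data, 0.15))  # average of 16.0 and 89/6, about 15.42
print(interpolated_trimmed_mean(data, 0.12))  # 0.8 * 16.0 + 0.2 * 89/6, about 15.77
```

For comparison, SciPy's `scipy.stats.trim_mean` does not interpolate; its documentation describes rounding the number of cut points down, which corresponds to the strict (non-interpolated) definition.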

Advantages

The truncated mean is a useful estimator because it is less sensitive to outliers than the mean but will still give a reasonable estimate of central tendency or mean for many statistical models. In this regard it is referred to as a robust estimator.

One situation in which it can be advantageous to use a truncated mean is when estimating the location parameter of a Cauchy distribution, a bell-shaped probability distribution with much fatter tails than a normal distribution. It can be shown that the truncated mean of the middle 24% of the sample order statistics (i.e., discarding 38% of the sample at each end) produces an estimate of the population location parameter that is more efficient than either the sample median or the full sample mean.[2][3] However, because of the fat tails of the Cauchy distribution, the efficiency of the estimator decreases as more of the sample is used in the estimate.[2][3] Note that for the Cauchy distribution, none of the truncated mean, the full sample mean, or the sample median is a maximum likelihood estimator, nor is any of them as asymptotically efficient as the maximum likelihood estimator; however, the maximum likelihood estimate is more difficult to compute, leaving the truncated mean as a useful alternative.[3][4]
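The Cauchy comparison can be explored with a small Monte Carlo sketch, assuming NumPy and SciPy are available. Because the Cauchy sample mean has no finite variance, the spread of each estimator is summarised here by the interquartile range of its simulated values rather than by a variance; the sample size, number of repetitions, seed and helper name are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import trim_mean


def spread_of_estimators(n=100, reps=5000, seed=0):
    """Monte Carlo interquartile range of three location estimators.

    Draws `reps` standard Cauchy samples of size `n` (true location 0) and
    returns the IQR of the sample mean, the sample median and the 38%
    trimmed mean (the middle 24% of the order statistics).
    """
    rng = np.random.default_rng(seed)
    samples = rng.standard_cauchy(size=(reps, n))
    estimates = {
        "mean": samples.mean(axis=1),
        "median": np.median(samples, axis=1),
        "trimmed_38": trim_mean(samples, 0.38, axis=1),
    }
    # IQR = 75th percentile minus 25th percentile of each estimator's values.
    return {name: float(np.subtract(*np.percentile(est, [75, 25])))
            for name, est in estimates.items()}


print(spread_of_estimators())
```

The sample mean should show a spread comparable to that of a single Cauchy observation, while the median and the 38% trimmed mean are far more concentrated; the smaller, theoretical efficiency gain of the trimmed mean over the median may or may not be visible in a particular run, given simulation noise.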

Drawbacks

The truncated mean uses more information from the distribution or sample than the median, but unless the underlying distribution is symmetric, the truncated mean of a sample is unlikely to be an unbiased estimator of either the mean or the median.
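A skewed example shows why the trimmed mean generally targets neither the mean nor the median. The sketch below assumes an exponential distribution (chosen for illustration only) and computes its population α-trimmed mean in closed form; the function name is hypothetical.

```python
import math


def exponential_trimmed_mean(alpha, rate=1.0):
    """Population alpha-trimmed mean of an exponential distribution.

    Integrates x over the central quantile range [q(alpha), q(1 - alpha)]
    and divides by the probability 1 - 2 * alpha of that range.
    """
    if not 0 < alpha < 0.5:
        raise ValueError("alpha must be in (0, 0.5)")
    lo = -math.log(1 - alpha) / rate  # alpha quantile
    hi = -math.log(alpha) / rate      # (1 - alpha) quantile

    def antiderivative(x):
        # Derivative of -(x + 1/rate) * exp(-rate * x) is x * rate * exp(-rate * x).
        return -(x + 1 / rate) * math.exp(-rate * x)

    return (antiderivative(hi) - antiderivative(lo)) / (1 - 2 * alpha)


print("mean    ", 1.0)                            # 1 / rate for rate 1
print("median  ", math.log(2))                    # ln 2 / rate for rate 1
print("25% trim", exponential_trimmed_mean(0.25))
```

For rate 1, the 25% trimmed mean (interquartile mean) is about 0.738, between the median ln 2 ≈ 0.693 and the mean 1, so in large samples the sample trimmed mean settles near a value that is neither.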

Examples

The scoring method used in many sports that are evaluated by a panel of judges is a truncated mean: discard the lowest and the highest scores; calculate the mean value of the remaining scores.[5]

The Libor benchmark interest rate is calculated as a trimmed mean: given 18 responses, the top 4 and bottom 4 are discarded, and the remaining 10 are averaged (yielding a trim factor of 4/18 ≈ 22%).[6]
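Both examples discard a fixed number of points from each end rather than a percentage. A minimal sketch, assuming k points are dropped at each end; the helper name and the five judges' scores are hypothetical, and the Libor line only reproduces the trim-factor arithmetic.

```python
def mean_dropping_extremes(values, k):
    """Mean after discarding the k lowest and k highest values.

    k = 1 matches the judges' rule; k = 4 out of 18 matches the Libor
    description, a trim factor of 4/18.
    """
    if k < 0 or 2 * k >= len(values):
        raise ValueError("need 0 <= k and 2 * k < len(values)")
    data = sorted(values)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


# Hypothetical panel of five judges' scores.
print(mean_dropping_extremes([8.5, 9.0, 9.5, 9.0, 7.0], 1))  # mean of 8.5, 9.0, 9.0 ≈ 8.83
print(f"Libor-style trim factor: {4 / 18:.1%}")               # 22.2%
```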


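Consider the data set consisting of:

{92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, −40, 101, 86, 85, 15, 89, 89, 28, −5, 41}    (N = 20, mean = 101.5)

The 5th percentile (−6.75) lies between −40 and −5, while the 95th percentile (148.6) lies between 101 and 1053. Then a 5% trimmed mean (keeping the middle 90% of the data), which discards the two values outside these percentiles, −40 and 1053, would result in the following:

{92, 19, 101, 58, 91, 26, 78, 10, 13, 101, 86, 85, 15, 89, 89, 28, −5, 41}    (N = 18, mean = 56.5)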

This example can be compared with the one using the Winsorising procedure.

The trimmed mean can be calculated in Python using the NumPy and SciPy libraries:

import numpy as np
import scipy.stats

a = np.array([92, 19, 101, 58, 1053, 91, 26, 78, 10, 13, -40, 101, 86, 85, 15, 89, 89, 28, -5, 41])
# 5th and 95th percentiles with NumPy's default linear interpolation: about -6.75 and 148.6
lower = np.percentile(a, 5)
upper = np.percentile(a, 95)
# Mean of the values inside [lower, upper]; -40 and 1053 fall outside, giving 56.5
print(scipy.stats.tmean(a, limits=(lower, upper), inclusive=(True, True)))
# Equivalent here: trim_mean cuts int(0.05 * 20) = 1 value from each end, also 56.5
print(scipy.stats.trim_mean(a, 0.05))

References

  1. Arulmozhi, G. (2009). Statistics for Management (2nd ed.). Tata McGraw-Hill Education. p. 458.
  2. Rothenberg, Thomas J.; Fisher, Franklin M.; Tilanus, C. B. (1964). "A note on estimation from a Cauchy sample". Journal of the American Statistical Association. 59 (306): 460–463. doi:10.1080/01621459.1964.10482170.
  3. Bloch, Daniel (1966). "A note on the estimation of the location parameters of the Cauchy distribution". Journal of the American Statistical Association. 61 (316): 852–855. doi:10.1080/01621459.1966.10480912. JSTOR 2282794.
  4. Ferguson, Thomas S. (1978). "Maximum Likelihood Estimates of the Parameters of the Cauchy Distribution for Samples of Size 3 and 4". Journal of the American Statistical Association. 73 (361): 211. doi:10.1080/01621459.1978.10480031. JSTOR 2286549.
  5. Bialik, Carl (27 July 2012). "Removing Judges' Bias Is Olympic-Size Challenge". The Wall Street Journal. Retrieved 7 September 2014.
  6. "bbalibor: The Basics". The British Bankers' Association.