Outlier


 
 

:This article deals with outliers in statistics. For Polynesian outliers, see the article Polynesian outliers.


In statistics, an outlier is an observation that lies far away from the rest of the data.


One definition of "far away" in this context is:


:less than Q1 − (1.5 × IQR) or greater than Q3 + (1.5 × IQR)


where Q1 and Q3 are the first and third quartiles, respectively, and IQR is the interquartile range (equal to Q3 − Q1).


These values define the so-called inner fences, beyond which an observation would be labeled a mild outlier.


Extreme outliers are observations that are beyond the outer fences:


:less than Q1 − (3 × IQR) or greater than Q3 + (3 × IQR)
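The inner and outer fences can be computed directly from a sample. The following is a minimal sketch, not a definitive implementation: the function names fences and classify_outliers and the sample data are made up for illustration, and the "inclusive" quartile method is an assumption, since textbooks and software packages use several slightly different quartile conventions.

```python
import statistics


def fences(data, k):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: quartiles are taken with the "inclusive" interpolation
    # method; other conventions can shift the fences slightly.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def classify_outliers(data):
    """Split the observations beyond the fences into mild and extreme."""
    inner_lo, inner_hi = fences(data, 1.5)  # inner fences
    outer_lo, outer_hi = fences(data, 3.0)  # outer fences
    result = {"mild": [], "extreme": []}
    for x in data:
        if x < outer_lo or x > outer_hi:
            result["extreme"].append(x)
        elif x < inner_lo or x > inner_hi:
            result["mild"].append(x)
    return result


# Hypothetical sample, chosen only to show one outlier of each kind.
sample = [2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 100]
print(fences(sample, 1.5))
print(classify_outliers(sample))
```

For this sample the quartiles are 4.5 and 9.5, so the inner fences are −3 and 17 and the outer fences are −10.5 and 24.5; the value 20 is a mild outlier and 100 is an extreme one.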


In the case of normally distributed data, using the above definitions, only about 1 in 150 observations will be a mild outlier and only about 1 in 425,000 an extreme outlier. Because such observations are so rare under a normal model, outliers usually demand special attention: they may indicate problems in sampling, data collection, or transcription. Alternatively, an outlier could be the result of, for example, a truly unusual response to a given treatment, calling for further investigation by the researcher.
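The rates quoted for normal data can be checked against the fence definitions. The sketch below is illustrative only (the function name normal_rate_beyond_fence is made up here); it applies the fences to a standard normal distribution, for which Q3 = −Q1 and IQR = 2 × Q3, and returns the two-sided probability of falling beyond the fence.

```python
import math
from statistics import NormalDist


def normal_rate_beyond_fence(k):
    """Probability that a normal observation lies beyond Q3 + k*IQR
    or below Q1 - k*IQR."""
    q3 = NormalDist().inv_cdf(0.75)  # third quartile of the standard normal
    iqr = 2 * q3                     # by symmetry, Q1 = -Q3
    fence = q3 + k * iqr
    # For a standard normal Z, P(|Z| > t) = erfc(t / sqrt(2)).
    return math.erfc(fence / math.sqrt(2))


p_inner = normal_rate_beyond_fence(1.5)  # beyond the inner fences
p_outer = normal_rate_beyond_fence(3.0)  # beyond the outer fences
print(f"beyond inner fences: about 1 in {1 / p_inner:,.0f}")
print(f"beyond outer fences: about 1 in {1 / p_outer:,.0f}")
```

The results should be of the same order as the rounded figures in the text. The first probability counts every observation beyond the inner fences, including the very few that are also beyond the outer fences.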


Even when a normal model is appropriate to the data being analyzed, some outliers are expected in large samples, and they should not automatically be discarded in that case. The possibility should also be considered that the underlying distribution of the data is not approximately normal but instead has fat tails. For instance, when sampling from a Cauchy distribution, the sample variance increases with the sample size, the sample mean fails to converge as the sample size increases, and outliers are expected at far larger rates than for a normal distribution.
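The contrast with a fat-tailed distribution can be made concrete by applying the same fences to a standard Cauchy distribution. This is a sketch under stated assumptions: the function name cauchy_rate_beyond_fence is made up for illustration, and it relies on the standard Cauchy cumulative distribution function F(x) = 1/2 + arctan(x)/π, whose quartiles are −1 and +1, so that IQR = 2.

```python
import math


def cauchy_rate_beyond_fence(k):
    """Probability that a standard Cauchy observation lies beyond
    Q3 + k*IQR or below Q1 - k*IQR."""
    q3 = 1.0           # third quartile of the standard Cauchy
    iqr = 2.0          # Q3 - Q1 = 1 - (-1)
    fence = q3 + k * iqr
    # P(|X| > t) = 1 - (2 / pi) * arctan(t) for a standard Cauchy X.
    return 1.0 - (2.0 / math.pi) * math.atan(fence)


print(f"beyond inner fences: {cauchy_rate_beyond_fence(1.5):.3f}")
print(f"beyond outer fences: {cauchy_rate_beyond_fence(3.0):.3f}")
```

Under these assumptions roughly 16% of Cauchy observations fall beyond the inner fences and about 9% beyond the outer fences, compared with well under 1% beyond the inner fences for normal data.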


See also: box plot, Studentized residual, Chauvenet's criterion
