8 events
when what by license comment
20 hours ago comment added Peter Flom Why are you testing for outliers at all? Also, "big data" naturally has outliers. Tukey's guideline was not only for one-dimensional data; it was for small data sets. With large N, even perfectly normal data will have lots of outliers per Tukey.
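The point about large N can be checked with a short calculation. The sketch below is illustrative only and is not part of the original comment: it assumes exactly normal data and the usual multiplier of 1.5, and the function name `tukey_flag_rate` is made up for this example. It uses population quartiles, so it gives the long-run flag rate rather than a simulated count.

```python
from statistics import NormalDist

def tukey_flag_rate(k: float = 1.5) -> float:
    """Probability that a normal draw falls outside Q1 - k*IQR .. Q3 + k*IQR.

    Uses the population quartiles of the normal distribution, so the
    result does not depend on the mean or standard deviation.
    """
    z = NormalDist()              # standard normal
    q3 = z.inv_cdf(0.75)          # upper quartile; by symmetry Q1 = -q3
    iqr = 2 * q3
    upper_fence = q3 + k * iqr
    return 2 * (1 - z.cdf(upper_fence))   # both tails

rate = tukey_flag_rate()
print(f"flag rate under normality: {rate:.4%}")
for n in (100, 10_000, 1_000_000):
    print(f"N = {n:>9,}: about {rate * n:,.0f} points expected beyond the fences")
```

The rate is roughly 0.7%, so the expected number of flagged points grows in proportion to N even though nothing is wrong with the data.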
21 hours ago answer added Christian Hennig timeline score: 5
21 hours ago history edited Nick Cox CC BY-SA 4.0 edited body
22 hours ago comment added Nick Cox As is often flagged here, and elsewhere, whether points lie beyond Q3 + 1.5 IQR or Q1 $-$ 1.5 IQR was at most Tukey's rule of thumb suggestion for which points should be plotted individually on a box plot in a first pass exploratory analysis. It was not, and is not, a test in any but a uselessly loose sense of the term test, and is dangerously crude if used as a single criterion for deciding which points are problematic, let alone to be excluded or omitted. (What you mean with your notation is clear only to those who recognise the allusion.)
22 hours ago comment added Christian Hennig All tests and methods mentioned by you are for one-dimensional data, and wouldn't take any time series structure into account. These can be used in various ways, depending on what your problem actually is: (1) Do you want to detect an outlier in a single time series? (2) Do you want to detect an outlying observation at a specific time point looking at various time series, or (3) do you want to detect a time series that is outlying from the others? (3) can't be done using IQR or the cited tests; for (1) these are questionable because the time order is informative.
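To illustrate why time order matters for case (1), here is a minimal sketch on a made-up series. Everything in it is an assumption for illustration: the series, the size of the jump, the helper name `tukey_flags`, and the choice of first differences as one simple way to bring the time order in. It is not a method proposed in the comments.

```python
from statistics import quantiles

def tukey_flags(values, k=1.5):
    """Return indices of values outside Q1 - k*IQR .. Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v < lo or v > hi]

# Hypothetical series: a steady upward trend with a small repeating wiggle.
series = [t + 0.1 * (t % 3) for t in range(100)]
series[50] += 20.0   # a jump that is large relative to its neighbours

# Fences on the raw values ignore the time order entirely.
raw_flags = tukey_flags(series)

# First differences: diffs[i] = series[i + 1] - series[i].
diffs = [b - a for a, b in zip(series, series[1:])]
diff_flags = tukey_flags(diffs)

print("flagged on raw values: ", raw_flags)
print("flagged on differences:", diff_flags)
```

On the raw values nothing is flagged, because the jumped value still sits well inside the overall range of the trend. On the differences, the steps into and out of the jump (indices 49 and 50) stand out. This only shows that ignoring the order can hide a point; it does not make the fences a test.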
23 hours ago history became hot network question
yesterday answer added Stephan Kolassa timeline score: 8
yesterday history asked Anke CC BY-SA 4.0