edited Dec 20, 2021 at 20:55

(+1) to BruceET. In Bruce's example the data-generating process is clearly not normal. The performance of the two-sample t-test on the raw data might be improved by incorporating a log link function. Additionally, the score and likelihood ratio tests would perform better than a t- or Wald test with an identity link. Here is a related link if your data are very clearly non-normal.
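To make the comparison of test statistics concrete, here is a minimal sketch for a two-group Gaussian model, using only the Python standard library. The data are made up for illustration, and the p-values use large-sample normal and chi-square references rather than a t distribution. With a single group indicator the fitted means are the sample means under either link, so the log-link Wald statistic follows from the delta method. The score test is omitted for brevity. This only shows how the statistics are computed; it does not by itself demonstrate which test performs better.

```python
import math
import statistics

# Made-up, right-skewed positive data (illustration only).
group0 = [1.2, 0.8, 2.5, 1.9, 0.6, 3.1, 1.4, 7.8]
group1 = [2.9, 1.7, 4.4, 3.8, 2.2, 12.5, 5.1, 3.3]


def rss(values, center):
    """Residual sum of squares of `values` around `center`."""
    return sum((v - center) ** 2 for v in values)


n0, n1 = len(group0), len(group1)
n = n0 + n1
m0, m1 = statistics.mean(group0), statistics.mean(group1)
m_all = statistics.mean(group0 + group1)

rss_alt = rss(group0, m0) + rss(group1, m1)   # separate group means
rss_null = rss(group0 + group1, m_all)        # one common mean
sigma2_hat = rss_alt / n                      # MLE of the error variance

# Wald statistic, identity link: difference in means.
se_identity = math.sqrt(sigma2_hat * (1 / n0 + 1 / n1))
z_identity = (m1 - m0) / se_identity

# Wald statistic, log link: difference in log means (delta-method SE).
se_log = math.sqrt(sigma2_hat * (1 / (n0 * m0 ** 2) + 1 / (n1 * m1 ** 2)))
z_log = (math.log(m1) - math.log(m0)) / se_log

# Likelihood ratio statistic with the variance profiled out.
# It is the same under either link, since both give the same fitted means.
lr_stat = n * math.log(rss_null / rss_alt)


def normal_two_sided_p(z):
    """Two-sided p-value from a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2))


def chi2_1df_p(x):
    """Upper-tail p-value from a chi-square reference with 1 df."""
    return math.erfc(math.sqrt(x / 2))


p_wald_identity = normal_two_sided_p(z_identity)
p_wald_log = normal_two_sided_p(z_log)
p_lr = chi2_1df_p(lr_stat)

print(f"Wald (identity link): z = {z_identity:.3f}, p = {p_wald_identity:.4f}")
print(f"Wald (log link):      z = {z_log:.3f}, p = {p_wald_log:.4f}")
print(f"Likelihood ratio:     chi2 = {lr_stat:.3f}, p = {p_lr:.4f}")
```

The two Wald statistics differ because the Wald test depends on the scale on which the effect is parameterized, whereas the likelihood ratio statistic is invariant to that choice.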

I generally lean towards not removing outliers, or towards removing them only as part of a sensitivity analysis when they result from improper data collection. For a given sensitivity analysis I would set the outliers to missing and consider a missing data assumption (the simplest being an ignorable missing data mechanism). To pressure-test the missing data assumption, several sensitivity analyses can be performed, each under a different missing data assumption.
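As a rough sketch of that workflow, the snippet below sets one flagged value to missing and recomputes the difference in group means under several missing data assumptions. Everything here is hypothetical: the data, the flagged observation, the helper names `fill_missing` and `mean_difference`, and the `delta` shifts. A shift of 0 stands in for an ignorable mechanism (it reproduces the complete-case means), while nonzero shifts stand in for non-ignorable mechanisms where the missing value is assumed to differ systematically from the observed ones. Single mean imputation understates uncertainty, so in practice something like multiple imputation would be used for inference.

```python
import statistics

# Made-up data; 19.7 is assumed to be a known data-collection error.
treated_raw = [5.1, 4.8, 6.0, 5.5, 19.7, 5.2]
control = [4.2, 4.9, 4.4, 5.0, 4.6, 4.7]

# Step 1: set the flagged outlier to missing (None) instead of deleting it.
flagged = {19.7}
treated = [None if v in flagged else v for v in treated_raw]


def fill_missing(values, delta=0.0):
    """Replace None with the observed mean shifted by `delta`."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed) + delta
    return [fill if v is None else v for v in values]


def mean_difference(a, b, delta=0.0):
    """Difference in means after filling missing values in both groups."""
    return (statistics.mean(fill_missing(a, delta))
            - statistics.mean(fill_missing(b, delta)))


# Step 2: repeat the analysis under several missing data assumptions.
# delta = 0.0 corresponds to the ignorable case; the others are
# hypothetical departures from it.
deltas = [0.0, 1.0, 2.0, 5.0]
results = {d: mean_difference(treated, control, delta=d) for d in deltas}

raw_difference = statistics.mean(treated_raw) - statistics.mean(control)
print(f"raw data (outlier kept): {raw_difference:.3f}")
for d, diff in results.items():
    print(f"delta = {d:>4}: difference in means = {diff:.3f}")
```

If the conclusion is stable across the range of shifts considered plausible, the result is not sensitive to how the flagged value is handled.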

answered Dec 20, 2021 at 20:44

Geoffrey Johnson
