(+1) to BruceET. In Bruce's example the data-generating process is clearly not normal. The performance of the two-sample t-test on the raw data might be improved by incorporating a log link function. Additionally, the score and likelihood ratio tests would be expected to perform better than a t- or Wald test with an identity link. Here is a related link if your data are very clearly non-normal.
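As a rough illustration of the log link and likelihood ratio test mentioned above, here is a minimal sketch. It assumes an exponential model for each group (equivalent to a gamma GLM with a log link and shape fixed at 1), which is my assumption and not necessarily the right model for your data. The function name `exp_two_sample_tests` and the sample values are hypothetical.

```python
import math

def exp_two_sample_tests(x, y):
    """Wald (log scale) and likelihood ratio tests of equal means,
    ASSUMING both samples are exponentially distributed."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2          # group MLEs of the mean
    pooled = (sum(x) + sum(y)) / (n1 + n2)     # MLE under the null

    # Wald test on the log scale: Var(log(sample mean)) is roughly 1/n
    wald_z = (math.log(m1) - math.log(m2)) / math.sqrt(1 / n1 + 1 / n2)
    wald_p = math.erfc(abs(wald_z) / math.sqrt(2))   # two-sided normal p-value

    # Likelihood ratio statistic: 2 * (loglik separate means - loglik common mean)
    lrt = 2 * ((n1 + n2) * math.log(pooled)
               - n1 * math.log(m1) - n2 * math.log(m2))
    lrt = max(lrt, 0.0)                              # guard against rounding error
    lrt_p = math.erfc(math.sqrt(lrt / 2))            # chi-square (1 df) upper tail
    return {"wald_z": wald_z, "wald_p": wald_p, "lrt": lrt, "lrt_p": lrt_p}

# Hypothetical right-skewed samples, for illustration only
group_a = [0.3, 1.2, 0.7, 2.9, 0.1, 4.5, 0.9, 1.6]
group_b = [2.1, 5.4, 0.8, 9.7, 3.3, 1.9, 6.2, 12.5]
print(exp_two_sample_tests(group_a, group_b))
```

In practice you would fit this with a GLM routine in your statistics package; the closed forms here just make the two test statistics easy to compare.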

I generally lean towards not removing outliers, or removing them only as part of a sensitivity analysis when the outliers are a result of improper data collection. For a given sensitivity analysis I would set the outliers to missing and consider a missing data assumption (the simplest being an ignorable missing data mechanism). To pressure-test the missing data assumption, several sensitivity analyses can be performed, each under a different missing data assumption.
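A minimal sketch of that kind of sensitivity analysis, for a single group mean. The helper `sensitivity_means`, the outlier flags, and the `delta` shifts are all hypothetical choices of mine: the complete-case estimate stands in for the ignorable assumption, and each `delta` is one non-ignorable scenario in which the missing values are assumed to differ from the observed mean by that amount.

```python
from statistics import mean

def sensitivity_means(values, is_outlier, deltas=(0.0,)):
    """Estimate the mean with flagged outliers set to missing,
    under an ignorable assumption and several delta-shifted scenarios."""
    # Set flagged outliers to missing (None) instead of dropping the rows
    data = [None if flag else v for v, flag in zip(values, is_outlier)]
    observed = [v for v in data if v is not None]
    obs_mean = mean(observed)

    # Ignorable mechanism (simplest case): analyse the observed values only
    results = {"ignorable (complete case)": obs_mean}

    for delta in deltas:
        # Non-ignorable scenario: missing values sit `delta` away from the observed mean
        imputed = [obs_mean + delta if v is None else v for v in data]
        results[f"delta={delta}"] = mean(imputed)
    return results

# Hypothetical data: the last value is flagged as a data-collection error
values = [1, 2, 3, 100]
flags = [False, False, False, True]
print(sensitivity_means(values, flags, deltas=(0.0, 4.0)))
```

If the conclusion is stable across the range of `delta` values you consider plausible, the result is not sensitive to the missing data assumption.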