I changed the title because I think it better fits the question

edit approved Aug 17, 2015 at 11:27

user83346

Identifying subject-specific outliers for within-patient numerical data in the presence of between-subject heterogeneity

Tweeted twitter.com/#!/StackStats/status/506536803219013632

occurred Sep 1, 2014 at 20:19

added 675 characters in body

edited Sep 1, 2014 at 7:28

rambles

I have a simple dataset of people's heights; many people have measurements on multiple days (once a year for 10 years, say). I have the date of each measurement.

Some of the height values are absurd. I already drop values that are 'impossible' (e.g. height values above 3 m). However, I'd like to identify unusual values within a patient. If a patient has 5 records at around 1.8 m, I'd like a red flag if there's a value of 2.1 m.
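To make the setup concrete, here is a minimal Python sketch of the pre-filtering step described above: drop 'impossible' heights, then collect each patient's remaining measurements in date order so they can be screened per patient. The record layout `(patient_id, measured_on, height_m)`, the function name, and the lower bound of 0 are assumptions made for illustration; only the 3 m upper limit comes from the description.

```python
from collections import defaultdict
from datetime import date

# Upper limit taken from the example above; the lower bound of 0 is an assumption.
MAX_PLAUSIBLE_HEIGHT_M = 3.0


def group_plausible_heights(records):
    """Drop impossible heights and group the rest by patient, sorted by date.

    `records` is an iterable of (patient_id, measured_on, height_m) tuples
    (a hypothetical layout). Returns {patient_id: [(measured_on, height_m), ...]}.
    """
    by_patient = defaultdict(list)
    for patient_id, measured_on, height_m in records:
        if 0 < height_m <= MAX_PLAUSIBLE_HEIGHT_M:
            by_patient[patient_id].append((measured_on, height_m))
    # Sorting the (date, height) tuples orders each patient's series by date.
    return {pid: sorted(rows) for pid, rows in by_patient.items()}


# Made-up records: the 18.2 m entry is dropped as impossible.
example = [
    ("p1", date(2012, 5, 1), 1.81),
    ("p1", date(2011, 5, 1), 1.80),
    ("p1", date(2013, 5, 1), 18.2),
    ("p2", date(2012, 1, 1), 1.65),
]
grouped = group_plausible_heights(example)
```

Each per-patient list can then be passed to whichever within-patient outlier rule is chosen.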

I am considering trying the ESD test for outliers - it seems straightforward to implement - but I thought I'd ask if there are better ideas before I get started.

Thanks.

EDIT:

The ESD test I refer to is the [[Generalized] 'Extreme Studentized Deviate'][1] test; similar to the Grubbs Test, it 'is used to detect one or more outliers in a univariate data set that follows an approximately normal distribution'.
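For reference, a rough standard-library Python sketch of how the generalized ESD procedure could be applied to one patient's measurements. It follows the usual formulation (repeatedly remove the most extreme value, compare each statistic R_i with a critical value λ_i built from a Student-t quantile). The function names, the defaults `max_outliers=2` and `alpha=0.05`, and the numerical t quantile (Simpson's rule plus bisection, used only to avoid a dependency) are assumptions for illustration, not a vetted implementation.

```python
import math
from statistics import mean, stdev


def t_cdf(x, df, steps=2000):
    """Student-t CDF for x >= 0, by Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + total * h / 3


def t_ppf(p, df):
    """Student-t quantile for p > 0.5, by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def generalized_esd(values, max_outliers=2, alpha=0.05):
    """Return sorted indices of values flagged by the generalized ESD test."""
    n = len(values)
    remaining = list(enumerate(values))
    removed, n_flagged = [], 0
    for i in range(1, max_outliers + 1):
        if n - i - 1 < 1:  # not enough degrees of freedom left
            break
        xs = [v for _, v in remaining]
        m, s = mean(xs), stdev(xs)
        if s == 0:  # all remaining values identical: nothing to flag
            break
        pos = max(range(len(xs)), key=lambda k: abs(xs[k] - m))
        r_i = abs(xs[pos] - m) / s
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_ppf(p, n - i - 1)
        lam_i = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        removed.append(remaining.pop(pos)[0])
        if r_i > lam_i:  # outlier count is the largest i with R_i > lambda_i
            n_flagged = i
    return sorted(removed[:n_flagged])


# Five readings near 1.8 m and one at 2.1 m: index 5 is flagged.
heights = [1.80, 1.81, 1.79, 1.80, 1.82, 2.10]
flagged = generalized_esd(heights)
```

With only a handful of measurements per patient, the statistic is bounded above by (n - 1)/√n and the critical values are approximations, so results on very short series should be treated with caution.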

I'll have measurements for over 100,000 people - however, each person will only have between 0 and around 20 measurements.

I'm hoping to generalise this to other measurements such as BMI, lab tests (Eosinophils, White Blood Counts), respiratory tests (FEV1), etc. I suspect the latter ones will be difficult as they're more varied, rising and falling with little dependency on time - hence my starting with height!


asked Aug 29, 2014 at 15:42

rambles

Identifying outliers for within-patient numerical data

I have a simple dataset of people's heights; many people have measurements on multiple days (once a year for 10 years, say). I have the date of each measurement.

Some of the height values are absurd. I already drop values that are 'impossible' (e.g. height values above 3 m). However, I'd like to identify unusual values within a patient. If a patient has 5 records at around 1.8 m, I'd like a red flag if there's a value of 2.1 m.

I am considering trying the ESD test for outliers - it seems straightforward to implement - but I thought I'd ask if there are better ideas before I get started.

Thanks.

repeated-measures outliers
