I changed the title because I think it better fits the object

Identifying subject-specific outliers for within-patient numerical data in the presence of between-subject heterogeneity

added 675 characters in body
rambles

I have a simple dataset of people's heights, many people with measurements on multiple days (once a year for 10 years, say). I have the date of each measurement.

Some of the height values are absurd. I already drop values that are 'impossible' (e.g. heights above 3 m). However, I'd also like to identify values that are unusual within a patient: if a patient has 5 records at around 1.8 m, I'd like a red flag for a value of 2.1 m.

I am considering trying the ESD test for outliers - it seems straightforward to implement - but I thought I'd ask if there are better ideas before I get started.

Thanks.

EDIT:

The ESD test I refer to is the [[Generalized] 'Extreme Studentized Deviate'][1] test. Similar to Grubbs' test, it 'is used to detect one or more outliers in a univariate data set that follows an approximately normal distribution'.
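
As a rough illustration of the test described above, here is a minimal Python sketch of the generalized ESD procedure applied to one patient's values. It is a sketch under stated assumptions, not a reference implementation: the function names `t_quantile` and `generalized_esd`, the significance level `alpha=0.05`, and the example heights are all hypothetical. To keep the block dependency-free, the Student's t quantile is computed by numerical integration; a library routine such as `scipy.stats.t.ppf` could be used in its place.

```python
import math
from statistics import mean, stdev


def t_quantile(p, df, steps=400):
    """Quantile of Student's t for 0.5 <= p < 1 (hypothetical helper).

    Uses the substitution t = sqrt(df) * tan(theta), which turns the CDF into
    0.5 + k * integral_0^theta cos(u)^(df - 1) du on a bounded interval.
    """
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)

    def cdf_at(theta):
        # Composite Simpson's rule with an even number of steps.
        h = theta / steps
        total = 1.0 + math.cos(theta) ** (df - 1)
        for j in range(1, steps):
            total += (4 if j % 2 else 2) * math.cos(j * h) ** (df - 1)
        return 0.5 + k * total * h / 3

    # Bisection on theta in [0, pi/2].
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf_at(mid) < p:
            lo = mid
        else:
            hi = mid
    return math.sqrt(df) * math.tan((lo + hi) / 2)


def generalized_esd(values, max_outliers, alpha=0.05):
    """Return the indices of `values` flagged by the generalized ESD test."""
    n = len(values)
    remaining = list(range(n))
    removed = []
    n_outliers = 0
    for i in range(1, max_outliers + 1):
        if n - i - 1 < 1:  # not enough degrees of freedom left
            break
        xs = [values[j] for j in remaining]
        m, s = mean(xs), stdev(xs)
        if s == 0:  # all remaining values identical
            break
        # Most extreme remaining observation and its studentized deviate R_i.
        idx = max(remaining, key=lambda j: abs(values[j] - m))
        r_i = abs(values[idx] - m) / s
        # Critical value lambda_i.
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_quantile(p, n - i - 1)
        lam = (n - i) * t / math.sqrt((n - i - 1 + t * t) * (n - i + 1))
        remaining.remove(idx)
        removed.append(idx)
        if r_i > lam:
            n_outliers = i  # keep the largest i with R_i > lambda_i
    return removed[:n_outliers]


# Hypothetical patient: five heights near 1.8 m and one suspicious 2.1 m.
heights = [1.80, 1.81, 1.79, 1.80, 1.82, 2.10]
flagged = generalized_esd(heights, max_outliers=2)
print(flagged)
```

Note that the test assumes approximately normal data, and with only a handful of measurements per person it has limited power, so `max_outliers` has to stay small relative to the number of records.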

I'll have measurements for over 100,000 people - however, each person will only have between 0 and around 20 measurements.

I'm hoping to generalise this to other measurements such as BMI, lab tests (eosinophils, white blood cell counts), respiratory tests (FEV1), etc. I suspect these will be harder, as they vary more, rising and falling with little dependence on time - hence my starting with height!


Identifying outliers for within-patient numerical data
