2
$\begingroup$

I am using Ordinal Semiparametric Regression (Frank Harrell's rms package) to model overall survival in patients with brain tumors. I am thinking of centering the Age covariate, because I want Age = 0 to represent the average person, not a newborn. But I am not sure whether I should standardize: setting the variance to 1 rescales the distances between ages, which makes Age harder to interpret (since the units are no longer years). That is why centering alone feels more natural to me than standardizing. Is this a valid concern?

Standardization:

df$Age_c <- as.numeric(scale(df$Age))

Centering only:

df$Age_c <- as.numeric(scale(df$Age, scale = FALSE))

I would really appreciate your guidance.

$\endgroup$

2 Answers

6
$\begingroup$

Any of the options can make sense.

Leaving age as is makes it easiest to interpret. Centering does, indeed, make the average age 0, but that's the average age in your data set. Standardizing is similar: it changes the unit of age from one year to one standard deviation (sd) of age, and that sd is again the sd in your data. There are arguments for doing this if you want to compare variables (this has been discussed here, no need to have those debates again).

I am a fan of leaving variables as they are. To me, that makes them easiest to interpret. "A 68-year-old" is easier than "someone who is 3 years older than the mean", and "per year" is easier than "per sd of age, which is 2.35".

But arguments can be and have been made for other choices.
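
To see what each option does to the coefficients, here is a minimal sketch on simulated data. It uses a binary logistic model fitted with base R's `glm` as a simple stand-in for the ordinal model in the question (an assumption made so the example runs without extra packages); the data, the effect sizes, and the column name `Age_z` are made up for illustration.

```r
# Simulated data; all names and numbers here are illustrative assumptions.
set.seed(1)
n   <- 500
Age <- round(rnorm(n, mean = 60, sd = 12))
# Binary outcome whose log-odds increase by 0.05 per year of age
y   <- rbinom(n, size = 1, prob = plogis(-3 + 0.05 * Age))
df  <- data.frame(y = y, Age = Age)

df$Age_c <- as.numeric(scale(df$Age, scale = FALSE))  # centered only
df$Age_z <- as.numeric(scale(df$Age))                 # centered and scaled

fit_raw <- glm(y ~ Age,   family = binomial, data = df)
fit_c   <- glm(y ~ Age_c, family = binomial, data = df)
fit_z   <- glm(y ~ Age_z, family = binomial, data = df)

# Centering leaves the per-year slope unchanged; only the intercept moves.
slope_raw <- unname(coef(fit_raw)["Age"])
slope_c   <- unname(coef(fit_c)["Age_c"])
# Standardizing re-expresses the slope "per sd of age": slope_raw * sd(Age).
slope_z   <- unname(coef(fit_z)["Age_z"])

print(c(raw = slope_raw, centered = slope_c, standardized = slope_z,
        raw_times_sd = slope_raw * sd(df$Age)))

# Fitted probabilities agree across the three parameterizations
# (up to numerical tolerance), so the fits describe the same model.
print(all.equal(fitted(fit_raw), fitted(fit_c), tolerance = 1e-6))
print(all.equal(fitted(fit_raw), fitted(fit_z), tolerance = 1e-6))
```

The three fits differ only in how the same model is written down, so the choice among them is about which units you find easiest to report.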

$\endgroup$
2
  • $\begingroup$ Thank you for your answer! $\endgroup$ Commented 20 hours ago
  • 1
    $\begingroup$ @ÇağanKaplan Note that this kind of regression model is equivariant under linear transformations of the x-variables, so the solutions are mathematically equivalent for any of these options. Interpretation, including comparison between variables, can be a concern, but "mathematically right or wrong" isn't. The answer implicitly uses this fact but doesn't say it explicitly, and it could be worthwhile to add that. $\endgroup$ Commented 13 hours ago
4
$\begingroup$

Peter’s answer is good in general. Here are some specific points about centering vs. scaling vs. doing nothing.

  1. In a model with an intercept, centering variables can make the intercept and other contrasts in your model more interpretable, but there are many situations in which the interest is only in additive model terms which don't depend on centering. So again, doing nothing is just fine.
  2. Moreover, as Christian notes above, in many regression situations all of the centering / scaling options are mathematically equivalent, so the choice of how to scale predictors is really yours.
  3. The scaling of predictors matters more in certain penalized / regularized regression and classification situations where you might want each coefficient on “equal footing” when it comes to shrinkage.
  4. One final place where centering does make a bigger difference is in multilevel models, where certain tests of coefficients (and hypotheses that go with them) depend on group-mean-centering predictors, especially if an aggregate version of the predictor is used at a different level.
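
Point 3 can be illustrated with a small sketch. It assumes a plain ridge penalty computed in closed form in base R, on simulated data; the helper `ridge`, the predictor `marker`, and the value of `lambda` are all hypothetical and chosen only for illustration.

```r
set.seed(2)
n <- 200
# Two simulated predictors on very different scales
age    <- rnorm(n, mean = 60, sd = 12)     # years
marker <- rnorm(n, mean = 0.5, sd = 0.05)  # hypothetical lab value
y <- 0.05 * age + 10 * marker + rnorm(n)

# Closed-form ridge on centered data (intercept left unpenalized):
# beta = (X'X + lambda * I)^{-1} X'y
ridge <- function(X, y, lambda) {
  Xc <- scale(X, scale = FALSE)
  yc <- y - mean(y)
  b  <- solve(crossprod(Xc) + lambda * diag(ncol(Xc)), crossprod(Xc, yc))
  setNames(drop(b), colnames(X))
}

X_raw <- cbind(age = age, marker = marker)
X_std <- scale(X_raw)             # each column centered, sd = 1
sds   <- apply(X_raw, 2, sd)

# With no penalty, fitting on raw or standardized predictors gives the
# same coefficients once both are expressed in the original units.
ols_raw <- ridge(X_raw, y, 0)
ols_std <- ridge(X_std, y, 0) / sds
print(all.equal(ols_raw, ols_std, tolerance = 1e-6))

# With a penalty this no longer holds: on the raw scale the predictor
# with the small numeric range is shrunk much more heavily.
lambda     <- 50
b_raw      <- ridge(X_raw, y, lambda)
b_std_orig <- ridge(X_std, y, lambda) / sds
print(rbind(raw = b_raw, standardized_back_transformed = b_std_orig))
```

On the raw scale the amount of shrinkage depends on each predictor's units, which is why standardizing first is a common choice when a penalty is involved.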
$\endgroup$
