Questions tagged [regularization]
Inclusion of additional constraints (typically a penalty for complexity) in the model fitting process. Used to prevent overfitting / enhance predictive accuracy.
1,458 questions
1 vote · 2 answers · 80 views
Are there superefficient statistics that shrink toward the true parameter value, in probability?
Stein (1964) defined a superefficient estimator for the univariate normal variance with population mean unknown. (It's unrelated to James-Stein; see here for a summary.) He gave an indicator variable ...
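For context, the construction being referenced (quoted from memory of Stein 1964, so worth verifying against the paper): with $X_1,\dots,X_n \sim N(\mu,\sigma^2)$ and $S=\sum_i(X_i-\bar X)^2$, the best equivariant estimator $S/(n+1)$ is dominated by
$$\hat\sigma^2 = \min\left\{\frac{S}{n+1},\ \frac{S+n\bar X^2}{n+2}\right\},$$
which switches between the two component estimators via an indicator of which term is smaller.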
0 votes · 1 answer · 54 views
Ridge regression - derivation of model coefficients question
I would like to understand the derivation of the penalized intercept formula for Ridge regression in the Lecture Notes below (see image).
The variables $X_{ij}$ and the outcome, $y_i$, are centered by ...
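For readers without those notes, the standard argument this presumably follows (as in ESL, Section 3.4.1): leave the intercept unpenalized, center each predictor column and the response, solve the penalized problem on the centered data $X_c, y_c$, and recover the intercept afterwards:
$$\hat\beta = \left(X_c^\top X_c + \lambda I\right)^{-1} X_c^\top y_c, \qquad \hat\beta_0 = \bar y - \sum_j \bar x_j \hat\beta_j.$$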
0 votes · 0 answers · 32 views
Is there a formula for the degrees of freedom for the group elastic net?
I'm interested in calculating AIC/BIC for group elastic net models. I've found formulas for the degrees of freedom for the elastic net (Degrees of Freedom in Lasso Problems) and for the group ...
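For comparison (this is the non-group case, not what the question asks for): the usual unbiased degrees-of-freedom estimate for the plain elastic net with active set $\mathcal{A}$ and ridge parameter $\lambda_2$ is
$$\widehat{\mathrm{df}} = \operatorname{tr}\left[X_{\mathcal{A}}\left(X_{\mathcal{A}}^\top X_{\mathcal{A}} + \lambda_2 I\right)^{-1} X_{\mathcal{A}}^\top\right],$$
which reduces to the lasso result $|\mathcal{A}|$ when $\lambda_2 = 0$.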
2 votes · 1 answer · 129 views
A question about minimizing $l_2$ norm with regularization
PREMISES: this question likely arises from my very basic knowledge of the field. Please be very detailed in the answer, even if some facts seem trivial. Also, sorry for my poor English.
...
0 votes · 0 answers · 44 views
Understanding the pseudocode of SMART regularization
I want to implement SMART regularization from this paper for my project on fine-tuning a language model for paraphrase identification. I understand the general idea of the algorithm and its two ...
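Not an answer to the pseudocode question itself, but a minimal PyTorch sketch of the smoothness-inducing adversarial term from the SMART paper, under simplifying assumptions: `model` is taken to map input embeddings directly to logits, and a single ascent step with a globally normalized gradient is used (the paper iterates and projects per example). All names and hyperparameter values here are illustrative.

```python
import torch
import torch.nn.functional as F

def symmetric_kl(p_logits, q_logits):
    # symmetric KL divergence between the two predictive distributions
    p = F.log_softmax(p_logits, dim=-1)
    q = F.log_softmax(q_logits, dim=-1)
    return (F.kl_div(p, q, reduction="batchmean", log_target=True)
            + F.kl_div(q, p, reduction="batchmean", log_target=True))

def smart_adv_loss(model, embeds, clean_logits,
                   noise_var=1e-5, step_size=1e-3, eps=1e-6):
    # 1. start from a small random perturbation of the input embeddings
    delta = torch.randn_like(embeds) * noise_var
    delta.requires_grad_(True)
    # 2. one ascent step on the perturbation, pushing the perturbed
    #    predictions away from the clean ones (global l2 step for brevity;
    #    the paper uses per-example l-inf projection)
    adv_logits = model(embeds + delta)
    loss = symmetric_kl(adv_logits, clean_logits.detach())
    grad, = torch.autograd.grad(loss, delta)
    delta = (delta + step_size * grad / (grad.norm() + eps)).detach()
    # 3. the regularizer is the divergence at the perturbed input
    return symmetric_kl(model(embeds + delta), clean_logits)
```

In training, this term would be added to the task loss with a weight, e.g. `task_loss + mu * smart_adv_loss(model, embeds, clean_logits)`.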
5 votes · 1 answer · 349 views
Does regularizing all parameters have any effect?
In Andrew Ng's Machine Learning course (Module 7 on Regularization), he mentions that if we're unsure which parameters to regularize, it's reasonable to include all parameters in the regularization ...
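For reference, the regularized cost function from that module penalizes every slope parameter but, by convention, starts the penalty sum at $j = 1$ so the intercept $\theta_0$ is left out:
$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2.$$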
5 votes · 1 answer · 215 views
Approaches for modelling survival time in groups with very low risk
I'm currently working on a study with a group of hematologists. Patients with PV (Polycythemia vera) have very low risk of future thromboembolism after diagnosis due to disease management. Patients ...
1 vote · 1 answer · 103 views
Why does bias make inference difficult?
Let us consider linear regression as a concrete case. If the residuals have nonzero mean then the OLS solution is biased, $E[\hat{\beta}] \neq \beta$. Another way bias can appear is if you use ...
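One way to make the difficulty concrete: the usual confidence interval is justified by an approximately pivotal quantity, and bias shifts its center. If $\hat\beta_j$ has bias $b$, then
$$\frac{\hat\beta_j - \beta_j}{\widehat{\operatorname{se}}(\hat\beta_j)} \approx N\!\left(\frac{b}{\widehat{\operatorname{se}}(\hat\beta_j)},\, 1\right),$$
so a nominal $1-\alpha$ interval undercovers unless $b$ is negligible relative to the standard error.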
0 votes · 0 answers · 42 views
Bounding the error when using L1 regularization
I'm reading some lecture notes on High-Dimensional Statistics (https://arxiv.org/abs/2310.19244) and on page 59 I'm not able to follow the proof.
The setup is this: we assume that data in the form of ...
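For orientation, the flavor of result such notes typically prove (constants and exact conditions vary from source to source): if the true $\beta^*$ is $s$-sparse, the design satisfies a restricted-eigenvalue condition with constant $\gamma > 0$, and the penalty level is chosen as $\lambda \asymp \sigma\sqrt{\log d / n}$, then with high probability
$$\|\hat\beta - \beta^*\|_2 \lesssim \frac{\sigma}{\gamma}\sqrt{\frac{s\log d}{n}}.$$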
1 vote · 0 answers · 84 views
Papers on theory behind LASSO (and regularization in general)
I am looking for some papers that look at LASSO and regularization in general from a theoretical perspective. For example, I am looking for papers which prove that, under such and such assumptions, ...
2 votes · 0 answers · 73 views
Variable selection with clustered data on large datasets (700k): Cross-validated, scalable, and interpretable models with random effects?
I’m working with a large dataset (≈700k observations) from an experiment, involving ≈5k patients and repeated trials across ≈50 covariates. The data structure includes multiple levels of clustering, ...
0 votes · 0 answers · 65 views
feature weights vs leaf weights in xgboost (L1 and L2 regularization)
This article mentions "feature weights" several times: https://xgboosting.com/xgboost-regularization-techniques/
However, it's not clear to me how a tree can have feature weights. It's not ...
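For reference, the penalty in the XGBoost paper (Chen & Guestrin, 2016) is defined on the leaf weights $w_j$ of each tree with $T$ leaves, not on features; the library's `reg_lambda` and `reg_alpha` parameters scale the L2 term from the paper and an additional L1 term on the same leaf weights:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^2 + \alpha\sum_{j=1}^{T} |w_j|.$$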
0 votes · 0 answers · 59 views
Under what conditions is penalised regression appropriate?
In this question posted earlier this year I asked about strange results from a penalised regression model: stacked elastic net regression, in fact. The CV member who answered my question ...
0 votes · 0 answers · 43 views
Does Batch Normalization act as a regularizer when we don't shuffle the dataset at each epoch?
Batch Normalization (BN) is a technique to accelerate convergence when training neural networks. It is also assumed to act as a regularizer, since the mean and standard deviation are ...
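A toy numpy illustration of the mechanism at stake (entirely illustrative): the regularizing noise attributed to BN comes from batch composition, and with a fixed ordering each sample sees identical batch statistics every epoch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 1))  # toy 1-D features
batch_size = 32

def batch_mean_seen_by_sample0(order):
    # mean of whichever batch contains sample 0 under this ordering
    pos = int(np.where(order == 0)[0][0])
    start = (pos // batch_size) * batch_size
    return float(X[order[start:start + batch_size]].mean())

fixed = np.arange(len(X))
# fixed order: the same batch mean every "epoch" -- no stochasticity
print([batch_mean_seen_by_sample0(fixed) for _ in range(3)])
# shuffled order: a fresh batch mean each epoch -- the noise BN injects
print([batch_mean_seen_by_sample0(rng.permutation(len(X))) for _ in range(3)])
```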
0 votes · 0 answers · 50 views
Penalized regression with an intercept and a sparse design matrix
It is usually desired to exclude the intercept term from penalization when running lasso regression. In ESL and related books, it is recommended to center the response $y\in\mathbb{R}^n$ and design ...
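A common workaround, consistent with the centering recommendation but without densifying a sparse $X$: fit on implicitly centered data, since centering only adds a rank-one correction to matrix-vector products, and recover the intercept at the end:
$$X_c v = Xv - \mathbf{1}\left(\bar x^\top v\right), \qquad \hat\beta_0 = \bar y - \bar x^\top \hat\beta.$$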